Probability Theory: A Concise Course

by Y. A. Rozanov

Overview

This book, a concise introduction to modern probability theory and certain of its ramifications, deals with a subject indispensable to natural scientists and mathematicians alike. Here the readers, with some knowledge of mathematics, will find an excellent treatment of the elements of probability together with numerous applications. Professor Y. A. Rozanov, an internationally known mathematician whose work in probability theory and stochastic processes has received wide acclaim, combines succinctness of style with a judicious selection of topics. His book is highly readable, fast-moving, and self-contained.
The author begins with basic concepts and moves on to combination of events, dependent events, and random variables. He then covers Bernoulli trials and the De Moivre-Laplace theorem, which involve three important probability distributions (binomial, Poisson, and normal or Gaussian). The last three chapters are devoted to limit theorems, a detailed treatment of Markov chains, and continuous Markov processes. Also included are appendixes on information theory, game theory, branching processes, and problems of optimal control. Each of the eight chapters and four appendixes has been equipped with numerous relevant problems (150 of them), many with hints and answers.
This volume is another in the popular series of fine translations from the Russian by Richard A. Silverman. Dr. Silverman, a former member of the Courant Institute of Mathematical Sciences of New York University and the Lincoln Laboratory of the Massachusetts Institute of Technology, is himself the author of numerous papers on applied probability theory. He has heavily revised the English edition and added new material. The clear exposition, the ample illustrations and problems, the cross-references, index, and bibliography make this book useful for self-study or the classroom.

Product Details

ISBN-13:
9780486321141
Publisher:
Dover Publications
Publication date:
05/27/2013
Series:
Dover Books on Mathematics
Sold by:
Barnes & Noble
Format:
NOOK Book
Pages:
148
File size:
10 MB

Read an Excerpt

Probability Theory

A Concise Course


By Y. A. Rozanov, Richard A. Silverman

Dover Publications, Inc.

Copyright © 1969 Richard A. Silverman
All rights reserved.
ISBN: 978-0-486-32114-1



CHAPTER 1

BASIC CONCEPTS


1. Probability and Relative Frequency

Consider the simple experiment of tossing an unbiased coin. This experiment has two mutually exclusive outcomes, namely "heads" and "tails." The various factors influencing the outcome of the experiment are too numerous to take into account, at least if the coin tossing is "fair." Therefore the outcome of the experiment is said to be "random." Everyone would certainly agree that the "probability of getting heads" and the "probability of getting tails" both equal 1/2. Intuitively, this answer is based on the idea that the two outcomes are "equally likely" or "equiprobable," because of the very nature of the experiment. But hardly anyone will bother at this point to clarify just what he means by "probability."

Continuing in this vein and taking these ideas at face value, consider an experiment with a finite number of mutually exclusive outcomes which are equiprobable, i.e., "equally likely because of the nature of the experiment." Let A denote some event associated with the possible outcomes of the experiment. Then the probability P(A) of the event A is defined as the fraction of the outcomes in which A occurs. More exactly,

P(A) = N(A)/N, (1.1)

where N is the total number of outcomes of the experiment and N(A) is the number of outcomes leading to the occurrence of the event A.

Example 1. In tossing a well-balanced coin, there are N = 2 mutually exclusive equiprobable outcomes ("heads" and "tails"). Let A be either of these two outcomes. Then N(A) = 1, and hence

P(A) = 1/2.

Example 2. In throwing a single unbiased die, there are N = 6 mutually exclusive equiprobable outcomes, namely getting a number of spots equal to each of the numbers 1 through 6. Let A be the event consisting of getting an even number of spots. Then there are N(A) = 3 outcomes leading to the occurrence of A (which ones?), and hence

P(A) = 3/6 = 1/2

Example 3. In throwing a pair of dice, there are N = 36 mutually exclusive equiprobable outcomes, each represented by an ordered pair (a,b), where a is the number of spots showing on the first die and b the number showing on the second die. Let A be the event that both dice show the same number of spots. Then A occurs whenever a = b, i.e., N(A) = 6. Therefore

P(A) = 6/36 = 1/6.

Remark. Despite its seeming simplicity, formula (1.1) can lead to nontrivial calculations. In fact, before using (1.1) in a given problem, we must find all the equiprobable outcomes, and then identify all those leading to the occurrence of the event A in question.
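
As a concrete illustration of this Remark, formula (1.1) lends itself to direct enumeration by machine. The following Python sketch, which is ours and not part of the original text, verifies Examples 2 and 3 by listing the equiprobable outcomes explicitly:

    from itertools import product

    # Example 2: a single unbiased die has N = 6 equiprobable outcomes.
    die = range(1, 7)
    N_A = sum(1 for a in die if a % 2 == 0)   # outcomes with an even number of spots
    print(N_A, N_A / 6)                       # 3 0.5

    # Example 3: a pair of dice has N = 36 equiprobable ordered pairs (a, b).
    pairs = list(product(range(1, 7), repeat=2))
    N_A = sum(1 for a, b in pairs if a == b)  # both dice show the same number of spots
    print(N_A, N_A / len(pairs))              # 6 0.16666...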

The accumulated experience of innumerable observations reveals a remarkable regularity of behavior, allowing us to assign a precise meaning to the concept of probability not only in the case of experiments with equiprobable outcomes, but also in the most general case. Suppose the experiment under consideration can be repeated any number of times, so that, in principle at least, we can produce a whole series of "independent trials under identical conditions," in each of which, depending on chance, a particular event A of interest either occurs or does not occur. Let n be the total number of experiments in the whole series of trials, and let n(A) be the number of experiments in which A occurs. Then the ratio

n(A)/n

is called the relative frequency of the event A (in the given series of trials). It turns out that the relative frequencies n(A)/n observed in different series of trials are virtually the same for large n, clustering about some constant

P(A) ≈ n(A)/n, (1.2)

called the probability of the event A. More exactly, (1.2) means that

P(A) = lim_{n→∞} n(A)/n. (1.3)

Roughly speaking, the probability P(A) of the event A equals the fraction of experiments leading to the occurrence of A in a large series of trials.

Example 4. Table 1 shows the results of a series of 10,000 coin tosses, grouped into 100 different series of n = 100 tosses each. In every case, the table shows the number of tosses n(A) leading to the occurrence of a head. It is clear that the relative frequency of occurrence of "heads" in each set of 100 tosses differs only slightly from the probability P(A) = 1/2 found in Example 1. Note that the relative frequency of occurrence of "heads" is even closer to 1/2 if we group the tosses in series of 1000 tosses each.
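
The regularity described by (1.2) is easy to reproduce by simulation. Here is a minimal Python sketch, not from the book, that tosses a fair coin in series of increasing length; the relative frequencies cluster about P(A) = 1/2, and more tightly as n grows:

    import random

    random.seed(1)  # fixed seed so the illustration is reproducible

    def relative_frequency(n):
        """Toss a fair coin n times; return n(A)/n for the event A = heads."""
        heads = sum(random.random() < 0.5 for _ in range(n))
        return heads / n

    for n in (100, 1000, 10000):
        print(n, relative_frequency(n))  # values near P(A) = 0.5, closer for larger n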

Example 5 (De Méré's paradox). As a result of extensive observation of dice games, the French gambler de Méré noticed that the total number of spots showing on three dice thrown simultaneously turns out to be 11 (the event A1) more often than it turns out to be 12 (the event A2), although from his point of view both events should occur equally often. De Méré reasoned as follows: A1 occurs in just six ways (6:4:1, 6:3:2, 5:5:1, 5:4:2, 5:3:3, 4:4:3), and A2 also occurs in just six ways (6:5:1, 6:4:2, 6:3:3, 5:5:2, 5:4:3, 4:4:4). Therefore A1 and A2 have the same probability P(A1) = P(A2).

The fallacy in this argument was found by Pascal, who showed that the outcomes listed by de Méré are not actually equiprobable. In fact, one must take account not only of the numbers of spots showing on the dice, but also of the particular dice on which the spots appear. For example, numbering the dice and writing the number of spots in the corresponding order, we find that there are six distinct outcomes leading to the combination 6:4:1, namely (6, 4, 1), (6, 1, 4), (4, 6, 1), (4, 1, 6), (1, 6, 4) and (1, 4, 6), whereas there is only one outcome leading to the combination 4:4:4, namely (4, 4, 4). The appropriate equiprobable outcomes are those described by triples of numbers (a,b,c), where a is the number of spots on the first die, b the number of spots on the second die, and c the number of spots on the third die. It is easy to see that there are then precisely N = 6^3 = 216 equiprobable outcomes. Of these, N(A1) = 27 are favorable to the event A1 (in which the sum of all the spots equals 11), but only N(A2) = 25 are favorable to the event A2 (in which the sum of all the spots equals 12). This fact explains the tendency observed by de Méré for 11 spots to appear more often than 12.
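
Pascal's resolution can be confirmed by enumerating all 216 equiprobable triples. A short Python sketch of ours:

    from itertools import product

    # All N = 6**3 = 216 equiprobable ordered triples (a, b, c).
    triples = list(product(range(1, 7), repeat=3))
    N_A1 = sum(1 for t in triples if sum(t) == 11)  # outcomes favorable to A1
    N_A2 = sum(1 for t in triples if sum(t) == 12)  # outcomes favorable to A2
    print(N_A1, N_A2)              # 27 25
    print(N_A1 / 216, N_A2 / 216)  # P(A1) = 0.125 > P(A2) = 0.1157...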


2. Rudiments of Combinatorial Analysis

Combinatorial formulas are of great use in calculating probabilities. We now derive the most important of these formulas.

Theorem 1.1. Given n1 elements a1, a2, ..., an1 and n2 elements b1, b2, ..., bn2, there are precisely n1n2 distinct ordered pairs (ai, bj) containing one element of each kind.

Proof. Represent the elements of the first kind by points of the x-axis, and those of the second kind by points of the y-axis. Then the possible pairs (ai,bj) are points of a rectangular lattice in the xy-plane, as shown in Figure 1. The fact that there are just n1n2 such pairs is obvious from the figure.

More generally, we have


Theorem 1.2. Given n1 elements a1, a2, ..., an1, n2 elements b1, b2, ..., bn2, ..., and nr elements x1, x2, ..., xnr, there are precisely n1n2 ··· nr distinct ordered r-tuples (ai1, bi2, ..., xir) containing one element of each kind.

Proof. For r = 2, the theorem reduces to Theorem 1.1. Suppose the theorem holds for r - 1, so that in particular there are precisely n2 ··· nr (r - 1)-tuples (bi2, ..., xir) containing one element of each kind. Then, regarding the (r - 1)-tuples as elements of a new kind, we note that each r-tuple (ai1, bi2, ..., xir) can be regarded as made up of an (r - 1)-tuple (bi2, ..., xir) and an element ai1. Hence, by Theorem 1.1, there are precisely

n1 (n2 ... nr) = n1n2 ... nr

r-tuples containing one element of each kind. The theorem now follows for all r by mathematical induction.
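
Theorem 1.2 is precisely the counting principle behind Cartesian products, so it can be checked mechanically; a quick Python sketch of ours, with arbitrary element names:

    from itertools import product

    # Three kinds of elements, with n1 = 2, n2 = 3, n3 = 4 elements respectively.
    kinds = [["a1", "a2"], ["b1", "b2", "b3"], ["x1", "x2", "x3", "x4"]]
    tuples = list(product(*kinds))
    print(len(tuples))  # 2 * 3 * 4 = 24 ordered triples, as Theorem 1.2 predicts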


Example 1. What is the probability of getting three sixes in a throw of three dice?

Solution. Let a be the number of spots on the first die, b the number of spots on the second die, and c the number of spots on the third die. Then the result of throwing the dice is described by an ordered triple (a, b, c), where each element takes values from 1 to 6. Hence, by Theorem 1.2 with r = 3 and n1 = n2 = n3 = 6, there are precisely N = 6^3 = 216 equiprobable outcomes of throwing three dice (this fact was anticipated in Example 5, p. 3). Three sixes can occur in only one way, i.e., when a = b = c = 6. Therefore the probability of getting three sixes is 1/216.

Example 2 (Sampling with replacement). Suppose we choose r objects in succession from a "population" (i.e., set) of n distinct objects a1, a2, ..., an, in such a way that after choosing each object and recording the choice, we return the object to the population before making the next choice. This gives an "ordered sample" of the form

(ai1, ai2, ..., air). (1.4)

Setting n1 = n2 = ... = nr = n in Theorem 1.2, we find that there are precisely

N = n^r (1.5)

distinct ordered samples of the form (1.4).

Example 3 (Sampling without replacement). Next suppose we choose r objects in succession from a population of n distinct objects a1, a2, ..., an, in such a way that an object once chosen is removed from the population. Then we again get an ordered sample of the form (1.4), but now there are n - 1 objects left after the first choice, n - 2 objects left after the second choice, and so on. Clearly this corresponds to setting n1 = n, n2 = n - 1, ..., nr = n - r + 1 in Theorem 1.2. Hence, instead of n^r distinct samples as in the case of sampling with replacement, there are now only

N = n(n - 1) ··· (n - r + 1) (1.6)

distinct samples. If r = n, then (1.6) reduces to

N = n(n - 1) ··· 2 · 1 = n!, (1.7)

the total number of permutations of n objects.
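
Formulas (1.5)-(1.7) translate directly into code. A brief Python sketch of ours (it assumes Python 3.8+, where math.perm computes the falling factorial in (1.6)):

    import math

    n, r = 6, 3

    # Formula (1.5): sampling with replacement gives N = n**r ordered samples.
    print(n ** r)  # 216

    # Formula (1.6): sampling without replacement gives N = n(n-1)...(n-r+1).
    print(math.perm(n, r))  # 6 * 5 * 4 = 120

    # Formula (1.7): with r = n this reduces to n! permutations.
    print(math.perm(5, 5), math.factorial(5))  # 120 120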

Example 4. Suppose we place r distinguishable objects into n different "cells" (r < n), with no cell allowed to contain more than one object. Numbering both the objects and the cells, let i1 be the number of the cell into which the first object is placed, i2 the number of the cell into which the second object is placed, and so on. Then the arrangement of the objects in the cells is described by an ordered r-tuple (i1, i2, ..., ir). Clearly, there are n1 = n empty cells originally, n2 = n - 1 empty cells after one cell has been occupied, n3 = n - 2 empty cells after two cells have been occupied, and so on. Hence, the total number of distinct arrangements of the objects in the cells is again given by formula (1.6).

Example 5. A subway train made up of n cars is boarded by r passengers (r < n), each entering a car completely at random. What is the probability of the passengers all ending up in different cars?

Solution. By hypothesis, every car has the same probability of being entered by a given passenger. Numbering both the passengers and the cars, let i1 be the number of the car entered by the first passenger, i2 the number of the car entered by the second passenger, and so on. Then the arrangement of the passengers in the cars is described by an ordered r-tuple (i1, i2, ..., ir), where each of the numbers i1, i2, ..., ir can range from 1 to n. This is equivalent to sampling with replacement, and hence, by Example 2, there are

N = n^r

distinct equiprobable arrangements of the passengers in the cars. Let A be the event that "no more than one passenger enters any car." Then A occurs if and only if all the numbers i1, i2, ..., ir are distinct. In other words, if A is to occur, the first passenger can enter one of n cars, but the second passenger can only enter one of n – 1 cars, the third passenger one of n - 2 cars, and so on. This is equivalent to sampling without replacement, and hence, by Example 3, there are

N(A) = n(n - 1) ... (n - r + 1)

arrangements of passengers in the cars leading to the occurrence of A. Therefore, by (1.1), the probability of A occurring, i.e., of the passengers all ending up in different cars, is just

P(A) = n(n - 1) ··· (n - r + 1)/n^r.
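
This answer can be checked both exactly and by simulation. The sketch below, with illustrative values n = 10 and r = 4 that are not from the text, computes P(A) from the formula and then estimates it by Monte Carlo:

    import math
    import random

    n, r = 10, 4  # cars and passengers (r < n)

    # Exact value: P(A) = n(n-1)...(n-r+1) / n**r.
    exact = math.perm(n, r) / n ** r
    print(exact)  # 0.504

    # Monte Carlo check: each passenger enters a car chosen uniformly at random.
    random.seed(0)
    trials = 100_000
    hits = sum(
        len({random.randrange(n) for _ in range(r)}) == r  # all cars distinct?
        for _ in range(trials)
    )
    print(hits / trials)  # close to the exact value 0.504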

Any set of r elements chosen from a population of n elements, without regard for order, is called a subpopulation of size r of the original population. The number of such subpopulations is given by


Theorem 1.3. A population of n elements has precisely

C_n^r = n!/r!(n - r)! (1.8)

subpopulations of size r ≤ n.

Proof. If order mattered, then the elements of each subpopulation could be arranged in r! distinct ways (recall Example 3). Hence there are r! times more "ordered samples" of r elements than subpopulations of size r. But there are precisely n(n - 1) ... (n - r + 1) such ordered samples (by Example 3 again), and hence just

n(n - 1) ··· (n - r + 1)/r! = n!/r!(n - r)!

subpopulations of size r.

Remark. An expression of the form (1.8) is called a binomial coefficient, often denoted by \binom{n}{r} instead of C_n^r. The number C_n^r is sometimes called the number of combinations of n things taken r at a time (without regard for order).
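
In Python, the binomial coefficient (1.8) is available directly as math.comb; the short sketch below (ours, assuming Python 3.8+) also retraces the proof's counting argument by dividing the number of ordered samples by r!:

    import math

    n, r = 10, 4
    ordered = math.perm(n, r)            # n(n-1)...(n-r+1) ordered samples
    print(ordered // math.factorial(r))  # divide out the r! orderings: 210
    print(math.comb(n, r))               # binomial coefficient (1.8): also 210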

The natural generalization of Theorem 1.3 is given by

Theorem 1.4. Given a population of n elements, let n1, n2, ..., nk be positive integers such that

n1 + n2 + ... + nk = n.

Then there are precisely

N = n!/n1!n2! ··· nk! (1.9)

ways of partitioning the population into k subpopulations, of sizes n1, n2, ..., nk respectively.

Proof. The order of the subpopulations matters in the sense that n1 = 2, n2 = 4, n3, ..., nk and n1 = 4, n2 = 2, n3, ..., nk (say) represent different partitions, but the order of elements within the subpopulations themselves is irrelevant. The partitioning can be effected in stages, as follows: First we form a group of n1 elements from the original population. This can be done in

C_n^{n1} = n!/n1!(n - n1)!

ways. Then we form a group of n2 elements from the remaining n - n1 elements. This can be done in

C_{n-n1}^{n2} = (n - n1)!/n2!(n - n1 - n2)!

ways. Proceeding in this fashion, we are left with n - n1 - ··· - nk-2 = nk-1 + nk elements after k - 2 stages. These elements can be partitioned into two groups, one containing nk-1 elements and the other nk elements, in

C_{nk-1+nk}^{nk-1} = (nk-1 + nk)!/nk-1!nk!

ways. Hence, by Theorem 1.2, there are

C_n^{n1} C_{n-n1}^{n2} ··· C_{nk-1+nk}^{nk-1}

distinct ways of partitioning the given population into the indicated k subpopulations. But

n!/n1!(n - n1)! × (n - n1)!/n2!(n - n1 - n2)! × ··· × (nk-1 + nk)!/nk-1!nk! = n!/n1!n2! ··· nk!,

in keeping with (1.9).

Remark. Theorem 1.4 reduces to Theorem 1.3 if

k = 2, n1 = r, n2 = n - r.

The numbers (1.9) are called multinomial coefficients, and generalize the binomial coefficients (1.8).
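
Formula (1.9) and the stagewise construction in the proof of Theorem 1.4 can be checked against each other; a minimal Python sketch of ours with illustrative subpopulation sizes:

    import math

    def multinomial(sizes):
        """Formula (1.9): n!/(n1! n2! ... nk!) with n = n1 + ... + nk."""
        result = math.factorial(sum(sizes))
        for size in sizes:
            result //= math.factorial(size)
        return result

    def stagewise(sizes):
        """The telescoping product of binomial coefficients from the proof."""
        n, result = sum(sizes), 1
        for size in sizes:
            result *= math.comb(n, size)
            n -= size
        return result

    sizes = [2, 4, 3]  # n1, n2, n3 with n = 9
    print(multinomial(sizes), stagewise(sizes))  # both print 1260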


(Continues...)

Excerpted from Probability Theory by Y. A. Rozanov, Richard A. Silverman. Copyright © 1969 Richard A. Silverman. Excerpted by permission of Dover Publications, Inc..
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
