
#### Probability Theory

#### A Concise Course

**By Y. A. Rozanov, Richard A. Silverman**

**Dover Publications, Inc.**

**Copyright © 1969 Richard A. Silverman**

All rights reserved.

ISBN: 978-0-486-32114-1


CHAPTER 1

**BASIC CONCEPTS**

**1. Probability and Relative Frequency**

Consider the simple experiment of tossing an unbiased coin. This experiment has two mutually exclusive outcomes, namely "heads" and "tails." The various factors influencing the outcome of the experiment are too numerous to take into account, at least if the coin tossing is "fair." Therefore the outcome of the experiment is said to be "random." Everyone would certainly agree that the "probability of getting heads" and the "probability of getting tails" both equal 1/2. Intuitively, this answer is based on the idea that the two outcomes are "equally likely" or "equiprobable," because of the very nature of the experiment. But hardly anyone will bother at this point to clarify just what he means by "probability."

Continuing in this vein and taking these ideas at face value, consider an experiment with a finite number of mutually exclusive outcomes which are *equiprobable,* i.e., "equally likely because of the nature of the experiment." Let *A* denote some event associated with the possible outcomes of the experiment. Then the *probability* P(*A*) of the event *A* is defined as the fraction of the outcomes in which *A* occurs. More exactly,

P(A) = N(A)/N, (1.1)

where *N* is the total number of outcomes of the experiment and *N(A)* is the number of outcomes leading to the occurrence of the event *A.*

**Example 1.** In tossing a well-balanced coin, there are *N* = 2 mutually exclusive equiprobable outcomes ("heads" and "tails"). Let *A* be either of these two outcomes. Then *N(A)* = 1, and hence

P(A) = 1/2.

**Example 2.** In throwing a single unbiased die, there are *N* = 6 mutually exclusive equiprobable outcomes, namely getting a number of spots equal to each of the numbers 1 through 6. Let *A* be the event consisting of getting an even number of spots. Then there are *N(A)* = 3 outcomes leading to the occurrence of *A* (which ones?), and hence

P(A) = 3/6 = 1/2.

**Example 3.** In throwing a pair of dice, there are *N* = 36 mutually exclusive equiprobable outcomes, each represented by an ordered pair (*a*, *b*), where *a* is the number of spots showing on the first die and *b* the number showing on the second die. Let *A* be the event that both dice show the same number of spots. Then *A* occurs whenever *a* = *b*, i.e., *N(A)* = 6. Therefore

P(A) = 6/36 = 1/6.
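The counting in Example 3 is easy to verify by brute-force enumeration. The following Python sketch (an editorial illustration, not part of the original text) lists all *N* = 36 equiprobable pairs and counts those favorable to *A*:

```python
from fractions import Fraction
from itertools import product

# All N = 36 equiprobable outcomes (a, b) for a throw of two dice.
outcomes = list(product(range(1, 7), repeat=2))
N = len(outcomes)

# Event A: both dice show the same number of spots (a == b).
N_A = sum(1 for a, b in outcomes if a == b)

# Formula (1.1): P(A) = N(A)/N.
P_A = Fraction(N_A, N)
print(N, N_A, P_A)  # 36 6 1/6
```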

**Remark.** Despite its seeming simplicity, formula (1.1) can lead to nontrivial calculations. In fact, before using (1.1) in a given problem, we must find all the equiprobable outcomes, and then identify all those leading to the occurrence of the event *A* in question.

The accumulated experience of innumerable observations reveals a remarkable regularity of behavior, allowing us to assign a precise meaning to the concept of probability not only in the case of experiments with equiprobable outcomes, but also in the most general case. Suppose the experiment under consideration can be repeated any number of times, so that, in principle at least, we can produce a whole series of "independent trials under identical conditions," in each of which, depending on chance, a particular event *A* of interest either occurs or does not occur. Let *n* be the total number of experiments in the whole series of trials, and let *n(A)* be the number of experiments in which *A* occurs. Then the ratio

n(A)/n

is called the *relative frequency* of the event *A* (in the given series of trials). It turns out that the relative frequencies *n(A)/n* observed in different series of trials are virtually the same for large *n,* clustering about some constant

P(A) ~ n(A)/n, (1.2)

called the *probability* of the event *A.* More exactly, (1.2) means that

P(A) = lim_(n→∞) n(A)/n. (1.3)

Roughly speaking, the probability P(*A*) of the event *A* equals the fraction of experiments leading to the occurrence of *A* in a large series of trials.

**Example 4.** Table 1 shows the results of a series of 10,000 coin tosses, grouped into 100 different series of *n* = 100 tosses each. In every case, the table shows the number of tosses *n(A)* leading to the occurrence of a head. It is clear that the relative frequency of occurrence of "heads" in each set of 100 tosses differs only slightly from the probability P(*A*) = 1/2 found in Example 1. Note that the relative frequency of occurrence of "heads" is even closer to 1/2 if we group the tosses in series of 1000 tosses each.
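The clustering of relative frequencies described by (1.2) is easy to reproduce by simulation. This short Python sketch (an editorial illustration, with an arbitrary random seed for reproducibility) tosses a fair coin in series of increasing length and prints *n(A)/n*, which settles near 1/2 as *n* grows:

```python
import random

random.seed(1)  # arbitrary seed, for a reproducible illustrative run

def relative_frequency(n):
    """Toss a fair coin n times and return n(A)/n for A = 'heads'."""
    heads = sum(random.random() < 0.5 for _ in range(n))
    return heads / n

# Relative frequencies cluster around P(A) = 1/2 as n grows.
for n in (100, 1000, 10000):
    print(n, relative_frequency(n))
```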

**Example 5 (De Méré's paradox).** As a result of extensive observation of dice games, the French gambler de Méré noticed that the total number of spots showing on three dice thrown simultaneously turns out to be 11 (the event *A*1) more often than it turns out to be 12 (the event *A*2), although from his point of view both events should occur equally often. De Méré reasoned as follows: *A*1 occurs in just six ways (6:4:1, 6:3:2, 5:5:1, 5:4:2, 5:3:3, 4:4:3), and *A*2 also occurs in just six ways (6:5:1, 6:4:2, 6:3:3, 5:5:2, 5:4:3, 4:4:4). Therefore *A*1 and *A*2 have the same probability P(*A*1) = P(*A*2).

The fallacy in this argument was found by Pascal, who showed that the outcomes listed by de Méré are not actually equiprobable. In fact, one must take account not only of the numbers of spots showing on the dice, but also of the particular dice on which the spots appear. For example, numbering the dice and writing the number of spots in the corresponding order, we find that there are six distinct outcomes leading to the combination 6:4:1, namely (6, 4, 1), (6, 1, 4), (4, 6, 1), (4, 1, 6), (1, 6, 4) and (1, 4, 6), whereas there is only one outcome leading to the combination 4:4:4, namely (4, 4, 4). The appropriate equiprobable outcomes are those described by triples of numbers (*a*, *b*, *c*), where *a* is the number of spots on the first die, *b* the number of spots on the second die, and *c* the number of spots on the third die. It is easy to see that there are then precisely *N* = 6³ = 216 equiprobable outcomes. Of these, *N*(*A*1) = 27 are favorable to the event *A*1 (in which the sum of all the spots equals 11), but only *N*(*A*2) = 25 are favorable to the event *A*2 (in which the sum of all the spots equals 12). This fact explains the tendency observed by de Méré for 11 spots to appear more often than 12.
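Pascal's counts of 27 and 25 favorable outcomes can be confirmed by enumerating all 216 ordered triples. A short Python sketch (an editorial check, not from the book):

```python
from itertools import product

# All N = 6**3 = 216 equiprobable ordered triples (a, b, c).
triples = list(product(range(1, 7), repeat=3))

N_A1 = sum(1 for t in triples if sum(t) == 11)  # favorable to A1 (sum 11)
N_A2 = sum(1 for t in triples if sum(t) == 12)  # favorable to A2 (sum 12)

print(len(triples), N_A1, N_A2)  # 216 27 25
```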

**2. Rudiments of Combinatorial Analysis**

Combinatorial formulas are of great use in calculating probabilities. We now derive the most important of these formulas.

**Theorem 1.1.** *Given n*1 *elements a*1, *a*2, ..., *a*n1 *and n*2 *elements b*1, *b*2, ..., *b*n2, *there are precisely n*1*n*2 *distinct ordered pairs* (*a*i, *b*j) *containing one element of each kind.*

*Proof.* Represent the elements of the first kind by points of the *x*-axis, and those of the second kind by points of the *y*-axis. Then the possible pairs (*ai,bj*) are points of a rectangular lattice in the *xy*-plane, as shown in Figure 1. The fact that there are just *n*1*n*2 such pairs is obvious from the figure.

More generally, we have

**Theorem 1.2.** *Given n*1 *elements a*1, *a*2, ..., *a*n1, *n*2 *elements b*1, *b*2, ..., *b*n2, ..., *and n*r *elements x*1, *x*2, ..., *x*nr, *there are precisely n*1*n*2 ··· *n*r *distinct ordered r-tuples* (*a*i1, *b*i2, ..., *x*ir) *containing one element of each kind.*

*Proof.* For *r* = 2, the theorem reduces to Theorem 1.1. Suppose the theorem holds for *r* - 1, so that in particular there are precisely *n*2 ··· *n*r (*r* - 1)-tuples (*b*i2, ..., *x*ir) containing one element of each kind. Then, regarding the (*r* - 1)-tuples as elements of a new kind, we note that each *r*-tuple (*a*i1, *b*i2, ..., *x*ir) can be regarded as made up of an (*r* - 1)-tuple (*b*i2, ..., *x*ir) and an element *a*i1. Hence, by Theorem 1.1, there are precisely

n1 (n2 ... nr) = n1n2 ... nr

*r*-tuples containing one element of each kind. The theorem now follows for all *r* by mathematical induction.
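The product rule of Theorem 1.2 corresponds directly to the Cartesian product of sets. A small Python sketch (editorial, with arbitrary example sizes *n*1 = 2, *n*2 = 3, *n*3 = 4) confirms the count:

```python
from itertools import product

# Three kinds of elements, with n1 = 2, n2 = 3, n3 = 4 elements each.
a = ['a1', 'a2']
b = ['b1', 'b2', 'b3']
x = ['x1', 'x2', 'x3', 'x4']

# Theorem 1.2: there are precisely n1 * n2 * n3 distinct ordered triples
# containing one element of each kind.
triples = list(product(a, b, x))
assert len(triples) == len(a) * len(b) * len(x)  # 2 * 3 * 4 = 24
```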

**Example 1.** What is the probability of getting three sixes in a throw of three dice?

*Solution.* Let *a* be the number of spots on the first die, *b* the number of spots on the second die, and *c* the number of spots on the third die. Then the result of throwing the dice is described by an ordered triple (*a, b, c*), where each element takes values from 1 to 6. Hence, by Theorem 1.2 with *r* = 3 and *n*1 = *n*2 = *n*3 = 6, there are precisely *N* = 6³ = 216 equiprobable outcomes of throwing three dice (this fact was anticipated in Example 5, p. 3). Three sixes can occur in only one way, i.e., when *a* = *b* = *c* = 6. Therefore the probability of getting three sixes is 1/216.

**Example 2 (Sampling with replacement).** Suppose we choose *r* objects in succession from a "population" (i.e., set) of *n* distinct objects *a*1, *a*2, ..., *a*n, in such a way that after choosing each object and recording the choice, we return the object to the population before making the next choice. This gives an "ordered sample" of the form

(*a*i1, *a*i2, ..., *a*ir). (1.4)

Setting *n*1 = *n*2 = ... = *nr* = *n* in Theorem 1.2, we find that there are precisely

N = n^r (1.5)

distinct ordered samples of the form (1.4).

**Example 3 (Sampling without replacement).** Next suppose we choose *r* objects in succession from a population of *n* distinct objects *a*1, *a*2, ..., *a*n, in such a way that an object once chosen is removed from the population. Then we again get an ordered sample of the form (1.4), but now there are *n* - 1 objects left after the first choice, *n* - 2 objects left after the second choice, and so on. Clearly this corresponds to setting *n*1 = *n*, *n*2 = *n* - 1, ..., *n*r = *n* - *r* + 1 in Theorem 1.2. Hence, instead of *n*^*r* distinct samples as in the case of sampling with replacement, there are now only

N = n(n - 1) ··· (n - r + 1) (1.6)

distinct samples. If *r* = *n,* then (1.6) reduces to

N = n(n - 1) ··· 2 · 1 = n!, (1.7)

the total number of *permutations* of *n* objects.
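Formulas (1.5)–(1.7) map directly onto the Python standard library. A brief editorial sketch, using arbitrary example values *n* = 5, *r* = 3:

```python
from itertools import permutations
from math import factorial, perm

n, r = 5, 3

# Sampling with replacement: N = n^r ordered samples (1.5).
with_repl = n ** r

# Sampling without replacement: N = n(n-1)...(n-r+1) (1.6),
# which is math.perm(n, r); enumeration agrees with the formula.
without_repl = perm(n, r)
assert without_repl == len(list(permutations(range(n), r)))

# For r = n, (1.6) reduces to the number of permutations n! (1.7).
assert perm(n, n) == factorial(n)
```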

**Example 4.** Suppose we place *r* distinguishable objects into *n* different "cells" (*r* < *n*), with no cell allowed to contain more than one object. Numbering both the objects and the cells, let *i*1 be the number of the cell into which the first object is placed, *i*2 the number of the cell into which the second object is placed, and so on. Then the arrangement of the objects in the cells is described by an ordered *r*-tuple (*i*1, *i*2, ..., *i*r). Clearly, there are *n*1 = *n* empty cells originally, *n*2 = *n* - 1 empty cells after one cell has been occupied, *n*3 = *n* - 2 empty cells after two cells have been occupied, and so on. Hence, the total number of distinct arrangements of the objects in the cells is again given by formula (1.6).

**Example 5.** A subway train made up of *n* cars is boarded by *r* passengers (*r* < *n*), each entering a car completely at random. What is the probability of the passengers all ending up in different cars?

*Solution.* By hypothesis, every car has the same probability of being entered by a given passenger. Numbering both the passengers and the cars, let *i*1 be the number of the car entered by the first passenger, *i*2 the number of the car entered by the second passenger, and so on. Then the arrangement of the passengers in the cars is described by an ordered *r*-tuple (*i*1, *i*2, ..., *ir*), where each of the numbers *i*1, *i*2, ..., *ir* can range from 1 to *n.* This is equivalent to sampling with replacement, and hence, by Example 2, there are

*N* = *n*^*r*

distinct equiprobable arrangements of the passengers in the cars. Let *A* be the event that "no more than one passenger enters any car." Then *A* occurs if and only if all the numbers *i*1, *i*2, ..., *ir* are distinct. In other words, if *A* is to occur, the first passenger can enter one of *n* cars, but the second passenger can only enter one of *n* – 1 cars, the third passenger one of *n* - 2 cars, and so on. This is equivalent to sampling without replacement, and hence, by Example 3, there are

N(A) = n(n - 1) ... (n - r + 1)

arrangements of passengers in the cars leading to the occurrence of *A.* Therefore, by (1.1), the probability of *A* occurring, i.e., of the passengers all ending up in different cars, is just

P(A) = n(n - 1) ··· (n - r + 1)/n^r.
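The subway-car probability is the same computation as the classical birthday problem. A short editorial sketch (with arbitrary example values of 10 cars and 3 passengers):

```python
from fractions import Fraction
from math import perm

def all_in_different_cars(n, r):
    """P(A) = n(n-1)...(n-r+1) / n^r for r passengers and n cars."""
    return Fraction(perm(n, r), n ** r)

# e.g. 3 passengers boarding a 10-car train at random:
print(all_in_different_cars(10, 3))  # 18/25
```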

Any set of *r* elements chosen from a population of *n* elements, *without regard for order,* is called a *subpopulation of size r* of the original population. The number of such subpopulations is given by

**Theorem 1.3.***A population of n elements has precisely*

n!/r!(n - r)! (1.8)

*subpopulations of size r* < *n*.

*Proof.* If order mattered, then the elements of each subpopulation could be arranged in *r*! distinct ways (recall Example 3). Hence there are *r*! times more "ordered samples" of *r* elements than subpopulations of size *r.* But there are precisely *n*(*n* - 1) ... (*n* - *r* + 1) such ordered samples (by Example 3 again), and hence just

n(n - 1) ··· (n - r + 1)/r! = n!/r!(n - r)!

subpopulations of size *r*.
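Both sides of the identity in the proof of Theorem 1.3 are available in Python's `math` module. An editorial check with arbitrary example values *n* = 10, *r* = 4:

```python
from math import comb, factorial, perm

n, r = 10, 4

# Number of subpopulations of size r, by formula (1.8): n!/(r!(n-r)!).
by_formula = factorial(n) // (factorial(r) * factorial(n - r))

# By the proof: r! times fewer subpopulations than ordered samples (1.6).
by_proof = perm(n, r) // factorial(r)

assert by_formula == by_proof == comb(n, r)
```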

**Remark.** An expression of the form (1.8) is called a *binomial coefficient,* often denoted by *C*(*n*, *r*). The number *C*(*n*, *r*) is sometimes called the *number of combinations of n things taken r at a time* (without regard for order).

The natural generalization of Theorem 1.3 is given by

**Theorem 1.4.** *Given a population of n elements, let n*1, *n*2, ..., *n*k *be positive integers such that*

*n1 + n2 + ... + nk = n.*

*Then there are precisely*

N = n!/n1!n2! ··· nk! (1.9)

*ways of partitioning the population into k subpopulations, of sizes n*1, *n*2, ..., *nk respectively.*

*Proof.* The order of the subpopulations matters in the sense that *n*1 = 2, *n*2 = 4, *n*3, ..., *n*k and *n*1 = 4, *n*2 = 2, *n*3, ..., *n*k (say) represent different partitions, but the order of elements within the subpopulations themselves is irrelevant. The partitioning can be effected in stages, as follows: First we form a group of *n*1 elements from the original population. This can be done in

C(n, n1) = n!/n1!(n - n1)!

ways. Then we form a group of *n*2 elements from the remaining *n* - *n*1 elements. This can be done in

C(n - n1, n2) = (n - n1)!/n2!(n - n1 - n2)!

ways. Proceeding in this fashion, we are left with *n* - *n*1 - ··· - *nk*-2 = *nk*-1 + *nk* elements after *k* - 2 stages. These elements can be partitioned into two groups, one containing *nk*-1 elements and the other *nk* elements, in

C(nk-1 + nk, nk-1) = (nk-1 + nk)!/nk-1!nk!

ways. Hence, by Theorem 1.2, there are

C(n, n1) C(n - n1, n2) ··· C(nk-1 + nk, nk-1)

distinct ways of partitioning the given population into the indicated *k* subpopulations. But

C(n, n1) C(n - n1, n2) ··· C(nk-1 + nk, nk-1) = n!/n1!(n - n1)! · (n - n1)!/n2!(n - n1 - n2)! ··· (nk-1 + nk)!/nk-1!nk! = n!/n1!n2! ··· nk!,

in keeping with (1.9).

*Remark.* Theorem 1.4 reduces to Theorem 1.3 if

*k = 2, n1 = r, n2 = n - r.*

The numbers (1.9) are called *multinomial coefficients,* and generalize the binomial coefficients (1.8).
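The multinomial coefficient (1.9) and the stage-by-stage telescoping product in the proof of Theorem 1.4 can both be computed directly; a short editorial sketch (with arbitrary example sizes) confirms they agree:

```python
from math import comb, factorial

def multinomial(*sizes):
    """n!/(n1! n2! ... nk!) for a population of n = n1+...+nk elements (1.9)."""
    n = sum(sizes)
    result = factorial(n)
    for k in sizes:
        result //= factorial(k)
    return result

def by_stages(*sizes):
    """Telescoping product of binomial coefficients from the proof:
    C(n, n1) * C(n - n1, n2) * ... taken stage by stage."""
    n, result = sum(sizes), 1
    for k in sizes:
        result *= comb(n, k)
        n -= k
    return result

assert multinomial(2, 3, 4) == by_stages(2, 3, 4)
# k = 2 recovers the binomial coefficient, as in the Remark (Theorem 1.3):
assert multinomial(4, 6) == comb(10, 4)
```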

*(Continues...)*

Excerpted from *Probability Theory* by Y. A. Rozanov, Richard A. Silverman. Copyright © 1969 Richard A. Silverman. Excerpted by permission of Dover Publications, Inc.

All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
