The Mathematical Theory of Communication

Overview

Scientific knowledge grows at a phenomenal pace--but few books have had as lasting an impact or played as important a role in our modern world as The Mathematical Theory of Communication, published originally as a paper on communication theory more than fifty years ago. Republished in book form shortly thereafter, it has since gone through four hardcover and sixteen paperback printings.  It is a revolutionary work, astounding in its foresight and contemporaneity.  The University of Illinois Press is pleased and honored to issue this commemorative reprinting of a classic.
 
 

Product Details

ISBN-13: 9780252098031
Publisher: University of Illinois Press
Publication date: 09/01/1998
Sold by: Barnes & Noble
Format: eBook
Pages: 144
File size: 3 MB

About the Author

Claude E. Shannon was a research mathematician at the Bell Telephone Laboratories and Donner Professor of Science at the Massachusetts Institute of Technology. Warren Weaver had a distinguished academic, government, and foundation career. Both authors received numerous awards and honors.
 

Read an Excerpt

The Mathematical Theory of Communication


By Claude E. Shannon, Warren Weaver

UNIVERSITY OF ILLINOIS PRESS

Copyright © 1998 Board of Trustees of the University of Illinois
All rights reserved.
ISBN: 978-0-252-09803-1



CHAPTER 1

Discrete Noiseless Systems


1. The Discrete Noiseless Channel

Teletype and telegraphy are two simple examples of a discrete channel for transmitting information. Generally, a discrete channel will mean a system whereby a sequence of choices from a finite set of elementary symbols S1 ... Sn can be transmitted from one point to another. Each of the symbols Si is assumed to have a certain duration in time ti seconds (not necessarily the same for different Si, for example the dots and dashes in telegraphy). It is not required that all possible sequences of the Si be capable of transmission on the system; certain sequences only may be allowed. These will be possible signals for the channel. Thus in telegraphy suppose the symbols are: (1) A dot, consisting of line closure for a unit of time and then line open for a unit of time; (2) A dash, consisting of three time units of closure and one unit open; (3) A letter space consisting of, say, three units of line open; (4) A word space of six units of line open. We might place the restriction on allowable sequences that no spaces follow each other (for if two letter spaces are adjacent, they are identical with a word space). The question we now consider is how one can measure the capacity of such a channel to transmit information.

In the teletype case where all symbols are of the same duration, and any sequence of the 32 symbols is allowed, the answer is easy. Each symbol represents five bits of information. If the system transmits n symbols per second it is natural to say that the channel has a capacity of 5n bits per second. This does not mean that the teletype channel will always be transmitting information at this rate—this is the maximum possible rate and whether or not the actual rate reaches this maximum depends on the source of information which feeds the channel, as will appear later.

In the more general case with different lengths of symbols and constraints on the allowed sequences, we make the following definition: The capacity C of a discrete channel is given by

$$C = \lim_{T \to \infty} \frac{\log N(T)}{T}$$

where N(T) is the number of allowed signals of duration T.

It is easily seen that in the teletype case this reduces to the previous result. It can be shown that the limit in question will exist as a finite number in most cases of interest. Suppose all sequences of the symbols S1, ..., Sn are allowed and these symbols have durations t1, ..., tn. What is the channel capacity? If N(t) represents the number of sequences of duration t we have

$$N(t) = N(t - t_1) + N(t - t_2) + \cdots + N(t - t_n).$$

The total number is equal to the sum of the numbers of sequences ending in S1, S2, ..., Sn and these are N(t − t1), N(t − t2), ..., N(t − tn), respectively. According to a well-known result in finite differences, N(t) is then asymptotic for large t to $A X_0^t$ where A is constant and X0 is the largest real solution of the characteristic equation:

$$X^{-t_1} + X^{-t_2} + \cdots + X^{-t_n} = 1$$

and therefore

$$C = \log X_0.$$
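As a quick check that this reduces to the teletype result (taking logarithms to base 2, since capacity is measured in bits): with n = 32 symbols, each of duration 1,

$$N(t) = 32\,N(t-1), \qquad 32\,X^{-1} = 1 \;\Rightarrow\; X_0 = 32, \qquad C = \log_2 32 = 5$$

bits per unit time, in agreement with the 5n bits per second found earlier for a system transmitting n symbols per second.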

In case there are restrictions on allowed sequences we may still often obtain a difference equation of this type and find C from the characteristic equation. In the telegraphy case mentioned above

$$N(t) = N(t-2) + N(t-4) + N(t-5) + N(t-7) + N(t-8) + N(t-10)$$

as we see by counting sequences of symbols according to the last or next to the last symbol occurring. Hence C is $-\log \mu_0$ where $\mu_0$ is the positive root of $1 = \mu^2 + \mu^4 + \mu^5 + \mu^7 + \mu^8 + \mu^{10}$. Solving this we find C = 0.539.
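This root is easy to check numerically. Below is a minimal sketch in plain Python (standard library only; the function and variable names are mine, not the text's) that finds $\mu_0$ by bisection and evaluates the capacity:

```python
import math

def char_eq(mu):
    """1 = mu^2 + mu^4 + mu^5 + mu^7 + mu^8 + mu^10, written as f(mu) = 0."""
    return sum(mu ** k for k in (2, 4, 5, 7, 8, 10)) - 1.0

# char_eq(0) = -1 and char_eq(1) = +5, so the positive root lies in (0, 1).
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2.0
    if char_eq(mid) < 0.0:
        lo = mid          # root is above mid
    else:
        hi = mid          # root is at or below mid

mu0 = (lo + hi) / 2.0
print(f"mu0 ~ {mu0:.4f}, C = -log2(mu0) ~ {-math.log2(mu0):.3f}")
# prints approximately: mu0 ~ 0.6883, C ~ 0.539
```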

A very general type of restriction which may be placed on allowed sequences is the following: We imagine a number of possible states a1, a2, ..., am. For each state only certain symbols from the set S1, ..., Sn can be transmitted (different subsets for the different states). When one of these has been transmitted the state changes to a new state depending both on the old state and the particular symbol transmitted. The telegraph case is a simple example of this. There are two states depending on whether or not a space was the last symbol transmitted. If so, then only a dot or a dash can be sent next and the state always changes. If not, any symbol can be transmitted and the state changes if a space is sent, otherwise it remains the same. The conditions can be indicated in a linear graph as shown in Fig. 2. The junction points correspond to the states and the lines indicate the symbols possible in a state and the resulting state. In Appendix 1 it is shown that if the conditions on allowed sequences can be described in this form C will exist and can be calculated in accordance with the following result:

Theorem 1: Let $b^{(s)}_{ij}$ be the duration of the sth symbol which is allowable in state i and leads to state j. Then the channel capacity C is equal to log W where W is the largest real root of the determinantal equation:

$$\left| \sum_{s} W^{-b^{(s)}_{ij}} - \delta_{ij} \right| = 0$$

where δij = 1 if i = j and is zero otherwise.

For example, in the telegraph case (Fig. 2) the determinant is:

$$\begin{vmatrix} -1 & W^{-2} + W^{-4} \\ W^{-3} + W^{-6} & W^{-2} + W^{-4} - 1 \end{vmatrix} = 0.$$

On expansion this leads to the equation given above for this set of constraints.
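Theorem 1 can likewise be checked by direct computation. Here is a small sketch (Python with NumPy; the matrix layout and names are my own) that builds the matrix $\sum_s W^{-b^{(s)}_{ij}}$ for the telegraph graph, bisects for the largest real root W of the determinantal equation, and recovers the same 0.539:

```python
import numpy as np

def det_eq(W):
    """det(M(W) - I) for the telegraph graph of Fig. 2.
    State 1: a space was just sent (only dot or dash may follow).
    State 2: anything else. Durations in time units: dot 2, dash 4,
    letter space 3, word space 6."""
    M = np.array([
        [0.0,            W**-2 + W**-4],   # state 1 -> 2 via dot or dash
        [W**-3 + W**-6,  W**-2 + W**-4],   # 2 -> 1 via spaces, 2 -> 2 via dot/dash
    ])
    return np.linalg.det(M - np.eye(2))

# Every entry of M falls monotonically as W grows, so the determinant
# crosses zero exactly once for W > 1: det_eq(1) = -5, det_eq -> 1 at infinity.
lo, hi = 1.0, 10.0
for _ in range(60):
    mid = (lo + hi) / 2.0
    if det_eq(mid) < 0.0:
        lo = mid
    else:
        hi = mid

W0 = (lo + hi) / 2.0
print(f"W0 ~ {W0:.4f}, C = log2(W0) ~ {np.log2(W0):.3f}")   # ~ 1.4529, 0.539
```

Note that W is just $1/\mu$ of the earlier calculation, so the two methods agree, as the expansion of the determinant shows.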


2. The Discrete Source of Information

We have seen that under very general conditions the logarithm of the number of possible signals in a discrete channel increases linearly with time. The capacity to transmit information can be specified by giving this rate of increase, the number of bits per second required to specify the particular signal used.

We now consider the information source. How is an information source to be described mathematically, and how much information in bits per second is produced in a given source? The main point at issue is the effect of statistical knowledge about the source in reducing the required capacity of the channel, by the use of proper encoding of the information. In telegraphy, for example, the messages to be transmitted consist of sequences of letters. These sequences, however, are not completely random. In general, they form sentences and have the statistical structure of, say, English. The letter E occurs more frequently than Q, the sequence TH more frequently than XP, etc. The existence of this structure allows one to make a saving in time (or channel capacity) by properly encoding the message sequences into signal sequences. This is already done to a limited extent in telegraphy by using the shortest channel symbol, a dot, for the most common English letter E; while the infrequent letters, Q, X, Z are represented by longer sequences of dots and dashes. This idea is carried still further in certain commercial codes where common words and phrases are represented by four- or five-letter code groups with a considerable saving in average time. The standardized greeting and anniversary telegrams now in use extend this to the point of encoding a sentence or two into a relatively short sequence of numbers.

We can think of a discrete source as generating the message, symbol by symbol. It will choose successive symbols according to certain probabilities depending, in general, on preceding choices as well as the particular symbols in question. A physical system, or a mathematical model of a system which produces such a sequence of symbols governed by a set of probabilities, is known as a stochastic process. We may consider a discrete source, therefore, to be represented by a stochastic process. Conversely, any stochastic process which produces a discrete sequence of symbols chosen from a finite set may be considered a discrete source. This will include such cases as:

1. Natural written languages such as English, German, Chinese.

2. Continuous information sources that have been rendered discrete by some quantizing process. For example, the quantized speech from a PCM transmitter, or a quantized television signal.

3. Mathematical cases where we merely define abstractly a stochastic process which generates a sequence of symbols. The following are examples of this last type of source.

(A) Suppose we have five letters A, B, C, D, E which are chosen each with probability .2, successive choices being independent. This would lead to a sequence of which the following is a typical example.

B D C B C E C C C A D C B D D A A E C E E A A B B D A E E C A C E E B A E E C B C E A D.

This was constructed with the use of a table of random numbers.

(B) Using the same five letters let the probabilities be .4, .1, .2, .2, .1, respectively, with successive choices independent. A typical message from this source is then:

A A A C D C B D C E A A D A D A C E D A E A D C A B E D A D D C E C A A A A A D.
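Both samples were drawn with a table of random numbers; the modern equivalent takes a few lines. A minimal sketch in Python (standard library only) that draws a comparable sequence for source (B):

```python
import random

letters = ["A", "B", "C", "D", "E"]
probs   = [0.4, 0.1, 0.2, 0.2, 0.1]   # source (B); use [0.2] * 5 for source (A)

# Successive choices are independent, with the probabilities above.
sample = random.choices(letters, weights=probs, k=40)
print(" ".join(sample))
```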

(C) A more complicated structure is obtained if successive symbols are not chosen independently but their probabilities depend on preceding letters. In the simplest case of this type a choice depends only on the preceding letter and not on ones before that. The statistical structure can then be described by a set of transition probabilities p_i(j), the probability that letter i is followed by letter j. The indices i and j range over all the possible symbols. A second equivalent way of specifying the structure is to give the "digram" probabilities p(i, j), i.e., the relative frequency of the digram i j. The letter frequencies p(i) (the probability of letter i), the transition probabilities p_i(j) and the digram probabilities p(i, j) are related by the following formulas:

$$p(i, j) = p(i)\,p_i(j), \qquad p(i) = \sum_j p(i, j) = \sum_j p(j, i) = \sum_j p(j)\,p_j(i), \qquad \sum_j p_i(j) = \sum_i p(i) = \sum_{i, j} p(i, j) = 1.$$

As a specific example suppose there are three letters A, B, C with the probability tables:

Transition probabilities p_i(j):

            j
    i         A      B      C
    A         0     4/5    1/5
    B        1/2    1/2     0
    C        1/2    2/5    1/10

Digram probabilities p(i, j):

            j
    i         A      B       C
    A         0     4/15    1/15
    B        8/27   8/27     0
    C        1/27   4/135   1/135

The corresponding letter probabilities are p(A) = 9/27, p(B) = 16/27, p(C) = 2/27.

A typical message from this source is the following:

A B B A B A B A B A B A B A B B B A B B B B B A B A B A B A B A B B B A C A C A B B A B B B B A B B A B A C B B B A B A.
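The relations above can be verified numerically for this source. A short sketch in plain Python (the probabilities are those in the tables above) checks that the letter probabilities are stationary under the transition probabilities and that the digram probabilities sum to 1:

```python
# Transition probabilities p_i(j): row letter i, column letter j.
P = {
    "A": {"A": 0,     "B": 4 / 5, "C": 1 / 5},
    "B": {"A": 1 / 2, "B": 1 / 2, "C": 0},
    "C": {"A": 1 / 2, "B": 2 / 5, "C": 1 / 10},
}
p = {"A": 9 / 27, "B": 16 / 27, "C": 2 / 27}   # letter probabilities p(i)

# p(i) = sum_j p(j) p_j(i): the letter probabilities reproduce themselves.
for i in "ABC":
    assert abs(p[i] - sum(p[j] * P[j][i] for j in "ABC")) < 1e-12

# The digram probabilities p(i, j) = p(i) p_i(j) sum to 1.
assert abs(sum(p[i] * P[i][j] for i in "ABC" for j in "ABC") - 1.0) < 1e-12
print("relations verified")
```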

The next increase in complexity would involve trigram frequencies but no more. The choice of a letter would depend on the preceding two letters but not on the message before that point. A set of trigram frequencies p(i, j, k) or equivalently a set of transition probabilities p_{ij}(k) would be required. Continuing in this way one obtains successively more complicated stochastic processes. In the general n-gram case a set of n-gram probabilities p(i_1, i_2, ..., i_n) or of transition probabilities p_{i_1, i_2, ..., i_{n-1}}(i_n) is required to specify the statistical structure.

(D) Stochastic processes can also be defined which produce a text consisting of a sequence of "words." Suppose there are five letters A, B, C, D, E and 16 "words" in the language with associated probabilities:

.10 A .16 BEBE .11 CABED .04 DEB
.04 ADEB .04 BED .05 CEED .15 DEED
.05 ADEE .02 BEED .08 DAB .01 EAB
.01 BADD .05 CA .04 DAD .05 EE


Suppose successive "words" are chosen independently and are separated by a space. A typical message might be:

DAB EE A BEBE DEED DEB ADEE ADEE EE DEB BEBE BEBE BEBE ADEE BED DEED DEED CEED ADEE A DEED DEED BEBE CABED BEBE BED DAB DEED ADEB.

If all the words are of finite length this process is equivalent to one of the preceding type, but the description may be simpler in terms of the word structure and probabilities. We may also generalize here and introduce transition probabilities between words, etc.
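A sketch of this word source in Python (standard library only; the word list and probabilities are exactly those given above):

```python
import random

words = ["A",    "BEBE", "CABED", "DEB",  "ADEB", "BED", "CEED", "DEED",
         "ADEE", "BEED", "DAB",   "EAB",  "BADD", "CA",  "DAD",  "EE"]
probs = [0.10,   0.16,   0.11,    0.04,   0.04,   0.04,  0.05,   0.15,
         0.05,   0.02,   0.08,    0.01,   0.01,   0.05,  0.04,   0.05]

# Successive "words" are chosen independently and separated by spaces.
print(" ".join(random.choices(words, weights=probs, k=28)))
```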


These artificial languages are useful in constructing simple problems and examples to illustrate various possibilities. We can also approximate to a natural language by means of a series of simple artificial languages. The zero-order approximation is obtained by choosing all letters with the same probability and independently. The first-order approximation is obtained by choosing successive letters independently but each letter having the same probability that it has in the natural language. Thus, in the first-order approximation to English, E is chosen with probability .12 (its frequency in normal English) and W with probability .02, but there is no influence between adjacent letters and no tendency to form the preferred digrams such as TH, ED, etc. In the second-order approximation, digram structure is introduced. After a letter is chosen, the next one is chosen in accordance with the frequencies with which the various letters follow the first one. This requires a table of digram frequencies p_i(j). In the third-order approximation, trigram structure is introduced. Each letter is chosen with probabilities which depend on the preceding two letters.


3. The Series of Approximations to English

To give a visual idea of how this series of processes approaches a language, typical sequences in the approximations to English have been constructed and are given below. In all cases we have assumed a 27-symbol "alphabet," the 26 letters and a space.

1. Zero-order approximation (symbols independent and equiprobable).

XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD QPAAMKBZAACIBZLHJQD.

2. First-order approximation (symbols independent but with frequencies of English text).

OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA OOBTTVA NAH BRL.

3. Second-order approximation (digram structure as in English).

ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN D ILONASIVE TUCOOWE AT TEASONARE FUSO TIZIN ANDY TOBE SEACE CTISBE.

4. Third-order approximation (trigram structure as in English).

IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE.

5. First-order word approximation. Rather than continue with tetragram, ..., n-gram structure it is easier and better to jump at this point to word units. Here words are chosen independently but with their appropriate frequencies.

REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE.

6. Second-order word approximation. The word transition probabilities are correct but no further structure is included.

THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED.


The resemblance to ordinary English text increases quite noticeably at each of the above steps. Note that these samples have reasonably good structure out to about twice the range that is taken into account in their construction. Thus in (3) the statistical process insures reasonable text for two-letter sequences, but four-letter sequences from the sample can usually be fitted into good sentences. In (6) sequences of four or more words can easily be placed in sentences without unusual or strained constructions. The particular sequence of ten words "attack on an English writer that the character of this" is not at all unreasonable. It appears then that a sufficiently complex stochastic process will give a satisfactory representation of a discrete source.

The first two samples were constructed by the use of a book of random numbers in conjunction with (for example 2) a table of letter frequencies. This method might have been continued for (3), (4) and (5), since digram, trigram and word frequency tables are available, but a simpler equivalent method was used. To construct (3) for example, one opens a book at random and selects a letter at random on the page. This letter is recorded. The book is then opened to another page and one reads until this letter is encountered. The succeeding letter is then recorded. Turning to another page this second letter is searched for and the succeeding letter recorded, etc. A similar process was used for (4), (5) and (6). It would be interesting if further approximations could be constructed, but the labor involved becomes enormous at the next stage.
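The book-opening procedure amounts to sampling each letter according to how often it follows the previous one. A rough modern rendering in Python (a sketch only; "corpus.txt" is a hypothetical placeholder for any large English text file, not something supplied by the excerpt):

```python
import random
from collections import defaultdict

# "corpus.txt" is a hypothetical placeholder: any large English text file.
with open("corpus.txt") as fh:
    text = "".join(c for c in fh.read().upper() if c.isalpha() or c == " ")

# For every letter, record each letter that follows it; duplicates make
# the later random.choice frequency-weighted, like reading on in the book.
followers = defaultdict(list)
for a, b in zip(text, text[1:]):
    followers[a].append(b)

out = [random.choice(text[:-1])]           # seed letter, guaranteed a successor
for _ in range(100):
    nxt = followers.get(out[-1]) or [" "]  # fall back if a letter has none
    out.append(random.choice(nxt))
print("".join(out))
```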


(Continues...)

Excerpted from The Mathematical Theory of Communication by Claude E. Shannon, Warren Weaver. Copyright © 1998 Board of Trustees of the University of Illinois. Excerpted by permission of UNIVERSITY OF ILLINOIS PRESS.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.

Table of Contents

Foreword (Richard E. Blahut and Bruce Hajek)
Preface
Some Recent Contributions to the Mathematical Theory of Communication (Warren Weaver)
The Mathematical Theory of Communication (Claude E. Shannon)
Introduction
1. Discrete Noiseless Systems
2. The Discrete Channel with Noise
3. Continuous Information
4. The Continuous Channel
5. The Rate for a Continuous Source
Appendixes
