Elements of Information Theory / Edition 2



The latest edition of this classic is updated with new problem sets and material

The Second Edition of this fundamental textbook maintains the book's tradition of clear, thought-provoking instruction. Readers are provided once again with an instructive mix of mathematics, physics, statistics, and information theory.

All the essential topics in information theory are covered in detail, including entropy, data compression, channel capacity, rate distortion, network information theory, and hypothesis testing. The authors provide readers with a solid understanding of the underlying theory and applications. Problem sets and a telegraphic summary at the end of each chapter further assist readers. The historical notes that follow each chapter recap the main points.

The Second Edition features:
* Chapters reorganized to improve teaching
* 200 new problems
* New material on source coding, portfolio theory, and feedback capacity
* Updated references

Now current and enhanced, the Second Edition of Elements of Information Theory remains the ideal textbook for upper-level undergraduate and graduate courses in electrical engineering, statistics, and telecommunications.

An Instructor's Manual presenting detailed solutions to all the problems in the book is available from the Wiley editorial department.


Editorial Reviews

From the Publisher
"As expected, the quality of exposition continues to be a high point of the book. Clear explanations, nice graphical illustrations, and illuminating mathematical derivations make the book particularly useful as a textbook on information theory." (Journal of the American Statistical Association, March 2008)

"This book is recommended reading, both as a textbook and as a reference." (Computing Reviews.com, December 28, 2006)

The authors introduce the basic quantities of entropy, relative entropy, and mutual information and show how they arise as natural answers to questions of data compression, channel capacity, rate distortion, hypothesis testing, information flow in networks, and gambling. The book is accessible to students of communication theory, computer science, and statistics. (Book News, Inc., Portland, OR)

Meet the Author

THOMAS M. COVER, PHD, is Professor in the departments of electrical engineering and statistics, Stanford University. A recipient of the 1991 IEEE Claude E. Shannon Award, Dr. Cover is a past president of the IEEE Information Theory Society, a Fellow of the IEEE and the Institute of Mathematical Statistics, and a member of the National Academy of Engineering and the American Academy of Arts and Sciences. He has authored more than 100 technical papers and is coeditor of Open Problems in Communication and Computation.

JOY A. THOMAS, PHD, is the Chief Scientist at Stratify, Inc., a Silicon Valley start-up specializing in organizing unstructured information. After receiving his PhD at Stanford, Dr. Thomas spent more than nine years at the IBM T. J. Watson Research Center in Yorktown Heights, New York. Dr. Thomas is a recipient of the IEEE Charles LeGeyt Fortescue Fellowship.


Read an Excerpt

Chapter 1: Introduction and Preview

This "first and last lecture" chapter goes backwards and forwards through information theory and its naturally related ideas. The full definitions and study of the subject begin in Chapter 2.

Information theory answers two fundamental questions in communication theory: what is the ultimate data compression (answer: the entropy H), and what is the ultimate transmission rate of communication (answer: the channel capacity C). For this reason some consider information theory to be a subset of communication theory. We will argue that it is much more. Indeed, it has fundamental contributions to make in statistical physics (thermodynamics), computer science (Kolmogorov complexity or algorithmic complexity), statistical inference (Occam's Razor: "The simplest explanation is best") and to probability and statistics (error rates for optimal hypothesis testing and estimation).
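To make the quantity H concrete, here is a small numeric sketch (ours, not part of the excerpt; the function name `entropy` is our own choice), computing entropy directly from its definition H = -Σ p log₂ p:

```python
import math

def entropy(probs):
    """Shannon entropy H(X) in bits: H = -sum(p * log2(p)), with 0*log(0) = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # fair coin: 1.0 bit
print(entropy([0.9, 0.1]))  # biased coin: about 0.469 bits
```

A maximally uncertain source (the fair coin) attains the maximum entropy of 1 bit per symbol; any bias reduces the ultimate compression limit below 1 bit.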

Figure 1.1 illustrates the relationship of information theory to other fields. As the figure suggests, information theory intersects physics (statistical mechanics), mathematics (probability theory), electrical engineering (communication theory) and computer science (algorithmic complexity). We now describe the areas of intersection in greater detail:

Electrical Engineering (Communication Theory). In the early 1940s, it was thought that increasing the transmission rate of information over a communication channel increased the probability of error. Shannon surprised the communication theory community by proving that this was not true as long as the communication rate was below channel capacity. The capacity can be computed simply from the noise characteristics of the channel. Shannon further argued that random processes such as music and speech have an irreducible complexity below which the signal cannot be compressed. This he named the entropy, in deference to the parallel use of this word in thermodynamics, and argued that if the entropy of the source is less than the capacity of the channel, then asymptotically error-free communication can be achieved.
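The claim that capacity "can be computed simply from the noise characteristics of the channel" is easy to see for the simplest noisy channel. As an illustrative sketch (ours, not from the excerpt), the binary symmetric channel with crossover probability p has capacity C = 1 - H(p), where H is the binary entropy function:

```python
import math

def h2(p):
    """Binary entropy function H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Capacity C = 1 - H(p) of a binary symmetric channel with crossover probability p."""
    return 1.0 - h2(p)

print(bsc_capacity(0.0))   # noiseless channel: 1 bit per use
print(bsc_capacity(0.11))  # roughly 0.5 bits per use
print(bsc_capacity(0.5))   # pure noise: capacity 0
```

Below C, Shannon's theorem guarantees codes exist with vanishing error probability; above C, no such codes exist.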

Information theory today represents the extreme points of the set of all possible communication schemes, as shown in the fanciful Figure 1.2. The data compression minimum I(X; X̂) lies at one extreme of the set of communication ideas. All data compression schemes require description rates at least equal to this minimum. At the other extreme is the data transmission maximum I(X; Y), known as the channel capacity. Thus all modulation schemes and data compression schemes lie between these limits.

Information theory also suggests means of achieving these ultimate limits of communication. However, these theoretically optimal communication schemes, beautiful as they are, may turn out to be computationally impractical. It is only because of the computational feasibility of simple modulation and demodulation schemes that we use them rather than the random coding and nearest neighbor decoding rule suggested by Shannon's proof of the channel capacity theorem. Progress in integrated circuits and code design has enabled us to reap some of the gains suggested by Shannon's theory. A good example of an application of the ideas of information theory is the use of error correcting codes on compact discs.

Modern work on the communication aspects of information theory has concentrated on network information theory: the theory of the simultaneous rates of communication from many senders to many receivers in a communication network. Some of the trade-offs of rates between senders and receivers are unexpected, and all have a certain mathematical simplicity. A unifying theory, however, remains to be found.

Computer Science (Kolmogorov Complexity). Kolmogorov, Chaitin and Solomonoff put forth the idea that the complexity of a string of data can be defined by the length of the shortest binary program for computing the string. Thus the complexity is the minimal description length. This definition of complexity turns out to be universal, that is, computer independent, and is of fundamental importance. Thus Kolmogorov complexity lays the foundation for the theory of descriptive complexity. Gratifyingly, the Kolmogorov complexity K is approximately equal to the Shannon entropy H if the sequence is drawn at random from a distribution that has entropy H. So the tie-in between information theory and Kolmogorov complexity is perfect. Indeed, we consider Kolmogorov complexity to be more fundamental than Shannon entropy. It is the ultimate data compression and leads to a logically consistent procedure for inference.
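The tie-in between Kolmogorov complexity and entropy can be glimpsed with an ordinary compressor. This is a rough, hypothetical illustration of our own (zlib gives only a crude, computable upper bound on description length, not the true Kolmogorov complexity K, which is uncomputable): a highly structured string compresses to a tiny fraction of its length, while a pseudorandom string of the same length barely compresses at all.

```python
import random
import zlib

# A highly structured (low-complexity) string and a pseudorandom
# string of the same length (1000 bytes each).
structured = b"ab" * 500
random.seed(0)
noisy = bytes(random.getrandbits(8) for _ in range(1000))

# Compressed size is a computable upper bound on description length.
print(len(zlib.compress(structured)))  # a few dozen bytes at most
print(len(zlib.compress(noisy)))       # close to (or above) 1000 bytes
```

The short program "print 'ab' 500 times" captures why the first string has low descriptive complexity; the random string, by contrast, has essentially no description shorter than itself.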

There is a pleasing complementary relationship between algorithmic complexity and computational complexity. One can think about computational complexity (time complexity) and Kolmogorov complexity (program length or descriptive complexity) as two axes corresponding to program running time and program length. Kolmogorov complexity focuses on minimizing along the second axis, and computational complexity focuses on minimizing along the first axis. Little work has been done on the simultaneous minimization of the two...


Table of Contents

Preface to the Second Edition.

Preface to the First Edition.

Acknowledgments for the Second Edition.

Acknowledgments for the First Edition.

1. Introduction and Preview.

1.1 Preview of the Book.

2. Entropy, Relative Entropy, and Mutual Information.

2.1 Entropy.

2.2 Joint Entropy and Conditional Entropy.

2.3 Relative Entropy and Mutual Information.

2.4 Relationship Between Entropy and Mutual Information.

2.5 Chain Rules for Entropy, Relative Entropy, and Mutual Information.

2.6 Jensen’s Inequality and Its Consequences.

2.7 Log Sum Inequality and Its Applications.

2.8 Data-Processing Inequality.

2.9 Sufficient Statistics.

2.10 Fano’s Inequality.



Historical Notes.

3. Asymptotic Equipartition Property.

3.1 Asymptotic Equipartition Property Theorem.

3.2 Consequences of the AEP: Data Compression.

3.3 High-Probability Sets and the Typical Set.



Historical Notes.

4. Entropy Rates of a Stochastic Process.

4.1 Markov Chains.

4.2 Entropy Rate.

4.3 Example: Entropy Rate of a Random Walk on a Weighted Graph.

4.4 Second Law of Thermodynamics.

4.5 Functions of Markov Chains.



Historical Notes.

5. Data Compression.

5.1 Examples of Codes.

5.2 Kraft Inequality.

5.3 Optimal Codes.

5.4 Bounds on the Optimal Code Length.

5.5 Kraft Inequality for Uniquely Decodable Codes.

5.6 Huffman Codes.

5.7 Some Comments on Huffman Codes.

5.8 Optimality of Huffman Codes.

5.9 Shannon–Fano–Elias Coding.

5.10 Competitive Optimality of the Shannon Code.

5.11 Generation of Discrete Distributions from Fair Coins.



Historical Notes.

6. Gambling and Data Compression.

6.1 The Horse Race.

6.2 Gambling and Side Information.

6.3 Dependent Horse Races and Entropy Rate.

6.4 The Entropy of English.

6.5 Data Compression and Gambling.

6.6 Gambling Estimate of the Entropy of English.



Historical Notes.

7. Channel Capacity.

7.1 Examples of Channel Capacity.

7.2 Symmetric Channels.

7.3 Properties of Channel Capacity.

7.4 Preview of the Channel Coding Theorem.

7.5 Definitions.

7.6 Jointly Typical Sequences.

7.7 Channel Coding Theorem.

7.8 Zero-Error Codes.

7.9 Fano’s Inequality and the Converse to the Coding Theorem.

7.10 Equality in the Converse to the Channel Coding Theorem.

7.11 Hamming Codes.

7.12 Feedback Capacity.

7.13 Source–Channel Separation Theorem.



Historical Notes.

8. Differential Entropy.

8.1 Definitions.

8.2 AEP for Continuous Random Variables.

8.3 Relation of Differential Entropy to Discrete Entropy.

8.4 Joint and Conditional Differential Entropy.

8.5 Relative Entropy and Mutual Information.

8.6 Properties of Differential Entropy, Relative Entropy, and Mutual Information.



Historical Notes.

9. Gaussian Channel.

9.1 Gaussian Channel: Definitions.

9.2 Converse to the Coding Theorem for Gaussian Channels.

9.3 Bandlimited Channels.

9.4 Parallel Gaussian Channels.

9.5 Channels with Colored Gaussian Noise.

9.6 Gaussian Channels with Feedback.



Historical Notes.

10. Rate Distortion Theory.

10.1 Quantization.

10.2 Definitions.

10.3 Calculation of the Rate Distortion Function.

10.4 Converse to the Rate Distortion Theorem.

10.5 Achievability of the Rate Distortion Function.

10.6 Strongly Typical Sequences and Rate Distortion.

10.7 Characterization of the Rate Distortion Function.

10.8 Computation of Channel Capacity and the Rate Distortion Function.



Historical Notes.

11. Information Theory and Statistics.

11.1 Method of Types.

11.2 Law of Large Numbers.

11.3 Universal Source Coding.

11.4 Large Deviation Theory.

11.5 Examples of Sanov’s Theorem.

11.6 Conditional Limit Theorem.

11.7 Hypothesis Testing.

11.8 Chernoff–Stein Lemma.

11.9 Chernoff Information.

11.10 Fisher Information and the Cramér–Rao Inequality.



Historical Notes.

12. Maximum Entropy.

12.1 Maximum Entropy Distributions.

12.2 Examples.

12.3 Anomalous Maximum Entropy Problem.

12.4 Spectrum Estimation.

12.5 Entropy Rates of a Gaussian Process.

12.6 Burg’s Maximum Entropy Theorem.



Historical Notes.

13. Universal Source Coding.

13.1 Universal Codes and Channel Capacity.

13.2 Universal Coding for Binary Sequences.

13.3 Arithmetic Coding.

13.4 Lempel–Ziv Coding.

13.5 Optimality of Lempel–Ziv Algorithms.




Historical Notes.

14. Kolmogorov Complexity.

14.1 Models of Computation.

14.2 Kolmogorov Complexity: Definitions and Examples.

14.3 Kolmogorov Complexity and Entropy.

14.4 Kolmogorov Complexity of Integers.

14.5 Algorithmically Random and Incompressible Sequences.

14.6 Universal Probability.

14.7 Kolmogorov Complexity.

14.9 Universal Gambling.

14.10 Occam’s Razor.

14.11 Kolmogorov Complexity and Universal Probability.

14.12 Kolmogorov Sufficient Statistic.

14.13 Minimum Description Length Principle.



Historical Notes.

15. Network Information Theory.

15.1 Gaussian Multiple-User Channels.

15.2 Jointly Typical Sequences.

15.3 Multiple-Access Channel.

15.4 Encoding of Correlated Sources.

15.5 Duality Between Slepian–Wolf Encoding and Multiple-Access Channels.

15.6 Broadcast Channel.

15.7 Relay Channel.

15.8 Source Coding with Side Information.

15.9 Rate Distortion with Side Information.

15.10 General Multiterminal Networks.



Historical Notes.

16. Information Theory and Portfolio Theory.

16.1 The Stock Market: Some Definitions.

16.2 Kuhn–Tucker Characterization of the Log-Optimal Portfolio.

16.3 Asymptotic Optimality of the Log-Optimal Portfolio.

16.4 Side Information and the Growth Rate.

16.5 Investment in Stationary Markets.

16.6 Competitive Optimality of the Log-Optimal Portfolio.

16.7 Universal Portfolios.

16.8 Shannon–McMillan–Breiman Theorem (General AEP).



Historical Notes.

17. Inequalities in Information Theory.

17.1 Basic Inequalities of Information Theory.

17.2 Differential Entropy.

17.3 Bounds on Entropy and Relative Entropy.

17.4 Inequalities for Types.

17.5 Combinatorial Bounds on Entropy.

17.6 Entropy Rates of Subsets.

17.7 Entropy and Fisher Information.

17.8 Entropy Power Inequality and Brunn–Minkowski Inequality.

17.9 Inequalities for Determinants.

17.10 Inequalities for Ratios of Determinants.



Historical Notes.


List of Symbols.



Customer Reviews

  • Anonymous

    Posted February 4, 2007


    This book is horrible. Theories are given, but no examples are provided. Please do not make students' life worst than it is now.
