 
Chapter Nine: Bit by Bit by Bit
When Tony Orlando requested in a 1973 song that his beloved "Tie a Yellow Ribbon Round the Ole Oak Tree," he wasn't asking for elaborate explanations or extended discussion. He didn't want any ifs, ands, or buts. Despite the complex feelings and emotional histories that would have been at play in the real-life situation the song was based on, all the man really wanted was a simple yes or no. He wanted a yellow ribbon tied around the tree to mean "Yes, even though you messed up big time and you've been in prison for three years, I still want you back with me under my roof." And he wanted the absence of a yellow ribbon to mean "Don't even think about stopping here." 
These are two clear-cut, mutually exclusive alternatives. Tony Orlando did not sing, "Tie half of a yellow ribbon if you want to think about it for a while" or "Tie a blue ribbon if you don't love me anymore but you'd still like to be friends." Instead, he made it very, very simple. 
Equally effective as the absence or presence of a yellow ribbon (but perhaps more awkward to put into verse) would be a choice of traffic signs in the front yard: Perhaps "Merge" or "Wrong Way." 
Or a sign hung on the door: "Closed" or "Open." 
Or a flashlight in the window, turned on or off. 
You can choose from lots of ways to say yes or no if that's all you need to say. You don't need a sentence to say yes or no; you don't need a word, and you don't even need a letter. All you need is a bit, and by that I mean all you need is a 0 or a 1. 
As we discovered in the previous chapter, there's nothing really all that special about the decimal number system that we normally use for counting. It's pretty clear that we base our number system on ten because that's the number of fingers we have. We could just as reasonably base our number system on eight (if we were cartoon characters) or four (if we were lobsters) or even two (if we were dolphins). 
But there is something special about the binary number system. What's special about binary is that it's the simplest number system possible. There are only two binary digits—0 and 1. If we want something simpler than binary, we'll have to get rid of the 1, and then we'll be left with just a 0. We can't do much of anything with just a 0. 
The word bit, coined to mean binary digit, is surely one of the loveliest words invented in connection with computers. Of course, the word has the normal meaning "a small portion, degree, or amount," and that normal meaning is perfect because a bit—one binary digit—is a very small quantity indeed. 
Sometimes when a new word is invented, it also assumes a new meaning. That's certainly true in this case. A bit has a meaning beyond the binary digits used by dolphins for counting. In the computer age, the bit has come to be regarded as the basic building block of information. 
Now that's a bold statement, and of course, bits aren't the only things that convey information. Letters and words and Morse code and Braille and decimal digits convey information as well. The thing about the bit is that it conveys very little information. A bit of information is the tiniest amount of information possible. Anything less than a bit is no information at all. But because a bit represents the smallest amount of information possible, more complex information can be conveyed with multiple bits. 
By saying that a bit conveys a "small" amount of information, I surely don't mean that the information borders on the unimportant. Indeed, the yellow ribbon is a very important bit to the two people concerned with it. Another way to view information is as a choice or selection among two or more possibilities. When you speak a sentence, for example, your words are chosen from a whole dictionary of possible words. A single bit indicates a choice between just two possibilities (such as "stay away" or "come home"), and two is surely the smallest useful number of possibilities. Multiple bits indicate a choice between more than two possibilities. "Listen, my children, and you shall hear / Of the midnight ride of Paul Revere," wrote Henry Wadsworth Longfellow, and while he might not have been historically accurate when describing how Paul Revere alerted the American colonies that the British had invaded, he did provide a thought-provoking example of the use of bits to communicate important information: 
 
He said to his friend "If the British march
By land or sea from the town to-night,
Hang a lantern aloft in the belfry arch
Of the North Church tower as a signal light,— 
One, if by land, and two, if by sea."

To summarize, Paul Revere's friend has two lanterns. If the British are invading by land, he will put just one lantern in the church tower. If the British are coming by sea, he will put both lanterns in the church tower. 
However, Longfellow isn't explicitly mentioning all the possibilities. He left unspoken a third possibility, which is that the British aren't invading just yet. Longfellow implies that this possibility will be conveyed by the absence of lanterns in the church tower. 
Let's assume that the two lanterns are actually permanent fixtures in the church tower. Normally they aren't lit, which means that the British aren't yet invading. If one of the lanterns is lit, the British are coming by land. If both lanterns are lit, the British are coming by sea. 
Each lantern is a bit. A lit lantern is a 1 bit and an unlit lantern is a 0 bit. Tony Orlando demonstrated to us that only one bit is necessary to convey one of two possibilities. If Paul Revere needed only to be alerted that the British were invading (and not where they were coming from), one lantern would have been sufficient. The lantern would have been lit for an invasion and unlit for another evening of peace. 
Conveying one of three possibilities requires another lantern. Once that second lantern is present, however, the two bits allow us to communicate one of four possibilities: 
 
00 = The British aren't invading tonight.
  
01 = They're coming by land.
  
10 = They're coming by land.
  
11 = They're coming by sea.
What Paul Revere did by sticking to just three possibilities was actually quite sophisticated. In the lingo of communications theory, he used redundancy to counteract the effect of noise. The word noise is used in communications theory to refer to anything that interferes with communication. Static on a telephone line is an obvious example of noise that interferes with a telephone communication. Nevertheless, communication over the telephone is usually successful even in the presence of noise, because spoken language is heavily redundant. We don't need to hear every syllable of every word in order to understand what's being said. 
In the case of the lanterns in the church tower, noise can refer to the darkness of the night and the distance of Paul Revere from the tower, both of which might prevent him from distinguishing one lantern from the other. Here's the crucial passage in Longfellow's poem: 
 
And lo! As he looks, on the belfry's height
A glimmer, and then a gleam of light!
He springs to the saddle, the bridle he turns,
But lingers and gazes, till full on his sight
A second lamp in the belfry burns!

It certainly doesn't sound as if Paul Revere was in a position to figure out exactly which one of the two lanterns was first lit. 
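The redundancy argument can be made concrete with a short Python sketch. It assumes, as described earlier, that each lantern is one bit (1 for lit, 0 for unlit); the names `MESSAGES`, `decode`, and `decode_by_count` are hypothetical, invented only for this illustration. The second decoder ignores which lantern is lit and only counts the lit ones, which is roughly the position Paul Revere was in at a distance, at night.

```python
# Hypothetical illustration: each lantern is one bit (1 = lit, 0 = unlit).
# All four 2-bit patterns are listed; 01 and 10 deliberately mean the same thing.
MESSAGES = {
    (0, 0): "The British aren't invading tonight.",
    (0, 1): "They're coming by land.",
    (1, 0): "They're coming by land.",
    (1, 1): "They're coming by sea.",
}


def decode(lanterns):
    """Look up the message for an exact pair of lantern bits."""
    return MESSAGES[tuple(lanterns)]


def decode_by_count(lanterns):
    """Decode using only how many lanterns are lit, not which ones."""
    by_count = [
        "The British aren't invading tonight.",
        "They're coming by land.",
        "They're coming by sea.",
    ]
    return by_count[sum(lanterns)]


# Because 01 and 10 share a meaning, the two decoders agree on every pattern.
for pattern in MESSAGES:
    assert decode(pattern) == decode_by_count(pattern)

print(decode((1, 0)))  # They're coming by land.
```

Had 01 and 10 been given different meanings, `decode_by_count` could no longer recover them, and a watcher who couldn't tell the lanterns apart would be left guessing.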
The essential concept here is that information represents a choice among two or more possibilities. For example, when we talk to another person, every word we speak is a choice among all the words in the dictionary. If we numbered all the words in the dictionary from 1 through 351,482, we could just as accurately carry on conversations using the numbers rather than words. (Of course, both participants would need dictionaries where the words are numbered identically, as well as plenty of patience.) 
The flip side of this is that any information that can be reduced to a choice among two or more possibilities can be expressed using bits. Needless to say, there are plenty of forms of human communication that do not represent choices among discrete possibilities and that are also vital to our existence. This is why people don't form romantic relationships with computers. (Let's hope they don't, anyway.) If you can't express something in words, pictures, or sounds, you're not going to be able to encode the information in bits. Nor would you want to. 
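The word-numbering idea can be sketched in a few lines of Python. This is only a toy under stated assumptions: the four-word `DICTIONARY` and the `encode`/`decode` helpers are hypothetical, and indices start at 0 rather than 1. Each word becomes a fixed-width group of bits, and, just as with the numbered dictionaries, both sides must share the same list in the same order.

```python
import math

# A hypothetical four-word "dictionary"; both parties must number it identically.
DICTIONARY = ["come", "home", "stay", "away"]

# Number of bits needed to select one word out of the whole dictionary.
WIDTH = math.ceil(math.log2(len(DICTIONARY)))


def encode(words):
    """Replace each word by its index (starting at 0) written as WIDTH bits."""
    return [format(DICTIONARY.index(w), f"0{WIDTH}b") for w in words]


def decode(codes):
    """Turn fixed-width bit strings back into words."""
    return [DICTIONARY[int(c, 2)] for c in codes]


message = ["come", "home"]
codes = encode(message)
print(codes)  # ['00', '01']
assert decode(codes) == message

# The same arithmetic for a dictionary of 351,482 words:
print(math.ceil(math.log2(351_482)))  # 19
```

Nineteen bits suffice for a 351,482-word dictionary because 2 to the 19th power is 524,288, while 18 bits give only 262,144 codes.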
A thumb up or a thumb down is one bit of information. And two thumbs up or down—such as the thumbs of film critics Roger Ebert and the late Gene Siskel when they rendered their final verdicts on the latest movies—convey two bits of information. (We'll ignore what they actually had to say about the movies; all we care about here are their thumbs.) Here we have four possibilities that can be represented with a pair of bits: 
 
00 = They both hated it.
  
01 = Siskel hated it; Ebert loved it.
  
10 = Siskel loved it; Ebert hated it.
  
11 = They both loved it.
 The first bit is the Siskel bit, which is 0 if Siskel hated the movie and 1 if he liked it. Similarly, the second bit is the Ebert bit. 
So if your friend asked you, "What was the verdict from Siskel and Ebert about that movie Impolite Encounter?" instead of answering, "Siskel gave it a thumbs up and Ebert gave it a thumbs down" or even "Siskel liked it; Ebert didn't," you could have simply said, "One zero." As long as your friend knew which was the Siskel bit and which was the Ebert bit, and that a 1 bit meant thumbs up and a 0 bit meant thumbs down, your answer would be perfectly understandable. But you and your friend have to know the code. 
We could have declared initially that a 1 bit meant a thumbs down and a 0 bit meant a thumbs up. That might seem counterintuitive. Naturally, we like to think of a 1 bit as representing something affirmative and a 0 bit as the opposite, but it's really just an arbitrary decision. The only requirement is that everyone who uses the code must know what the 0 and 1 bits mean. 
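Here is a small Python sketch of the Siskel and Ebert code, assuming the convention chosen first: the Siskel bit comes first, and 1 means thumbs up. The functions `verdict_bits` and `describe` are hypothetical. Flipping the convention would mean changing only these two functions; the bits themselves would be just as meaningful to anyone who knew the new rule.

```python
def verdict_bits(siskel_liked, ebert_liked):
    """Build the 2-bit code: Siskel bit first, Ebert bit second; 1 = thumbs up."""
    return f"{int(siskel_liked)}{int(ebert_liked)}"


def describe(bits):
    """Translate a 2-bit code back into words using the same convention."""
    names = ["Siskel", "Ebert"]
    return "; ".join(
        f"{name} {'loved' if b == '1' else 'hated'} it"
        for name, b in zip(names, bits)
    )


print(verdict_bits(True, False))  # 10
print(describe("10"))             # Siskel loved it; Ebert hated it
```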
The meaning of a particular bit or collection of bits is always understood contextually. The meaning of a yellow ribbon around a particular oak tree is probably known only to the person who put it there and the person who's supposed to see it. Change the color, the tree, or the date, and it's just a meaningless scrap of cloth. Similarly, to get some useful information out of Siskel and Ebert's hand gestures, at the very least we need to know what movie is under discussion. 
If you maintained a list of the movies that Siskel and Ebert reviewed and how they voted with their thumbs, you could add another bit to the mix to include your own opinion. Adding this third bit increases the number of different possibilities to eight: 
 
000 = Siskel hated it; Ebert hated it; I hated it.
  
001 = Siskel hated it; Ebert hated it; I loved it.
  
010 = Siskel hated it; Ebert loved it; I hated it.
  
011 = Siskel hated it; Ebert loved it; I loved it.
  
100 = Siskel loved it; Ebert hated it; I hated it.
  
101 = Siskel loved it; Ebert hated it; I loved it.
  
110 = Siskel loved it; Ebert loved it; I hated it.
  
111 = Siskel loved it; Ebert loved it; I loved it.
 One bonus of using bits to represent this information is that we know that we've accounted for all the possibilities. We know there can be eight and only eight possibilities and no more or fewer. With three bits, we can count only from zero to seven. There are no more 3-digit binary numbers. 
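Because every combination of three bits corresponds to exactly one possibility, the whole list can be generated mechanically. This minimal Python sketch uses the standard `itertools.product`, which produces the patterns in counting order from 000 to 111. The `NAMES` list is an assumption that mirrors the bit order used here: Siskel, then Ebert, then me.

```python
from itertools import product

NAMES = ["Siskel", "Ebert", "I"]  # bit order: Siskel, Ebert, then me

rows = []
# product("01", repeat=3) yields ('0','0','0'), ('0','0','1'), ..., ('1','1','1').
for bits in product("01", repeat=3):
    code = "".join(bits)
    opinions = [f"{n} {'loved' if b == '1' else 'hated'} it" for n, b in zip(NAMES, bits)]
    rows.append(f"{code} = {'; '.join(opinions)}.")

for row in rows:
    print(row)

print(len(rows))  # 8, which is 2 ** 3
```

Reading each code as a binary number gives the values 0 through 7, which is another way of saying that three bits can count only from zero to seven.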
Now, during this description of the Siskel and Ebert bits, you might have been considering a very serious and disturbing question, and that question is this: What do we do about Leonard Maltin's Movie & Video Guide? After all, Leonard Maltin doesn't do the thumbs up and thumbs down thing. Leonard Maltin rates the movies using the more traditional star system. 
To determine how many Maltin bits we need, we must first know a few things about his system. Maltin gives a movie anything from 1 star to 4 stars, with half stars in between. (Just to make this interesting, he doesn't actually award a single star; instead, the movie is rated as a BOMB.) There are 7 possibilities, which means that we can represent a particular rating using just 3 bits: 

000 = BOMB

001 = 1½ stars

010 = 2 stars

011 = 2½ stars

100 = 3 stars

101 = 3½ stars

110 = 4 stars
"What about 111?" you may ask. Well, that code doesn't mean anything. It's not defined. If the binary code 111 were used to represent a Maltin rating, you'd know that a mistake was made. (Probably a computer made the mistake because people never do.) 
You'll recall that when we had two bits to represent the Siskel and Ebert ratings, the leftmost bit was the Siskel bit and the rightmost bit was the Ebert bit. Do the individual bits mean anything here? Well, sort of. If you take the numeric value of the bit code, add 2, and then divide by 2, that will give you the number of stars. But that's only because we defined the codes in a reasonable and consistent manner. We could just as well have defined the codes this way: 
This code is just as legitimate as the preceding code so long as everybody knows what it means. 
If Maltin ever encountered a movie undeserving of even a single full star, he could award a half star. He would certainly have enough codes for the half-star option. The codes could be redefined like so: 
But if he then encountered a movie not even worthy of a half star and decided to award no stars (ATOMIC BOMB?), he'd need another bit. No more 3-bit codes are available. 
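The add-2-and-divide-by-2 rule for the first Maltin code can be checked with a short Python sketch. It assumes the code assignment that rule implies (000 is a BOMB, 001 is 1½ stars, and so on up to 110 for 4 stars) and treats 111 as an error, since that code is undefined. The function name `maltin_stars` is hypothetical.

```python
def maltin_stars(code):
    """Decode a 3-bit Maltin code string into a star rating.

    Assumes 000 = BOMB and (value + 2) / 2 stars for the other
    defined codes; 111 is undefined and treated as an error.
    """
    if len(code) != 3 or any(bit not in "01" for bit in code):
        raise ValueError(f"not a 3-bit code: {code!r}")
    value = int(code, 2)
    if value == 0b111:
        raise ValueError(f"undefined Maltin code: {code}")
    if value == 0:
        return "BOMB"
    return (value + 2) / 2


for code in ["000", "001", "010", "011", "100", "101", "110"]:
    print(code, maltin_stars(code))
```

Raising an error for 111, rather than returning some made-up rating, is how a program can "know that a mistake was made" when it meets an undefined code.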
The magazine Entertainment Weekly gives grades, not only for movies but for television shows, CDs, books, CD-ROMs, Web sites, and much else. The grades range from A+ straight down to F (although it seems that only Pauly Shore movies are worthy of that honor). If you count them, you see 13 possible grades. We would need 4 bits to represent these grades: 
That leaves three unused codes (1101, 1110, and 1111); together with the 13 grade codes, that makes a grand total of 16. 
Whenever we talk about bits, we usually talk about a specific number of them. The more bits we have, the greater the number of different possibilities we can convey. 
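How many bits a code needs follows from doubling: n bits give 2ⁿ codes, so we need the smallest n for which 2ⁿ reaches the number of possibilities. Here is a minimal Python sketch. `bits_needed` is a hypothetical helper, and the grade list is simply the 13 grades from A+ down to F, with no particular bit assignment implied.

```python
def bits_needed(possibilities):
    """Smallest number of bits whose 2 ** bits codes cover every possibility."""
    bits = 0
    while 2 ** bits < possibilities:  # each extra bit doubles the number of codes
        bits += 1
    return bits


GRADES = ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D", "D-", "F"]

n = len(GRADES)       # 13 grades
b = bits_needed(n)    # 4 bits
unused = 2 ** b - n   # 16 - 13 = 3 unused codes
print(n, b, unused)   # 13 4 3

# Maltin: 7 ratings fit in 3 bits, 8 still fit, but a 9th rating needs a 4th bit.
print(bits_needed(7), bits_needed(8), bits_needed(9))  # 3 3 4
```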
It's the same situation with decimal numbers, of course. For example, how many telephone area codes are there? The area code is three decimal digits long, and if all of them are used (which they aren't, but we'll ignore that), there are 10³, or 1000, codes, ranging from 000 through 999. How many 7-digit phone numbers are possible within the 212 area code? That's 10⁷, or 10,000,000. How many phone numbers can you have with a 212 area code and a 260 prefix? That's 10⁴, or 10,000...
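The same counting rule works in any base: d digits in base b give bᵈ distinct codes. Here is a short Python sketch of the phone-number arithmetic. `count_codes` is a hypothetical helper, and, as in the text, it ignores which area codes and prefixes are actually in use.

```python
def count_codes(base, digits):
    """How many distinct codes a fixed number of digits in a given base can express."""
    return base ** digits


print(count_codes(10, 3))  # 1000 area codes, 000 through 999
print(count_codes(10, 7))  # 10000000 seven-digit numbers in one area code
print(count_codes(10, 4))  # 10000 numbers for one area code and prefix
print(count_codes(2, 3))   # 8, the same rule applied to three bits
```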