The Mind within the Net: Models of Learning, Thinking, and Acting

A highly readable, non-mathematical introduction to neural networks--computer models that help us to understand how we perceive, think, feel, and act.

How does the brain work? How do billions of neurons bring about ideas, sensations, emotions, and actions? Why do children learn faster than elderly people? What can go wrong in perception, thinking, learning, and acting? Scientists now use computer models to help us to understand the most private and human experiences. In The Mind Within the Net, Manfred Spitzer shows how these models can fundamentally change how we think about learning, creativity, thinking, and acting, as well as such matters as schools, retirement homes, politics, and mental disorders.

Neurophysiology has told us a lot about how neurons work; neural network theory is about how neurons work together to process information. In this highly readable book, Spitzer provides a basic, nonmathematical introduction to neural networks and their clinical applications. Part I explains the fundamental theory of neural networks and how neural network models work. Part II covers the principles of network functioning and how computer simulations of neural networks have profound consequences for our understanding of how the brain works. Part III covers applications of network models (e.g., to knowledge representation, language, and mental disorders such as schizophrenia and Alzheimer's disease) that shed new light on normal and abnormal states of mind. Finally, Spitzer concludes with his thoughts on the ramifications of neural networks for the understanding of neuropsychology and human nature.


Product Details

ISBN-13: 9780262692366
Publisher: MIT Press
Publication date: 02/28/2000
Series: A Bradford Book
Edition description: Reprint
Pages: 376
Product dimensions: 6.00(w) x 9.00(h) x 0.90(d)
Age Range: 18 Years

Read an Excerpt




Chapter One

Basics


The following three chapters provide the basic information needed for discussion of higher mental functions in neurobiological and simulated neural network terms. In chapter 2 we describe in a detailed, step-by-step fashion how small networks made of a few neurons actually process information. In order to function properly, such networks have to be "programmed." In chapter 3 we see that whereas in conventional computers such programming is carried out by the storage of rules, neural networks are programmed by learning from examples. In this respect, neural networks are like children, who become educated not by having rules preached or programmed into them but by observing, experiencing, and practicing many examples. Finally, in chapter 4, we tackle the question of how information is coded and why vector coding is advantageous in terms of the required hardware. It turns out that neural networks do linear algebra; that is, they compute with vectors.


Chapter Two

Neuronal Teamwork


Every living cell is capable of responding to its environment. Heat, cold, the presence of certain chemical substances, or merely touch can change the permeability of the cell's membrane to ions and thereby induce changes in the membrane potential. Moreover, these changes in membrane potential propagate along the cell membrane. In the course of evolution, however, one type of cell became specialized in its excitability and its ability to conduct changes of electrical potential along the membrane. These cells are the neurons (figure 2.1).

    The neuronal functions of excitability and conductivity of signals are based on the movement of ions, that is, electrically charged atoms. All cells of the body use at least one-third of their energy to pump sodium ions out of, and potassium ions into, themselves. This causes an electrical potential of about seventy millivolts at the cell membrane (and a negative charge inside the cell); this is the so-called resting potential. In addition to sodium and potassium, other substances are constantly exchanged between the cell and its environment. The involved processes and structures—ion pumps, ion channels, and receptors—interact in complex ways, which has the net effect that the membrane resting potential can become unstable under certain circumstances. In other words, if the resting potential is disturbed, it may change rapidly and either increase or decrease. Although this process can occur in all cells, neurons have become specialized to use these changes in membrane potential for the processing of information.

    If the resting potential of a neuron is decreased by about twenty millivolts, rapid changes take place inside the membrane and cause the negative potential to decrease further and even to become positive for a brief period of time. These processes are like an avalanche: once started, they are inevitable and continue automatically, guided by their own nature. This depolarization of the membrane is followed, within one to ten milliseconds, by the re-establishment of the resting potential, that is, by a repolarization process. This very brief burst of the membrane potential into the positive range followed by the immediate restoration of the resting potential is called an action potential. Action potentials follow a simple all-or-nothing rule: there are no graded—that is, small or large—action potentials (Kandel & Schwartz 1995, Nicholls et al. 1995, Schmidt & Thews 1995).

    The membrane potential at which the avalanchelike action potential is automatically triggered is called the threshold of the neuron. If a neuron gets excited, its resting potential is shifted more and more toward the threshold, until it reaches threshold and fires off an action potential. Action potentials start at certain positions on the cell's membrane and travel along the axon of the neuron, which then conducts the potential (and hence the information that something has happened at the cell where it originated) to other cells.


Neurons as Information-Processing Units


The neuron is the information-processing unit that transforms many input signals into a single output signal. These input signals are carried to the neuron by the axons of other neurons. A run-of-the-mill neuron of the human cortex receives between a thousand and ten thousand incoming axons from other neurons. The incoming signals are action potentials that all look alike and carry information by their mere presence or absence, like the dots and dashes of Morse code. For example, if light falls on a patch of neurons in the light-sensitive rear of the eye (the retina), cells in the retina send action potentials to the brain.

    The signal-carrying axons from other neurons end with small button-like protrusions, the synapses. Unlike the connections between electrical wires, which ordinarily work equally well under any circumstances, the transduction of a signal at a synapse can be good or poor, a fact that is important for the proper functioning of the nervous system. In fact, it is crucial that the signals coming into a neuron from other neurons have different effects, which are caused by differences in the efficacy of the respective synapses. In other words, the ability of a signal to be conducted to other neurons depends critically on the synapse through which it must travel.

    In computational terms, the incoming signal can be described by a number (figure 2.2). In the simplest case, we can look at a neuron at any given moment and state whether or not an action potential is being conducted through a synapse of the neuron. The incoming signal, called the input of the neuron, may be described, again in the simplest case, by either the number 1 or the number 0. Of course, it is also possible to take a dynamic perspective and to characterize the input in terms of action potentials per unit of time. When this is done, the input is a variable that denotes the firing frequency of the input neuron.

    The quality of the synaptic transmission can also be represented by a number. If the signal travels freely through the synaptic cleft and has a high impact on the resting potential of the neuron, we can say that the action potential has a full impact on the neuron. If the signal is conducted poorly, its impact on the neuron will be small. If the synapse has an inhibitory effect on the neuron, the input must be multiplied by a negative number representing the inhibitory input. Mathematically, this effect is captured by simply multiplying each action potential by a number that specifies the strength of each synapse. We can use any number between -1 and +1 to describe inhibitory as well as excitatory synapses of various strengths. The strength of a synapse is often referred to as its weight.

    Incoming action potentials, weighted by the strengths of the synapses through which they travel, lead to more or less depolarization of the neuron's membrane resting potential. Whenever the membrane potential is depolarized beyond a certain threshold, the neuron sends out an action potential. In mathematical terms, the neuron sums up the weighted input signals (i.e., the products of all incoming signals, zeros and ones, and their corresponding synaptic weights) and compares the sum with a threshold. Whenever the weighted input is larger than the threshold, the neuron fires an action potential—that is, sends out a 1. If the weighted input is smaller than the threshold, nothing happens. In biological neurons, the threshold will be probabilistic; that is, when graphed it will not look like a step, but rather have a sigmoidlike shape. This curve is called the activation function.

    Computationally, the function of a neuron can be completely described by the input, the synaptic weights, and the activation functions of the neurons.
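This description amounts to only a few lines of code. The sketch below uses illustrative inputs and weights (not taken from any figure in the book) and shows both the all-or-nothing step neuron and the graded, sigmoidlike activation function:

```python
import math

def step_neuron(inputs, weights, threshold):
    """All-or-nothing neuron: fire (1) if the sum of the weighted
    inputs exceeds the threshold, otherwise stay silent (0)."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0

def sigmoid_neuron(inputs, weights, threshold):
    """Graded variant: a sigmoidlike activation function centered
    on the threshold, as in biological neurons."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-(weighted_sum - threshold)))

# Three inputs; the middle synapse is inhibitory (negative weight):
print(step_neuron([1, 0, 1], [0.5, -0.5, 0.5], 0.8))  # fires: sum is 1.0 > 0.8
print(step_neuron([0, 1, 0], [0.5, -0.5, 0.5], 0.8))  # silent: sum is -0.5
```

Note that the neuron is fully described by its inputs, weights, and activation function; there is nothing else to specify.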


A Simple Network for Pattern Recognition


All living organisms face the problem of recognizing patterns and producing specific and appropriate responses to them. Sensory cells transmit uninterpreted signals from the environment in the form of action potentials. These "incoming nerve currents," as the psychologist William James termed them, have been replaced in modern neurophysiology by a stream of zeros and ones. However, the basic problem remains the same: How does the organism produce, from an uninterpreted stream of data, meaningful information about the environment? In particular, how does it respond in a way that takes into account these data and benefits from them? To take an example from vision, the retina is bombarded by dots of light, which are transmitted to the brain as spatiotemporal patterns of action potentials. However, the problem for the organism is how to use these patterns of electrical activity to come up with meanings, such as "there is something to eat over there" and "to the left there is some edible stuff, but right behind there is a predatory animal" (figure 2.3). In order to survive, the organism has to come up with such "ideas"; it has to recognize patterns quickly and reliably. How does the nervous system accomplish this feat?

    To answer this question, let us look at a very simplified case (see figure 2.4). Consider a layer of light-sensitive sensory cells consisting of only three neurons, the input neurons of the network. Let us further assume that three different, important patterns (A, B, and C) occur in the environment of the organism, to which it has to respond differentially. The organism perceives these patterns when they fall on the sensory cells of its eye and become represented there. Finally, let us assume that the organism has to produce three different kinds of behavior, that is, three different outputs. This is accomplished by the firing of three different output neurons. Output neuron 1 may represent flight (perhaps the motor neuron of a leg muscle); output neuron 2 may represent eating (such as a neuron controlling the tongue); while output neuron 3 may signal digestion by firing the vagus nerve. The task of the nervous system is therefore to produce, as quickly and as reliably as possible, a mapping of the three input patterns to the three output neurons. How does it produce such a mapping?

    First, think about how a conventional personal computer (PC) would perform the task. A program for the recognition of patterns A, B, and C might look something like the following serial code. "Go to the middle neuron of the retina and determine whether it is active or not. If it is not active, generate output pattern 1. If it is active, go to the top neuron. If it is not active, generate output pattern 3; if it is active, generate output pattern 2." This algorithm is rather simple, but if the patterns get more complicated (i.e., if the number of pixels is increased), the algorithm gets very long and complex. For example, for such an algorithm to detect all the characters in a 5 by 7 grid, it would need to consist of several pages of code. To recognize faces, it would probably need to be as long as a book. Because, as we have seen above, neuronal switching takes several milliseconds, such serial algorithms would be terribly slow if they were carried out by the brain. It would take the brain seconds or minutes to run a program of thousands of lines of code, whereas we can actually perform the recognition task within fractions of a second. Furthermore, serial algorithms are prone to error: when a computer works its way through a complicated decision tree, a single mistake can cause the process to take a completely different path and, hence, to produce a wrong result. The erroneous recognition of a pattern in the environment, however, may have disastrous consequences for the organism.

    In contrast to serial algorithms, neural networks accomplish pattern recognition quickly and robustly. To understand how they work, we need only remember a little elementary school mathematics.

    To produce the mapping of the above-described input and output, a neural network would utilize three input nodes and three output nodes (figure 2.5). Each input node is connected to every output node. Let us further assume that every output node has an activation threshold of 0.8; that is, each of the output nodes becomes active whenever its weighted input is larger than the threshold. If the weighted input is smaller than the threshold, the neuron remains at its resting potential.

    The differential strengths of the synaptic connections between the neurons of the input and the output layer are crucial to the effective functioning of the network. Let us look at what happens because of these differences in synaptic strength. If pattern A is perceived, the activity of the input layer corresponds to pattern A; the top and the bottom neurons are active, whereas the middle neuron is not. This pattern is transmitted to all output neurons through the connections. Each output neuron, however, processes this input differently, because each output neuron's connections to the input neurons have different strengths. The top neuron in the output layer, for example, receives the input 1 from the top neuron of the input layer through the corresponding synapse, which has a weight of 0.5. The weighted input, therefore, is 1 x 0.5 = 0.5. The weighted input coming through the other two synapses is 0 x -0.5 = 0, and 1 x 0.5 = 0.5, respectively. The sum of these weighted inputs, therefore, is 1. This is larger than the threshold of 0.8, which is why the neuron will become active.

    Similarly, we can calculate the weighted input of the middle neuron (0.6) and that of the lower neuron (-0.6). Neither of these neurons will become active. (Readers can use figure 2.5 to do the calculations themselves.) The net result is just what we wanted the network to do—produce an active top output neuron when pattern A is present. How about pattern B? The weighted inputs of the top and the bottom neurons are 0.5 and 0.4, respectively. Only the middle neuron receives a weighted input (0.9) that is larger than the activation threshold. Pattern B, therefore, will cause only the middle neuron in the output layer to fire. Pattern C will also be recognized properly by the network: when it is presented, the weighted inputs of the upper, middle, and lower neurons are -0.5, 0.3, and 1, respectively. Therefore only the lower output neuron will fire when pattern C is presented.
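The calculations just walked through can be reproduced in a short program. The top output neuron's weights (0.5, -0.5, 0.5) and the threshold of 0.8 are given in the text; the remaining two weight rows and the three input patterns are reconstructions chosen so that they reproduce the weighted sums reported above, since figure 2.5 itself is not shown here.

```python
THRESHOLD = 0.8          # activation threshold of every output neuron

WEIGHTS = [              # one row of synaptic weights per output neuron
    [0.5, -0.5, 0.5],    # output 1, "flight"    (weights given in the text)
    [0.3,  0.3, 0.3],    # output 2, "eating"    (reconstructed)
    [-0.3, 1.0, -0.3],   # output 3, "digestion" (reconstructed)
]

PATTERNS = {             # input patterns on the three sensory cells (reconstructed)
    "A": [1, 0, 1],
    "B": [1, 1, 1],
    "C": [0, 1, 0],
}

def network_output(pattern):
    """All output neurons process the whole pattern in parallel:
    each forms its weighted sum and compares it with the threshold."""
    sums = [sum(x * w for x, w in zip(pattern, row)) for row in WEIGHTS]
    return [1 if s > THRESHOLD else 0 for s in sums]

for name, pattern in PATTERNS.items():
    print(name, network_output(pattern))  # each pattern activates one output
```

With these values, pattern A yields weighted sums of 1, 0.6, and -0.6, pattern B yields 0.5, 0.9, and 0.4, and pattern C yields -0.5, 0.3, and 1, matching the figures in the text.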


Shared Labor by Parallel Processing


The simple example we have examined demonstrates how pattern recognition can be accomplished by a simple network of two layers. It is important to realize that all three output neurons receive all of the input at the same time; that is, they work in parallel. Pixels are not processed one by one, as in the serial PC model described above. Instead, the entire pattern is processed by each of the output neurons, which is why the brain's form of processing is sometimes called parallel as well as distributed. Neural network models are therefore also called parallel distributed processing (PDP) models (cf. McClelland & Rumelhart 1986).

    When contrasted to serial processing, parallel processing has a number of important advantages. Notice that the patterns are recognized in a single computational step that is much faster than a serial algorithm working its way through the parts of the patterns one by one. If the patterns are more complex and consist, for example, of characters or faces, the recognition process remains, in principle, just as fast; it simply uses more neurons. (As we have already seen, we have plenty of them!) More complex patterns need no more time to process than simple patterns, because they are processed in parallel. In principle, a face made up of ten thousand pixels (cf. figure 2.3) could therefore be recognized in a single computational step, given the fact that the pyramidal neurons of the human cortex can each receive up to ten thousand input fibers. In short, pattern recognition can be very fast if done by parallel processing, even though the processing units are relatively slow.

    We can hardly overemphasize the difference between information processing in neural networks and rule-based logical serial systems. The network, in contrast to the serial algorithm used by a conventional computer, contains neither rules nor calculation procedures. Its "knowledge" resides entirely in the weights of the connections. Although neural networks do not contain rules, what they do can be readily described by rules. This distinction may sound sophistical, but it has far-reaching consequences for our understanding of ourselves.

    Until a few years ago it was generally assumed that the concepts of folk psychology somehow refer to things actually going on in our brains. When we spoke of thoughts, hopes, feelings, sensations, memories, and so on, we usually presupposed that there were brain states that in some way resemble these thoughts, hopes, feelings, sensations, and memories. For example, the structure of thoughts, which may be experienced as internal speech, was likened to the processing of memories by the use of symbolic formal operations. Internal images and symbols were thought to become activated and newly associated by thoughts and volitions, and the result of such processes were the mental acts experienced by an individual. Accordingly, mental operations have often been simulated by computer programs and flow charts. Thought was thus conceived of as the manipulation of symbols.

    If we assume that the brain does not operate like a conventional computer, but rather like the neural network we have described, we gain an entirely new framework for understanding mental operations. Instead of rule-based algorithms working with symbols, they consist of subsymbolic processes, which can be described by rules and symbols only to a limited degree. Moreover, the internal representations involved in these processes constantly change during these subsymbolic operations. Such rules as exist are not in the head but are merely post hoc ways of describing mental functions (cf. Bechtel & Abrahamson 1991, Churchland & Sejnowski 1992, Churchland 1995, Clark 1993).


Language Acquisition by Children and Networks


Human language is arguably the most prominent example of a rule-based system, which human beings appear to use routinely in the production of such high-level mental functions as thinking and communicating. We all know implicitly a complicated set of rules. Each of the roughly eight thousand languages on earth comes with such a set of rules. Notwithstanding the obvious differences, there are some general principles to which all human beings adhere when speaking. Most noticeable is the astonishing fact that children are enormously creative during the process of acquiring language. Although they obviously use the many examples they hear to perfect their language skills, the linguist Noam Chomsky (1972, 1978, 1988) has convincingly argued that these examples are not sufficient for the child to generate the general rules necessary to speaking correctly. They simply do not hear enough examples; moreover, those they hear are sometimes contradictory. The child could never, from the examples alone, generate all the necessary rules. Chomsky has proposed, therefore, that children must acquire language through some inborn competence, some form of language instinct (cf. Pinker 1994).

    Let us take a close look at an example. When children learn to speak, they must acquire, among other things, the rules that govern the generation of the past tense from the word stem. Although they do not know these rules explicitly—none of us can consult a grammar book in our head—they can use them creatively. One of these rules, for example, specifies how to convert the word stem (sing, chant) into the past tense (sang, chanted). In English, there are two ways to form the past tense. Many verb stems are converted to the past tense by adding the ending -ed (chant becomes chanted). However, there are exceptions to this rule, and a fair number of verbs have irregular forms of the past tense (e.g., sing becomes sang). While regular verbs can be changed from present to past by the application of a single rule, irregular forms have to be learned one by one.

    Psycholinguistic studies on the development of language skills in children have demonstrated that children acquire the past tense in certain steps or phases. First, they learn to change irregular verbs; this probably happens by imitation, as irregular verbs are also frequently used verbs (e.g., to have, to be). In a second phase, children appear to acquire the rule that governs production of the past tense in regular verbs but tend to use it indiscriminately; they apply it to irregular as well as regular verbs. In this phase, errors like singed, or even sanged, are common and children are able to creatively generate the past tense of nonexistent verbs. When asked, for example, "What is the past tense of quang?," they reply "quanged." This capacity to create past tenses for words they have never heard before has been regarded as crucial evidence that children have acquired a rule. Only in the third phase are children capable of forming correctly the past tenses of regular as well as irregular verbs; that is, they have learned both the rule and the exceptions to the rule. They know that it is take, took but bake, baked. Once you start to think about how to produce the past tense, you realize how complicated a task it really is. Moreover, you realize the enormous difficulties almost all children master in the first few years of their lives.

    About a decade ago, neural networks were applied to the problem of past-tense acquisition for the first time by Rumelhart and McClelland (1986). They programmed a neural network with 460 input and 460 output nodes in which every input node was connected to every output node, resulting in 211,600 connections (460² = 211,600). The input patterns consisted of 420 word stems; the task of the network was to learn the corresponding 420 forms of the past tense. Learning (see chapter 3 for details) was performed by presenting the input layer of the network with a soundlike pattern of the stem that caused random activation of the neurons in the output layer. Because the desired output (i.e., the sound pattern of the past tense) was known, the actual and desired outputs could be compared. The difference between the output actually produced and the desired output was used to make small changes in the weights of the connections between input and output neurons. These small adjustments in weight in the desired direction gradually taught the network to respond with the correct past tense when presented with the stem.
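The learning scheme just described can be sketched as a simple delta rule over one layer of weights. The toy network below is not the 460-node phonological model itself; its size, learning rate, and patterns are illustrative assumptions. The principle, however, is the same: compare the actual output with the desired output, and nudge each weight a small step in the direction that reduces the difference.

```python
# Delta-rule learning in miniature: the desired output acts as a
# "teacher," and each weight is adjusted in proportion to the error.
LEARNING_RATE = 0.1

def outputs(weights, inputs, threshold=0.5):
    """Thresholded weighted sums, one per output neuron."""
    return [1 if sum(x * w for x, w in zip(inputs, row)) > threshold else 0
            for row in weights]

def train_step(weights, inputs, desired):
    """Nudge each weight toward producing the desired output."""
    actual = outputs(weights, inputs)
    for j, row in enumerate(weights):
        error = desired[j] - actual[j]           # -1, 0, or +1
        for i, x in enumerate(inputs):
            row[i] += LEARNING_RATE * error * x  # small step, right direction
    return weights

# Teach a 2-input, 2-output network to copy its input pattern:
weights = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(20):                              # repeated presentations
    for pattern in ([1, 0], [0, 1]):
        train_step(weights, pattern, desired=pattern)
```

After enough presentations the weights settle at values that map each input onto the desired output; no rule is ever stored, only connection strengths.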

    In fact, after 79,900 trials, the network had learned the correct correspondence between input and output; that is, it had come up with the synaptic weights needed to produce the sound pattern of the past tense when presented with the sound pattern of the stem. Even when presented with new verbs somewhat similar to the nonexistent verbs given to children, the network operated almost flawlessly: it produced the correct past tense of regular verbs in 92 percent of the cases and even the correct past tense of irregular verbs with 84 percent accuracy.

    What was most notable about the model was that its learning curve for regular and irregular verbs resembled that of children (figure 2.6). The network acquired the past tenses of regular verbs steadily; that is, it produced a constantly decreasing number of errors. In contrast, it first produced the past tenses of irregular verbs increasingly well, then reached a stage at which performance actually decreased, only to increase again later.

    The fact that the network model, like children, not only gradually improved its performance over time but also went through similar phases and produced errors that resemble those of children can be regarded as strong support for the idea that children and neural networks learn in a similar way. The striking parallels suggest, at least, that similar mechanisms are at work. This similarity has far-reaching consequences.

    Notice, for a start, that there was no explicit learning of any rule, just a gradual change in connections. Moreover, the rule does not exist except as a description of what has been learned. Whereas linguists like Chomsky and other members of his school of thought had assumed that rules as rules must be explicitly represented in the brain for language acquisition to occur, the network model has proved that this is not necessary. The network performed the task because the connections between hundreds of neurons changed their strengths, not because it followed a learned rule. Nonetheless, the network produced the correct output (the sound patterns of the past tense) for regular as well as irregular input (the sound patterns of the stem). It treated regular and irregular verb forms in the very same way. In the network—and by analogy in the heads of speaking human beings—there is neither a rule nor an exception! (This may be why so many of us find grammar difficult.) According to the new view, the rules of grammar are not inside our heads. Rather, they are spelled out after the fact, to describe what the networks actually do.

    Although a number of details of the model proposed by Rumelhart and McClelland have been subjected to criticism (cf. Marcus 1995, Pinker & Prince 1988, Plunkett 1995), the model is very plausible and has been confirmed by further simulation experiments (cf. Hoeffner 1992). These simulations were able to prove, for the first time, that rule-governed language behavior is possible without any explicit internal representation of the rules.

    Is it true that we do not follow rules when we talk? Can we only state rules on a post hoc basis? If so, what does this imply about other rule-based human activities and behaviors?


Computer Simulations of Higher Cognitive Functions


An increasing number of studies show how higher cognitive functions can be simulated by neural networks, and these studies have sometimes come up with astonishing results. Such models offer advantages over explanatory hypotheses and theories. Unlike the latter, a simulation can be subjected to experimental manipulations and to empirical tests of the effects of these manipulations.

    In other words, the only way to test a theoretical explanation is to confront it with new real-world data and see how well, or how poorly, the theory predicts them. A simulation model, however, allows us to introduce changes and observe their effects. In "playing" with the model, we can generate predictions about the real world and check them against new observations and experiments. In short, the model allows us to experiment with the mind. For example, the parameters of the network—such as the patterns of connection, the activation function of the neurons, or the network's size—can, like the input signals, be manipulated, and the resulting behavior can be studied in detail. Such experiments are either impossible or very hard to carry out on real neuronal systems. A network can even be damaged to various degrees (neurons and connections can be set to zero) to test the effects of such damage on performance under different circumstances—for example, at different levels of informational load. We have already discussed an example in which the learning history—that is, the change in performance over time—is highly informative, especially when there are specific differences that depend on input characteristics. The more detailed the predictions and the more sophisticated the model, the more informative the results of such simulations. Of particular interest are the counterintuitive results that simulations may produce; these can direct researchers to phenomena and functional relations that would otherwise have escaped their attention. In short, the possibilities of network simulations are endless, limited only by the creativity of the experimenter.
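
The "lesioning" experiment mentioned in this paragraph can be sketched in a few lines of Python. This is a minimal illustration only: the network size, random weights, and lesion fraction are arbitrary choices for the example, not parameters from any model discussed in the book.

```python
# Illustrative sketch: "lesioning" a simulated network by setting a
# fraction of its connection weights to zero. All numbers here are
# arbitrary choices for demonstration.
import random

random.seed(0)  # fixed seed so the example is reproducible
weights = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(5)]

def lesion(weights, fraction):
    """Set the given fraction of connections to zero, chosen at random."""
    positions = [(i, j) for i in range(len(weights))
                 for j in range(len(weights[i]))]
    for i, j in random.sample(positions, int(fraction * len(positions))):
        weights[i][j] = 0.0
    return weights

lesioned = lesion(weights, 0.2)  # damage 20% of the 25 connections
print(sum(w == 0.0 for row in lesioned for w in row))  # 5
```

One could then re-test the damaged network on its task at different levels of informational load, exactly the kind of manipulation that is impossible in a living brain.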

    Obviously, the fact that a certain task can be simulated on a computer running a neural network model does not mean that the task is implemented in the brain in the same way (cf. Crick 1988, 1989). In particular, the fact that a network simulation model behaves the way a real nervous system does cannot be taken as proof that biological nervous systems produce behavior in the same way. Notwithstanding these caveats, a network model can demonstrate operating principles that might be at work in real nature; in fact, such a model may be the only way to detect these principles. If a damaged model, for example, behaves in a way that is strikingly similar to a damaged real nervous system, and if the model generates unexpected predictions about the behavior of the biological system that, upon subsequent careful examination, are found to be correct, one can hardly escape the compelling plausibility of such a model.


DECtalk versus NETtalk


The final example we discuss in this chapter is another simulation regarding an aspect of language learning (cf. Churchland 1995). Like Rumelhart and McClelland's model, this simulation calls into question the view that language can only be mastered by the application of a complex set of rules.

    A few years ago, the American computer company Digital Equipment Corporation (DEC) developed a program to convert written text into spoken language. Such a program is useful for blind people, who otherwise have to rely upon specially produced texts in Braille. The conversion of written text into sound, however, varies in difficulty from language to language. If the language has a good spelling-to-sound correspondence, like Spanish or Italian, the task is relatively easy. English, however, appears hopeless in this respect. Not only foreigners learning English as a second language, but even British and American people, have difficulty learning to write correctly, which is why a large proportion of classroom time is devoted to the rote memorization of awkward spelling. The letter i may sound like the first i in imagine or like ai, as in I and icon; an a may be bright (æ), like the first a in Adam, dull (ə), like the second a in Adam, or sound like ei (eɪ), as in aorta. George Bernard Shaw once caricatured this puzzling feature of English by pointing out that the word fish could be written as ghoti—using gh as pronounced in enough, o as in women, and ti as in nation.

    Because of the complicated rules governing the transformation from spelling to sound in English, DEC's computer specialists had to take into account not just single characters, but also their environment. They used the three characters to the left and the three to the right of a given character to figure out its correct sound pattern. Of course, this turned out to be a complex enterprise, and coding the many rules involved took several person-years of programming. When run on a fast computer, however, the software did the trick. The machine—called DECtalk—could actually read written English text.
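
The windowing idea, one character considered together with its three neighbors on each side, can be illustrated with a short sketch. The function name and the space padding at the edges are my own illustrative choices, not details of DEC's actual program.

```python
# Illustrative sketch of the seven-character window described above:
# each character is examined with its three left and three right
# neighbors, padding with spaces at the word boundaries.

def letter_windows(text, context=3):
    """Return one window of 2*context+1 characters per letter of text."""
    padded = " " * context + text + " " * context
    return [padded[i:i + 2 * context + 1] for i in range(len(text))]

for window in letter_windows("fish"):
    print(repr(window))
# '   fish'
# '  fish '
# ' fish  '
# 'fish   '
```

Each window is what a rule (or, later, NETtalk's input layer) would look at when deciding how the center character should sound.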

    A few years later, a neural network was used to accomplish the same task. The same hardware was used to scan the text and to synthesize the voice, but no algorithmic computer program with complex rules provided the software; instead, a neural network called NETtalk was used. The network was trained (see chapter 3) with text input and correct speech output for only ten hours; at the end of this period it produced output that was 95 percent correct. Additional training improved its performance to 97.5 percent correct. The network carried out the task just as well as the algorithmic machine that had taken years to develop.

    Like Rumelhart and McClelland's model, NETtalk contains no rules; it functions properly because it makes a series of small corrections of weight during training. Like the tiny network of three neurons introduced at the beginning of this chapter, NETtalk's performance relies upon nothing but the correct mapping of input patterns onto output patterns achieved by correctly tuned synaptic connections.

    It should be clear by now that these connections are crucial to the functioning of neural networks. Addressing the origin of connections, we have referred a number of times to "training," and stated that proper training causes these weights to come about miraculously. But how does this happen? And, more importantly, how are these weights produced in biological systems (if we assume that live systems work in a somewhat similar way)? In short, how does the brain learn? The answers to these questions are crucial for our understanding of education, in particular of how children learn and how they should be taught. We will address these issues in the next chapter. But, before doing so, let us look at an idea about how synaptic weights come about that appears plausible at first glance but that cannot be correct.


Beethoven, Karajan, Sony, and the Human Genome


    One might suppose that the proper connections between neurons could be genetically preprogrammed. After all, why should an organism run the risks involved in learning? Why not just inherit the correct weights? But what are the correct weights? That depends on the organism's environment! Moreover, the environment may change, or the organism may move from one environment into a different one. In some areas it may be beneficial to respond to the presence of red berries with appetite, while in other areas doing so may be deadly. If the organism were preprogrammed with a fixed set of parameters determining an equally fixed repertoire of behaviors (like the knee-jerk reflex, which is prewired), it would have no flexibility to respond differently to different environments and circumstances. In short, genetically programming the connections in the central nervous system of a complex organism like a human being is not desirable.

    Furthermore, a little arithmetic shows that the human genome is not large enough to store the information needed to code all the connections in the human brain. The human genome consists of about three billion base pairs. Each base can be one of four possibilities, which gives each base an information content of two bits and the entire genome an information content of six billion bits. When you buy a computer, memory capacity is usually given not in bits but in bytes (eight bits equal one byte). The information content of the human genome is therefore six-eighths of a billion bytes, that is, 750 megabytes (MB).
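
The arithmetic in this paragraph is easy to verify. A minimal check, using only the figures given in the text:

```python
# Back-of-the-envelope check of the genome figures given in the text.
base_pairs = 3_000_000_000           # about three billion base pairs
bits_per_base = 2                    # four possible bases -> 2 bits each
genome_bits = base_pairs * bits_per_base   # six billion bits
genome_bytes = genome_bits / 8       # eight bits per byte
genome_mb = genome_bytes / 1_000_000
print(genome_mb)  # 750.0
```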

    We can compare this capacity with the standard size of compact discs (CDs). When the standard was first set, the small round silver disk was to have a diameter of 11.5 centimeters (cm), which allowed it to hold 550 MB of information. Conductor Herbert von Karajan, as well as the wife of the chairman of Sony, objected to the standard, because the ninth symphony of Beethoven, which lasts for about 74 minutes, did not fit on such a CD. The standard was therefore changed; the disc's present size is 12 cm, which allows it to hold 74 minutes of music, equivalent to 680 MB (cf. Schlicht 1995).

    The nucleus of each cell in the human body, therefore, has an information storage capacity that is only slightly larger than that of a CD. But how much information would be needed to genetically preprogram all the connections in the human brain?

    Let us assume that there are ten billion (10^10) neurons in the forebrain and that each of these neurons is connected to one thousand (10^3) other neurons. The number of connections, therefore, is 10^13, that is, ten trillion. Even if we characterize each connection with only one bit of information (i.e., we state only whether the connection is present or absent), we need 10^13 bits of information—that is, 1.25 × 10^12 bytes, or 1,250,000 MB. With its capacity of only 750 MB, the human genome could contain only a fraction of the information needed to preprogram these connections in our brains. Moreover, this estimate is conservative: the numbers of cells and of connections per cell are likely to be larger, and connections are not just present or absent but graded.
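
The shortfall can be checked in the same back-of-the-envelope style, using the figures from the text; the genome's 750 MB falls short by a factor of roughly 1,700:

```python
# Checking the connection estimate from the text against the 750 MB genome.
neurons = 10**10             # ten billion forebrain neurons
per_neuron = 10**3           # one thousand connections per neuron
connections = neurons * per_neuron        # 10**13 connections in total
bits_needed = connections                 # one bit per connection
mb_needed = bits_needed / 8 / 1_000_000   # convert bits to megabytes
print(mb_needed)             # 1250000.0 MB needed
print(mb_needed / 750)       # shortfall factor, roughly 1667
```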

    The upshot of these considerations is clear: even if the entire human genome were used to code all the connections between neurons in our brains, it would be several orders of magnitude too small. The brains of human beings, it follows, cannot be prewired. Instead, we learn from experience; that is, our experiences wire our brains. How this is done is the subject of the next chapter.


Recap


Simulated neural networks are information-processing systems that consist of a large number of processing units. Because these units are more or less similar to biological neurons, we sometimes refer to them as neurons and use them to construct neural networks to model human cognitive functions.

    Information in these networks is processed by the activation and inhibition of neurons. Neural network research is biologically motivated; that is, it has the aim of characterizing neuronal function computationally. By abstracting from a neuron's biological features—such as form, color, microscopic structure, cell physiology, and neurochemistry—we can conceive of it as an information-processing device. Within this neuro-computational framework, the function of a neuron is to calculate the products of the input signals and the weights, to sum all these weighted inputs, and to compare the result with a threshold weight.
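
The neuron's computation summarized in this paragraph, a weighted sum of inputs compared against a threshold, can be written in a few lines. The particular weights and threshold below are arbitrary illustrative values, not taken from any network in the book.

```python
# Minimal sketch of the neuron model described above: multiply inputs
# by weights, sum, and compare the result with a threshold.

def neuron_output(inputs, weights, threshold):
    """Return 1 if the weighted input sum reaches the threshold, else 0."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0

# Example: both inputs must be active for this neuron to fire.
print(neuron_output([1, 1], [0.6, 0.6], threshold=1.0))  # 1 (fires)
print(neuron_output([1, 0], [0.6, 0.6], threshold=1.0))  # 0 (silent)
```

A whole network is nothing more than many such units feeding their outputs to one another through weighted connections.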

    Even simple networks can recognize patterns faster and more efficiently than serial computers can; and all it takes for the processing of more complex patterns is a larger number of neurons. Because information processing in neural networks is distributed across many neurons working simultaneously, the type of information processing performed by them is often called parallel distributed processing (PDP).

    At one time, researchers assumed that mental operations are carried out by the operation of rules upon fixed mental representations. In the past few decades, however, our image of what the mind is and what it does has changed from one that is static and rule-based to one that is dynamic and process-based. It has become increasingly clear that biological mental operations are similar to the operations performed by computerized neural networks.

    We have seen that rules merely describe what brains and networks do; they do not exist as explicit entities within these systems. By looking at two examples of apparently rule-based behavior—the production of the past tense from the word stem and reproduction by a computer of the sounds of English speech—we have seen that such tasks can be carried out by networks trained only by examples and not by the storage of explicit rules.

    Finally, we have demonstrated that the connections between the neurons in a human brain cannot possibly be genetically determined, because the entire human genome is by far too small to contain all the necessary information. Instead, humans learn through interactions with the environment that change the connections in our biological brains. The precise mechanism of this learning is the subject of the next chapter.

Table of Contents

Preface
Acknowledgments
1 Introduction
I Basics
2 Neuronal Teamwork
3 Learning
4 Vectors in the Head
II Principles
5 Maps in the Cortex
6 Hidden Layers
7 Neuroplasticity
8 Feedback
III Applications
9 Representing Knowledge
10 Semantic Networks
11 The Disordered Mind
12 Thoughts and Impressions
Glossary
References
Index

What People are Saying About This

From the Publisher

"Seductive on-screen views of brain activity open up a closed realm by rendering the mind visible. A new enlightenment beckons. A new stupidity, too, a new confusion of the moral and mechanical, if we don't listen carefully to sane and discriminating voices like Spitzer's." PaulFisher, Daily Telegraph"Spitzer... has written a highly readable introduction to 'traditional'neural-net models.." Terrence J. Sejnowski, Nature
