How New Languages Emerge


by David Lightfoot
Cambridge University Press





New languages are constantly emerging, as existing languages diverge into different forms. To explain this fascinating process, we need to understand how languages change and how they emerge in children. In this pioneering study, David Lightfoot explains how languages come into being, arguing that children are the driving force. He explores how new systems arise, how they are acquired by children, and how adults and children play different, complementary roles in language change. Lightfoot makes an important distinction between 'external language' (language as it exists in the world), and 'internal language' (language as represented in an individual's brain). By examining the interplay between the two, he shows how children are 'cue-based' learners, who scan their external linguistic environment for new structures, making sense of the world outside in order to build their internal language. Engaging and original, this book offers an interesting account of language acquisition, variation and change.

Product Details

ISBN-13: 9780521676298
Publisher: Cambridge University Press
Publication date: 03/31/2006
Edition description: New Edition
Pages: 208
Product dimensions: 5.98(w) x 8.98(h) x 0.43(d)

About the Author

David Lightfoot is Professor of Linguistics at Georgetown University, and Assistant Director of the National Science Foundation.

Read an Excerpt

0521859131 - How New Languages Emerge - by David Lightfoot

1 Internal languages and the outside world

1.1 Languages and the language capacity

Languages come and languages go. We deplore it when they go, because the disappearance of a language is a loss for the richness of human experience. These days, linguists are devoting much energy to documenting expiring languages. That documentation itself may increase the use of the language, which may increase its chance of surviving in some form. For example, simply finding a written version of a language facilitates its use for new purposes and new uses lead the language to be spoken more widely. Adapting computer software to accommodate the language may bring further advantages. Ultimately, however, people cease to speak a language because they come to identify with a different group, perhaps encouraged by factors of economic interest, perhaps influenced by governmental policy favoring one language above others in schools and official discourse.

Nettle & Romaine (2000: ⅸ) note that "the greatest linguistic diversity is found in some of the ecosystems richest in biodiversity inhabited by indigenous peoples, who represent around 4% of the world's population, but speak at least 60% of its 6,000 or more languages." Expiring languages tend to be spoken by small, underprivileged groups that lack resources. The disappearance of languages is a complicated matter that began to generate widespread concern in the 1990s, when funds were invested in investigating the death of languages and efforts were made to document endangered languages. Now the National Science Foundation and the National Endowment for the Humanities have begun to fund work jointly on endangered languages.

The Ethnologue, a website maintained by SIL International, reports that there were 6,912 languages spoken in the year 2005 - 239 in Europe, 2,092 in Africa. One can argue about how the languages were counted. English is listed as a single language, although it embraces varieties that are mutually incomprehensible, but the very similar Norwegian and Swedish are listed as distinct languages (Grimes & Grimes 2000). Conventionally, we often speak of Chinese and Arabic as single languages, although they include mutually incomprehensible varieties - "Chinese" seems to encompass eight very different languages. Whatever the best number is for the world's languages, it will be smaller in a short time. A total of 497 languages are listed as "nearly extinct," which means that "only a few elderly speakers are still living." Some linguists, the Ethnologue reports, believe that over half of the world's languages will not be passed on to the next generation.

Meanwhile new languages are emerging and we often deplore that, too, on the grounds that new forms represent a kind of decay and degenerate speech that violates norms that we have been taught in school. Nonetheless, Latin became Portuguese, Spanish, French, Italian, Romanian, and other languages, Dutch became Afrikaans in nineteenth-century South Africa, and early English developed into distinct forms of West Saxon and Mercian, into London, Scots, and Lancashire English, and later into Texan, Australian, Delhi, Jamaican, and many other forms. Within the last generation, we have even been privileged to witness the sudden emergence ex nihilo of some new signed languages in Nicaragua and Israel, as we shall discuss in chapter 7.

The emergence of new languages is harder to track than the loss of languages. It is sometimes an identifiable event when the last native speaker of a language dies, e.g. Dolly Pentreath in 1777, allegedly the last speaker of Cornish, but there was no comparable discrete event when, say, Portuguese became a new language as opposed to just the form of Latin spoken around the River Tagus. We now think of Australian and Jamaican as particular forms of English, and they may one day become as distinct as Portuguese, Spanish, and Italian, distinct languages with their own names, perhaps Strine and Jamenglish. If so, there will be no identifiable day or even year in which this happens, no matter how alert the recording linguists.

We may wonder what might have happened if the Romans had lost the Second Punic War in 202 BCE and Hannibal's descendants had brought to western Europe forms of Phoenician, which would have become as different from each other as modern French, Italian, and Sardinian. However, we could not provide a precise date for the emergence of a Semitic language spoken along the River Seine, any more than we can provide a date for the emergence of Latin-based French.

Languages diversify, and not just languages that spread over large areas through conquest and other forms of social domination. The phenomenon, like language death, connects to the way that people identify themselves with groups, adopting modes of speech that characterize the group. People, teenagers from every generation, speak differently as they feel themselves to belong to a distinct group, just as they may dress differently or wear their hair differently. The tendency for languages to diversify reflects the fact that linguistic change is a constant of human experience.

Like it or not, human languages are in constant flux. They flow around something that does not change, the human capacity for language, a biological property. That capacity is common to the species, is not found outside our species, and has not changed, as far as we know, over the period in which recorded human languages have been coming and going and changing in subtle and in bigger, more dramatic ways. That invariant capacity is one of the constants of human nature and helps us understand how brains deal with the shimmering world outside and impose an internal order, and how that interaction with the world outside yields the diversity of human languages.

Indeed, from certain points of view, there is only one human language. If one asks how many human hearts there are, a reasonable answer is one. The human heart has distinctive properties and is uniform across the species. There are differences, but not of a kind to suggest that there are different types of heart, each genetically determined in the way that, say, eyes may differ in color. At the genetic level, there is one heart, and that is the crucial level for answering such a question. Similarly, if one asks how many languages there are, seen from a biological point of view and given the current state of biology, a plausible answer is ONE, the human language, Human. This is not a new idea: Wilhelm von Humboldt held that "the form of all languages must be fundamentally identical" (1836/1971: 193) and they differ as human physiognomies differ: "the individuality is undeniably there, yet similarities are evident" (1836/1971: 29).

When human beings examine the communication systems of other species, herring gulls, honeybees, or dolphins, we establish the distinctive properties, showing how honeybees differ from herring gulls, and the differences are radical. Species differ in big ways that are genetically determined. Honeybees communicate the direction and distance to nectar sources through their "dance language," by wiggling their rear ends at different rates (von Frisch 1967), herring gulls communicate fear and warning by various body movements and calls (Tinbergen 1957), geese mimic social behaviors through imprinting (Lorenz 1961), and, more controversially, dolphins communicate instructions for finding food through high-pitched tones (Lilly 1975) (von Frisch, Lorenz, and Tinbergen shared the 1973 Nobel Prize in Physiology or Medicine). Only after establishing the major species properties are we able to detect differences within the species and rarely do we make much progress in that regard, although different "dialects" of honeybee communication and of passerine birdsongs have been identified.

If colleagues from the Department of Biology, following their usual methods, were to examine the communication systems of life forms on this planet, putting humans alongside honeybees and dolphins, in the way that, say, Niko Tinbergen investigated herring gulls, they would find a number of properties shared by the human species and by no other species, the human language organ (Anderson & Lightfoot 2002). These properties constitute the biggest discovery of modern linguistics. For example, the human language system is not stimulus-bound (not limited to elements within the sensory field), but it is finite and ranges over infinity, it is compositional, algebraic, and involves distinctive computational operations, as we shall see in a few pages. The properties are general - everybody has them - and they facilitate the emergence of the system in young children. The way the system emerges in children also has distinctive properties. For example, the capacity of a mature individual goes far beyond his/her initial experience, unlike birds, for instance, who usually sing pretty much exactly what their models sing (Anderson & Lightfoot 2002: ch. 9; Marler 1999). These are big, distinguishing properties that are biologically based and define the species and its language, Human; Human is very different from any other communication system in the natural world.

Whatever the biological perspective, people do speak differently in Tokyo and Toronto, in the Bronx and in Brooklyn. London is said to have over 300 languages spoken by its citizens, and people's speech is as distinctive as their thumbprint - it often takes only a second or two to know who is on the other end of the telephone line. Why does human speech vary so much and change so readily, if the capacity for language is uniform and static? I shall argue that postulating an invariant CAPACITY for language enables us to understand how we communicate in the context of such rich diversity, where not even sisters speak identically and speech patterns differ in a lottery of linguistic influences. We can understand central aspects of language change and variation, and understand them better than in the past. In particular, we can understand how new systems and new languages emerge.

The POSSIBILITY of variation is biologically based but the actual variation is not. For example, we know that there are distinct systems represented in the language most commonly used in Hamburg and in the most common language of Chicago: verb phrases (VP) are basically object-verb in Hamburg (Ich glaube, dass Gerda VP[Tee trinkt] 'I think that Gerda drinks tea') and verb-object in Chicago (I think that Gerda VP[drinks tea]); finite verbs raise to a high structural position in Hamburg (occurring to the left of the subject of the sentence) but not in Chicago (In Hamburg trinkt Gerda Tee 'In Hamburg Gerda drinks tea,' lit. in Hamburg drinks Gerda tea), and people speak differently. This kind of variation represents something interesting: the language capacity is a biological system that is open, consistent with a range of phenotypical shapes. This is not unique in the biological world - there are plants that grow differently above or below water and immune systems develop differently depending on what people are exposed to (Jerne 1985) - but it is unusual.

One could think of this variation in the way that we think about differences between species. The biology of life is similar in all species, from yeasts to humans. Small differences in factors like the timing of cell mechanisms can produce large differences in the resulting organism, the difference, say, between a shark and a butterfly. Similarly the languages of the world are cast from the same mold, their essential properties being determined by fixed, universal principles. The differences are not due to biological properties but to environmental factors: if children hear different things, they may grow a different mature system. Linguists want to know how differences in experience entail different mature systems.

Observed variations between languages are secondary to the general, universal properties, and they are not biologically based: anybody can become an object-verb speaker and there is nothing biological about it. Such differences amount to little compared to the distinctive properties that hold for all forms of Human, compositionality, structure dependence, and all the particular computational possibilities (see the next section). That is what distinguishes us from other species and constitutes Human, not the Hamburg-Chicago variation. What distinguishes us from other species must be represented in the human genome; what distinguishes a German speaker from an English speaker is not represented in the genetic material but is represented somehow in brain physiology, although not in ways that are detectable by the present techniques of biologists and neuroscientists. We have no significant knowledge yet of the biochemistry of acquired physiological properties. In fact, fundamental matters are quite open: neuroscientists have traditionally focused on neurons but brain cells of a different type, the glia, are now attracting more scrutiny and outnumber neurons nine-to-one. Glia "listen in" on nerve signals and communicate chemically with other glia. Until we know more, a biologist or neuroscientist using currently available techniques will not detect the differences between German and English speakers and will conclude that there is just one human language, Human, which has the rich kinds of properties we have discussed.

At this stage of the development of biochemistry and imaging techniques, biologists cannot determine physiological properties of the Hamburg-Chicago phenotypical variation. However, they are used to teasing out information that must be provided genetically and we are now beginning to learn about genes like FOXP2, which seem to be implicated in the human language capacity. This work is in its infancy but it has begun. Investigators have found families with mutant forms of the FOXP2 gene and mutant forms of language (Gopnik & Crago 1991). We should not expect a simple solution under which there is a small number of genes specifically controlling language organs. We know that the FOXP2 gene, for example, occurs in other species in somewhat different forms and controls aspects of respiratory and immune systems. Work on smell by Richard Axel and Linda Buck, honored in the 2004 Nobel Prize in Physiology or Medicine, showed a family of one thousand genes controlling a mammalian olfactory system that can recognize 10,000 different smells, and it is possible that many genes play a role in controlling the operation of language organs.

We may identify more genes involved in the operation of language organs and that is in prospect, as we learn more about the functioning of genes quite generally. We can also imagine a day when we can examine a brain and deduce something about acquired characteristics, perhaps that it is the brain of a Japanese-speaking, cello-playing mother of two children, but that day seems to be much further off.

In the first few years of life, children grow systems that characterize their particular, individual linguistic range; adapting traditional terminology for new, biological purposes, we call these systems GRAMMARS. Despite all the personal diversity, we know that individuals each have a system and that certain properties in a person's speech entail other properties, systematically. A person's system, his/her grammar, grows in the first few years of life and varies at the edges depending on a number of factors.

We observe that from time to time children acquire systems that are significantly different from pre-existing systems - they speak differently from their parents, sometimes very differently, and they have new languages. New "Englishes" have emerged in postcolonial settings around the globe. Crystal (2004) argues that English has recently recovered from a few centuries of pedantry and snobbery on the part of elite groups who sought to impose their own norms on others, and literature in non-standard Englishes is flourishing again. Schneider (2003) claims that, for all the dissimilarities, a uniform developmental process has been at work, shaped by consistent sociolinguistic and language-contact conditions.

Sometimes there are big changes, which take place quickly in ways that we shall examine carefully. Those big changes will be the focus of this book and we shall need to understand what the systems are, how children acquire their linguistic properties, and how languages change. We can understand certain kinds of change by understanding how acquisition happens, and, vice versa, we can learn much about acquisition by understanding how structural shifts take place.

Understanding how new grammars emerge involves understanding many aspects of language; a modern historical linguist needs to be a generalist and to understand many different subfields - grammatical theory, variation, acquisition, the use of grammars and discourse analysis, parsing and speech comprehension, textual analysis, and the external history of languages. We shall consider diachronic changes in general, changes through time, but particularly syntactic changes in the history of English, treating them in terms of how children acquire their linguistic range.

I shall ask for a three-way distinction between the language capacity, internal languages, and external language. That distinction, incorporating what we now call I-language and E-language (Chomsky 1986), has been revitalized in modern generative work but its origins go back a long way. For example, Humboldt wrote that language "is not a mere external vehicle, designed to sustain social intercourse, but an indispensable factor for the development of human intellectual powers . . . While languages are . . . creations of nations, they still remain personal and independent creations of individuals" (1836/1971: 5, 22). E-language is to the nation as I-languages are to the citizens that constitute it.

Internal languages are systems that emerge in children according to the dictates of the language capacity and to the demands of the external language to which they are exposed. Internal languages or grammars (I use the terms interchangeably) are properties of individual brains, while external language is a group phenomenon, the cumulative effects of a range of internal languages and their use. Individuals typically acquire some particular form of English, an I-language and not the external language of English as a whole.

1.2 Internal languages

A core notion is that of a grammar, sometimes called an I-language, "I" for internal and individual. This is what I mean by an "internal language" and a grammar, in this view, is a mental system that characterizes a person's linguistic range and is represented somehow in the individual's brain. This is a person's language organ, the system. For example, English speakers - and "English" is a rough-and-ready notion that cannot be defined in any precise way, an external language - have grammars that characterize the fact that the first is may be reduced to 's in a sentence like Kim is taller than Jim is, but not the second is; they would say Kim's taller than Jim is but not ∗Kim's taller than Jim's (the ∗ indicates a logically possible sentence that does not in fact occur). They might say Jim said he was happy with he referring either to Jim or to some other male, but Jim likes him could only be used to refer to two separate people. The plural of cat is pronounced with a VOICELESS hissing s sound, the plural of dog is pronounced with a VOICED buzzing z sound, and the plural of church involves an extra syllable - if a new word is introduced, say flinge, we know automatically what its plural sounds like, like the plural of binge. All of this is systematic, characterized by a person's internal language system, his/her grammar.
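The systematic character of the plural can be made concrete with a minimal Python sketch. It is an illustration only, not a claim about how the mental system is implemented. The function name plural_ending, the two sound classes, and the rough ASCII sound labels are all assumptions introduced here; the final sound of the word is supplied by hand rather than derived from spelling.

```python
# Assumed, simplified sound classes (not from the text). Sounds are written
# with rough ASCII labels: "ch" as in church, "j" as in binge, "th" as in moth.
SIBILANTS = {"s", "z", "sh", "zh", "ch", "j"}
VOICELESS = {"p", "t", "k", "f", "th"}


def plural_ending(final_sound: str) -> str:
    """Return the plural ending chosen by the final sound of a noun."""
    if final_sound in SIBILANTS:
        return "iz"  # extra syllable, as in "churches"
    if final_sound in VOICELESS:
        return "s"   # voiceless hissing sound, as in "cats"
    return "z"       # voiced buzzing sound, as in "dogs"


# Hypothetical mini-lexicon: each word paired with its final sound.
for word, final in [("cat", "t"), ("dog", "g"), ("church", "ch"), ("flinge", "j")]:
    print(word, "->", plural_ending(final))
```

Because the rule looks only at the final sound, the invented word flinge is handled in the same way as binge, with no need to have met its plural before.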

Linguists know many things about people's grammars; in fact, our knowledge has exploded over the last fifty years. Since grammars are represented in people's brains, they must be FINITE even though they range over an infinitude of data. That is, there is an infinite number of sentences within an individual's range. Give me what you think is your longest sentence and I will show you a longer one by putting He said that . . . in front of it; so your The woman in Berlin's hat was brown is lengthened to He said that the woman in Berlin's hat was brown. And then She thought that he said that the woman in Berlin's hat was brown. And so on. If we had the patience and the longevity, we could string relative clauses along indefinitely: This is the cow that kicked the dog that chased the cat that killed the rat that caught the mouse that nibbled the cheese that lay in the house that Jack built. All of this means that grammars have RECURSIVE devices that permit expressions to be indefinitely long, and therefore indefinitely numerous. Finite grammars, therefore, generate indefinite numbers of structures and involve computational operations to do so. That's part of the system.
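The point that a finite device can range over an unbounded set of sentences can be sketched in a few lines of Python. This is a toy under stated assumptions, not a model of a grammar: the function name embed and the list FRAMES are hypothetical, and the sketch simply alternates the two embedding frames used in the example above.

```python
# Hypothetical embedding frames, taken from the examples in the text.
FRAMES = ["He said that", "She thought that"]


def embed(sentence: str, depth: int) -> str:
    """Wrap `sentence` in `depth` layers of clausal embedding."""
    if depth == 0:
        return sentence
    # Recursive step: the function applies to its own output.
    inner = embed(sentence, depth - 1)
    frame = FRAMES[(depth - 1) % len(FRAMES)]
    # Naive lowercasing of the first letter of the embedded clause;
    # this would be wrong for a clause beginning with a proper name.
    return f"{frame} {inner[0].lower()}{inner[1:]}"


base = "The woman in Berlin's hat was brown"
for depth in range(3):
    print(embed(base, depth))
```

The definition is a handful of lines, yet each increase in depth yields a new, longer sentence, which is the sense in which a finite system ranges over an infinitude of data.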

Grammars are also ALGEBRAIC and generalizations are stated not in terms of particular words but in terms of category variables like verb, noun, preposition, etc. The VERB category ranges over die, like, speak, realize, and the PREPOSITION category ranges over over, up, through, etc.

Also, grammars consist of different kinds of devices and are therefore MODULAR. Some device derives cats, with a voiceless s, from cat+plural, as opposed to dogs, with a voiced z, from dog+plural, and as opposed to churches, with an extra syllable, from church+plural. A different device relates a structure corresponding to What do you like? to You like what, with what in the position in which it is understood, namely as the direct object (or COMPLEMENT) of the verb like. That device "displaces" any phrase containing a wh- word and creates a structure in which the displaced wh- phrase (in square brackets) is followed by an auxiliary verb (italicized) (1). Again, this is systematic.


  (1) a. [What] do you like?
      b. [What books] will she buy?
      c. [What books about linguistics written in English] have you read?
      d. [Which books about linguistics that the guy we met in Chicago told us about] could they publish?
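The displacement device just illustrated can be sketched as a small Python function. This is a deliberately crude approximation under stated assumptions: the Clause class and the name displace_wh are hypothetical, the clause is supplied already divided into its parts, and a phrase counts as a wh- phrase only if its first word begins with "wh".

```python
from dataclasses import dataclass


@dataclass
class Clause:
    subject: str
    auxiliary: str
    verb: str
    complement: str  # the phrase in the position where it is understood


def displace_wh(clause: Clause) -> str:
    """Front a wh- complement and place the auxiliary directly after it."""
    first_word = clause.complement.split()[0].lower()
    if not first_word.startswith("wh"):
        raise ValueError("complement does not begin with a wh- word")
    fronted = clause.complement[0].upper() + clause.complement[1:]
    # Displaced phrase in square brackets, then auxiliary, subject, verb.
    return f"[{fronted}] {clause.auxiliary} {clause.subject} {clause.verb}?"


print(displace_wh(Clause("you", "do", "like", "what")))
print(displace_wh(Clause("she", "will", "buy", "what books")))
```

The operation never refers to particular words in the displaced phrase, only to the phrase as a unit, so the same function handles a one-word phrase and a much longer one.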

So grammars are generally supposed to be finite and ranging over an infinitude of data, algebraic, and modular (consisting of different types of mechanisms), and to involve computational operations of a special kind. These are some very basic, general properties of people's grammars, all grammars, and I shall discuss more as we go along. For the moment, we just need to grasp that people's language capacity is systematic.

A fundamental property of people's grammars is that they develop in the first few years of life and, again, our knowledge of how they develop has exploded over the last few generations: we have learned a great deal about what young children say and what they do not say, using new experimental techniques. Also in this domain, there is a great deal of systematicity, much of it newly discovered and different from what we find in other species. A person's grammar emerges on exposure to particular experiences in conformity with genetic prescriptions. An English speaker's system arose because, as a child, he/she was exposed to certain kinds of experiences. Children raised in Hamburg are exposed to different experiences and develop different systems. One empirical matter is to determine which experiences trigger which aspects of people's grammars, not a trivial matter.

Linguists refer to what children hear, to the crucial experiences, as the primary linguistic data (PLD). Somehow grammars are acquired on exposure only to PRIMARY linguistic data but characterize secondary data in addition to the primary data. For example, children might hear expressions like Kim is tall or Kim's tall and thereby learn that is may be reduced to 's. So primary data might trigger an operation mapping is to the reduced 's. However, the grammar must also characterize the secondary fact, already noted, that the second is does not reduce in Kim is taller than Jim is. That is a secondary fact, because the non-occurrence of ∗Kim's taller than Jim's is not something that children hear. You cannot hear something that doesn't occur.

This is crucial and constitutes part of the POVERTY-OF-STIMULUS problem, which will turn out to be important for our general story. Somehow the stimulus that children have is rich enough for them to learn that is may be reduced, but not rich enough to determine that it not be reduced in the longer sentence. The fact that the second is cannot be reduced cannot be learned directly from experience.

Children converge on a system, subconsciously, of course, in which certain instances of is are never reduced, even though their experience doesn't demonstrate this. These poverty-of-stimulus problems are widespread. In fact, there are very few, if any, generalizations that work straightforwardly; all but the most superficial break down and reveal poverty-of-stimulus problems, like the reduction of is to 's. The problems are solved by postulating information that is available to children independently of experience, represented in some fashion in the genetic material, directly or indirectly. This is a central part of our reasoning and we shall illustrate the logic in chapter 3.

The reason why poverty-of-stimulus problems are pervasive is that there are genetic factors involved, and those genetic factors solve the problems. Careful examination of the poverty-of-stimulus problems reveals the genetic factors that must be involved, just as Gregor Mendel postulated genetic factors to solve the poverty-of-stimulus problems of his pea-plants.

In this view, children are internally endowed with certain information, what linguists call Universal Grammar (UG), and, when exposed to primary linguistic data, they develop a grammar, a mature linguistic capacity, a person's internal language or I-language (2a). The essential properties of the eventual system are prescribed internally and are present from birth, in much the way that Goethe (1790) saw the eventual properties of plants as contained in their earliest form in a kind of ENTELECHY, where the telos 'end' or potential is already contained in the seed.

To summarize, grammars are systems: formal characterizations of an individual's linguistic capacity, conforming to principles of a universal initial state, UG, built from its elements, and developing as a person is exposed to his/her childhood linguistic experience. A grammar, in this terminology, is a mental organ, a person's language organ, and is physically represented in the brain, "secreted" by the brain in Darwin's word. The grammar characterizes not only the primary but also the secondary data. One can think of the Primary Linguistic Data (PLD) as the triggering experience that makes the linguistic genotype (UG) develop into a linguistic phenotype, a person's mature grammar (2).


  (2) a. Primary Linguistic Data (UG → grammar)
      b. Triggering experience (genotype → phenotype)

Grammars emerge through an interplay of genetic and environmental factors, nature and nurture. A task for linguists is to distinguish the genetic from the environmental factors, teasing apart the common properties of the species from the information derived from accidental experience, the source of the diversity.

Two analogies with chemistry are appropriate here. As noted, these grammars characterize a person's linguistic capacity and are represented in the brain. Damage to different parts of the brain may affect a person's language capacity differently. Grammars or I-languages consist of structures and computational operations, of a kind that we shall see, and not of neurons, synapses, and the stuff of neuroscientists, but nonetheless they are represented in that kind of matter. The claim here is that there are significant generalizations statable in these linguistic terms and that internal languages constitute a productive level of abstraction in the same way that chemical elements make up a level of analysis at which productive generalizations can be stated. The elements of chemistry can also be reduced to some degree to other levels of abstraction, to quanta and the elements of physics. However, they don't need to be so reduced. In fact, there are few instances of such reductions in science and for the most part scientists work at different levels of abstraction, each justified by the kinds of generalizations that it permits. Chemists and physicists work at different levels, each able to state interesting generalizations. Likewise biologists, physiologists, and medical doctors.

© Cambridge University Press

Table of Contents

Preface; 1. Internal languages and the outside world; 2. Traditional language change; 3. Some properties of language organs; 4. Languages emerging in children; 5. New E-language cuing new I-languages; 6. The use and variation of grammars; 7. The eruption of new grammars; 8. A new historical linguistics; References; Index.
