Read an Excerpt
By Bryan K. Epperson
Princeton University Press
All rights reserved.
ANCIENT EVENTS IN SPATIAL-TEMPORAL PROCESSES
The occurrence of new alleles each produced by a unique mutation or recombination [causes]... the "gametic pool" of Sewall Wright to be extended, as time progresses, to an indefinitely increasing number of new alleles; now called infinitallelism, which might have been the germ of the molecular clock had not... the (falsely Mendelian) consensus about fixed genes... paralyzed Post-Darwinism imagination.
-Gustave Malécot (1998 personal communication)
Major events in the distant past may leave transient signatures in spatial and spatial-temporal patterns of genetic variation. Important ancient events include the time and place of genetic "innovations," refugia, range expansions, colonizations or major immigration events, and fragmentation. Transient effects of ancient events contrast with stable patterns that can be produced by selection, genetic drift, and migration averaged over long periods. The study of transient effects of major events in the distant past calls for a somewhat different emphasis in the context of spatial-temporal processes. For example, in the theoretical works of Malécot (e.g., 1948) the focus was on deriving spatial distributions produced over long periods or at equilibrium, and they provide an entirely appropriate basis for empirical studies of stable geographical patterns of genetic variation. Equilibrium results do not show the transient effects of major events. Moreover, we should distinguish long-lasting yet transient effects of ancient events from more recent or shorter term transient effects. The latter may be analyzed using the migration matrix approach or short-term space-time correlations.
The signatures left by most nonrecurring ancient events are delible, and continued gene flow will eventually erase them. Many empirical studies aim to detect the trace of an ancient event, in effect to parse off particularly important features of the past from the spatial-temporal context. Frequent goals are to infer the geographic origins of new genetic variants or, by extension, the geographic origins of species themselves. In this chapter, examples are used to illustrate the general conditions required to validly separate particular temporary spatial or spatial-temporal patterns from the space-time process in which they are embedded. Typically, such studies do the following: (1) use population differentiation per se of molecular variation at the present time; (2) conduct phylogenetic reconstruction to infer the gene genealogies and the ancestral genotype (generally without including information on spatial proximity, structure, and migration in the probability models); and then (3) use present spatial patterns of types that are most like the inferred "ancestral" type, and hence infer the past location of the ancestral type. Examination of such studies also illustrates some of the key distinctions between modern molecular data and gene frequency data. Phylogenetic studies can sometimes take advantage of a unique kind of temporal "depth" (Templeton 1998) in spatially distributed molecular data. Steps 1 and 2 may not encounter serious problems, particularly if the timescale on which mutations accumulate is much slower than that on which migrations occur, and when coalescences within populations are much more recent than those among populations (Nordborg 1997). The rationale for the third step has received the least attention, and the step often appears to be made subconsciously.
While it may seem to be a safe assumption that a given gene sampled at the present time is most closely related to the past or ancient genes in the same population, the gene may also be very closely related to ancient genes in other populations, perhaps in some cases quite distant geographically. This is a recurring theme in the inference of major events in spatial-temporal processes.
The study of ancient events in population genetics generally requires extensive datasets. Most survey data are contemporary, but as ancient DNA samples become more available, they may become disproportionately important and informative. There are still few species for which sufficient, even purely spatial (i.e., not space-time) datasets exist. There can be no doubt that in the near future sample sizes orders of magnitude larger will become available for some species, including humans. But because data requirements have as yet rarely been met in surveys, much of this chapter is devoted to a specific example, the origin of anatomically modern humans. Other examples of ancient population genetics follow, for humans and other species.
OUT OF AFRICA
Studies of human genetics typically have especially large numbers of widely separated study populations, with large sample sizes and numbers of genetic markers, and this makes them well suited for inferring the location of an ancestral population from the distant past. We will examine issues around the so-called out-of-Africa hypothesis or theory of the origins of modern humans. I wish to make it very clear that this examination is not necessarily intended to challenge this theory. It very well may be true, and there are many considerations not addressed here that support the theory.
The paramount features of the out-of-Africa hypothesis surround the geographic location and isolation of the first anatomically modern humans. By some half-million years ago so-called archaic hominids had spread throughout much of the Old World. The simplest form of the out-of-Africa hypothesis is that the first anatomically modern humans evolved in a small population of probably less than 10,000 individuals, in complete reproductive isolation, somewhere in Africa, about 200,000 years ago. Once evolved, this population began to increase in size and spread geographically. One scenario has it that humans spread along the coastal zones of Africa, and later into Asia and Europe. As they spread further to various regions of the Old World, they must have come into contact with the archaics already present, but did not interbreed at all. All genes today are descended from those in the isolated original population in Africa. This theory has become widely accepted in the last decade or so, while enthusiasm for a contrasting theory, the "multiregional hypothesis" (e.g., Wolpoff 1989), waned. The multiregional hypothesis states that the gene pool of anatomically modern humans, us, contains substantial contributions from many prior regionalized and differentiated archaic populations. Much of the cited support for the out-of-Africa theory is genetic, in particular the pattern of genetic differentiation among geographically separated groups, especially ethnic populations that have not undergone recent global migrations. There is also some physical evidence for the theory (e.g., Klein 1995; Lahr 1996; Sokal et al. 1997b).
The out-of-Africa theory became widely supported following phylogenetic studies of mitochondrial DNA, mtDNA. Mitochondrial DNA is strictly maternally inherited (e.g., Stoneking and Soodyall 1996), hence the genealogy of mtDNA is matrilineal. In a highly influential and widely publicized paper, Cann et al. (1987) studied samples of mtDNA from global populations and inferred a "mitochondrial Eve," the woman who carried the most recent common ancestor (MRCA) of all mitochondria today. In other words, all mitochondrial lineages trace back to or "coalesce" in the mitochondrial Eve. Using polymorphic sites in the data and the molecular clock, Cann et al. (1987) estimated that "Eve" lived about 200,000 years ago. While this feature captured headlines and public imagination, the existence of a mitochondrial Eve is a necessary outcome of life, because any set of genes must trace to a common ancestor at some time in the past. Moreover, because mtDNA does not recombine, the entire mtDNA genome must all be descended from a single woman. The estimate that the time back to the most recent common ancestor, or TMRCA, was 200,000 years ago is interesting in part because it coincides with fossil evidence of the appearance of anatomically modern humans. However, this coincidence in itself also does not mean much, because there is no a priori reason to expect the TMRCA of a set of haplotypes or DNA sequences to necessarily coincide with the event of origination.
If there is selective neutrality, the TMRCA should depend primarily on a function of the overall effective population size, and estimates of the TMRCA depend also on the mutation rates. For example, the TMRCA could in principle occur long after the origin of modern humans, in particular if human population size bottlenecks occurred after the origin. The TMRCA could also have been much earlier than the origination, if populations remained large prior to the formation and throughout the existence of modern humans. There could have been polymorphism within the theorized isolated single original population under the out-of-Africa hypothesis, depending on what its size was and for how long it was isolated. Based on the estimated value of TMRCA, the human effective population size since the time Eve existed, which is usually a function of the harmonic mean of the population sizes over that period, has been estimated at around 10,000. A number of studies have argued that this is too small to fit the multiregional hypothesis, because, they maintain, the Old World population of archaics must have been much larger simply to have been sustained (e.g., Harpending et al. 1998).
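As a rough illustration of how the TMRCA scales with effective population size under neutrality, the sketch below uses the standard neutral coalescent for a constant-size haploid population. That model is an assumption added here and is not the calculation used in the studies cited. For mtDNA the relevant size is the effective number of females; the value of 5,000 females and the 20-year generation time are illustrative assumptions only, and the function names are hypothetical.

```python
import random


def expected_tmrca(n_eff, sample_size):
    """Expected generations back to the MRCA of `sample_size` gene copies.

    Standard neutral coalescent for a constant haploid population of
    `n_eff` copies: while k lineages remain, the expected wait to the next
    coalescence is 2*n_eff / (k*(k-1)) generations. Summing over
    k = 2..sample_size gives 2*n_eff*(1 - 1/sample_size).
    """
    return 2.0 * n_eff * (1.0 - 1.0 / sample_size)


def simulate_tmrca(n_eff, sample_size, rng):
    """Draw one TMRCA (in generations) from the same coalescent model."""
    total = 0.0
    for k in range(sample_size, 1, -1):
        rate = k * (k - 1) / (2.0 * n_eff)  # coalescence rate per generation
        total += rng.expovariate(rate)
    return total


if __name__ == "__main__":
    n_females = 5_000           # assumed effective number of females
    years_per_generation = 20   # assumed generation time
    generations = expected_tmrca(n_females, sample_size=100)
    print(f"expected TMRCA: {generations * years_per_generation:,.0f} years")

    # Individual genealogies scatter widely around the expectation.
    rng = random.Random(1)
    draws = [simulate_tmrca(n_females, 100, rng) for _ in range(5)]
    print([round(d * years_per_generation) for d in draws])
```

With these assumed values the expectation is 198,000 years, but single simulated genealogies vary widely around it, which is one reason the agreement between a TMRCA estimate and a fossil date carries little weight by itself.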
Effective population size Ne is an important concept in population genetics, and it is worth noting some of its general properties. It simplifies the extension of theoretical models, originally constructed for an "ideal" population, to many other situations. Typically, the ideal population is constant in size, with monoecious random mating, and certain constraints on the variance of numbers of progeny produced per parent (e.g., see discussion about the process Equation 5.1 in chapter 5). Generally, an Ne is a function of the actual population size N and other, modifying factors, and it can simply be substituted for N in the model equations formulated for the ideal population. Equations are variously expressed in terms of probabilities of identity by descent, coalescences, or gene frequency covariances or variances, hence Ne is "effective" with respect to these measures. The most common are the "inbreeding effective number" and the "variance effective number" (e.g., see review by Crow and Denniston 1988). In some cases, but not others, they are equivalent. In general, population biological factors that substantially affect Ne include unequal numbers of females and males, large variance in numbers of offspring, and various forms of systemic inbreeding (e.g., Caballero and Hill 1992). However, when examining the out-of-Africa hypothesis, the increase in population size is far more important. Ne is strongly and disproportionately affected by any periods of small size, as, for example, when Ne is a function of the harmonic mean (e.g., Crow and Denniston 1988).
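To make the disproportionate weight of small sizes concrete, here is a minimal sketch of two textbook approximations: the harmonic mean of population sizes over generations, and Wright's formula for unequal numbers of breeding females and males. The function names and the example sizes are illustrative assumptions and do not come from the studies cited.

```python
def harmonic_mean_ne(sizes):
    """Long-term effective size as the harmonic mean of per-generation sizes."""
    if not sizes or any(n <= 0 for n in sizes):
        raise ValueError("sizes must be a non-empty sequence of positive numbers")
    return len(sizes) / sum(1.0 / n for n in sizes)


def sex_ratio_ne(n_females, n_males):
    """Wright's approximation for unequal numbers of breeding females and males."""
    return 4.0 * n_females * n_males / (n_females + n_males)


if __name__ == "__main__":
    # Nine generations at 10,000 and a single bottleneck generation at 100.
    sizes = [10_000] * 9 + [100]
    print(round(harmonic_mean_ne(sizes)))   # arithmetic mean would be 9,010
    print(sex_ratio_ne(n_females=900, n_males=100))
```

In this made-up example a single generation at size 100 pulls the harmonic mean down to roughly 917, about a tenth of the arithmetic mean, and a census of 1,000 breeders with a 9:1 sex ratio behaves like an ideal population of 360.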
While the TMRCA of mtDNA appears to be on fairly solid footing, it is much more difficult to determine the sequence and timing of how population size may have expanded and contracted during the last 200,000 years or so. The TMRCA for mtDNA is not so satisfying because it represents only the mitochondria. It does not mean that all other (e.g., autosomal) genes also coalesced at the same time, nor even that all genes came from the same population as the mitochondrial Eve. More recent analyses of larger datasets of mtDNA polymorphisms (e.g., Stoneking and Soodyall 1996) generally support the TMRCA reported by Cann et al. (1987). Together with the TMRCA, the genotype of the most recent common ancestor is also inferred from the gene genealogy, based solely on probabilities of mutations. It should be noted that the actual probabilities of common ancestry are also functions of geographic structure and migration rates, as well as mutation rates, and, although the former two factors may not make much difference, they were not considered in the calculations. However, this may not matter much if mutation effects occur on a much slower timescale than migration effects (e.g., Nordborg 1997; Fu 1997).
The geographic location of a most recent common ancestor is even more difficult to assess. Cann et al. (1987) found that among sampled populations, those in Africa were most similar to the inferred ancestral mtDNA sequence. Many studies have stated that (provided the data and gene genealogical conclusions are reliable) this means the ancestral sequence and the mitochondrial Eve existed in an African population. However, usually such statements have been made without further comment and are unsupported by any population genetic arguments or models. Recent theoretical developments have shown that the restricted presence of the inferred ancestral gene in specific present-day populations may or may not be closely related to the likelihoods of those populations having originated the ancestral gene, depending on the rates of mutation and migration (Epperson 1999a; 2002). The inferred ancestral gene (if selectively neutral) is in essence a randomly chosen representative of a population, hence relevant theoretical results can be expressed as space-time probabilities of identity by descent, a space-time extension of Malécot's definition of spatial probabilities of identity by descent. Under the conditions of the models (Epperson 1999a; 2002; and chapter 5), the relative values of these probabilities equal the probabilities of origination, because of the simple fact that one of the populations must have contained the ancestor of any given gene at present. Particularly important is how the probabilities of descent depend on the geographic distance between a potential ancestral (origination) geographically located population and the location of a present population. For example, consider Africa and Asia as alternative potential locations of origination, containing the ancestral mtDNA (mitochondrial Eve), and that now only African populations contain the ancestral type mtDNA. What are the relative likelihoods that Africa rather than Asia was the location? 
The two likelihoods depend on the migration rates and mutation rates (Epperson 1999a; 2002), and they should be nearly equal when the amount of migration is high and the rate of mutation is low. In such cases, the location of a gene today (e.g., ancestral type mtDNA in some African populations today) has almost nothing to do with where its ancestor was a long time ago. In other words, the spatial or geographic pattern of genetic variation today contains almost no information about where the origination was, whether one uses haplotype frequencies or gene genealogies based on phylogenetic reconstruction and degrees of differentiation (e.g., among DNA sequences).
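The dependence on migration and mutation can be illustrated with a deliberately simple two-population sketch. It is not the model of Epperson (1999a; 2002): it assumes two equal-sized populations (call them "Africa" and "Asia"), a single lineage traced backward that switches population with probability m per generation, and a mutation probability u per generation. The function name and all parameter values are hypothetical. The quantity computed is the probability that the lineage was in its present population when its most recent mutation arose.

```python
def same_deme_origin_probability(m, u):
    """Probability that a lineage's most recent mutation arose in its present deme.

    Toy model (an assumption of this sketch): two demes of equal size; a
    lineage followed backward in time switches deme with probability m per
    generation and mutates with probability u per generation.
    After t generations the lineage is in its starting deme with probability
    1/2 + (1 - 2m)**t / 2, and the most recent mutation lies t generations
    back with probability u * (1 - u)**(t - 1). Summing the product over
    t = 1, 2, ... gives the closed form returned here.
    """
    if not (0.0 <= m <= 0.5 and 0.0 < u <= 1.0):
        raise ValueError("need 0 <= m <= 0.5 and 0 < u <= 1")
    decay = (1.0 - u) * (1.0 - 2.0 * m)
    return 0.5 + 0.5 * u * (1.0 - 2.0 * m) / (1.0 - decay)


if __name__ == "__main__":
    # Migration much more frequent than mutation: location is nearly uninformative.
    print(same_deme_origin_probability(m=1e-2, u=1e-5))
    # Mutation much more frequent than migration: location is highly informative.
    print(same_deme_origin_probability(m=1e-5, u=1e-2))
```

Under these assumptions the first case gives a probability very close to 0.5, so the two candidate locations are almost equally likely, while the second gives a probability close to 1. The real problem involves many populations, unequal sizes, and asymmetric migration, so the numbers here indicate only the direction of the effect.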
Excerpted from Geographical Genetics by Bryan K. Epperson. Excerpted by permission.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.