AMERICAN JOURNAL. OF PHYSICAL ANTHROPOLOGY 96109-132 (1995) Indo-European Origins: A Computer-Simulation Test of Five Hypotheses GUIDO BARBUJANI, ROBERT R. S O W , AND NEAL L. ODEN Dipartimento di Scienze Statistiche, Universita di Bologna, I-40126 Bologna, Italy (G.B.); Department of Ecology and Evolution, State University of New York, Stony Brook, New York 11 794-5245 (R.R.S., G.B.); The EMMES Corporation, Potomac, Maryland 20854 (N.L.O.) KEY WORDS Genetic variation, Demic diffusion, Language, Computer simulation ABSTRACT Allele frequency distributions were generated by computer simulation of five models of microevolution in European populations. Genetic distances calculated from these distributions were compared with observed genetic distances among Indo-European speakers. The simulated models differ in complexity, but all incorporate random genetic drift and short-range gene flow (isolation by distance). The best correlations between observed and simulated data were obtained for two models where dispersal of Neolithic farmers from the Near East depends only on population growth. More complex models, where the timing of the farmers’ expansion is constrained by archaeological time data, fail to account for a larger fraction of the observed genetic variation; this is also the case for a model including late Neolithic migrations from the Pontic steppes. The genetic structure of current populations speaking Indo-European languages seems therefore to largely reflect a Neolithic expansion. This is consistent with the hypothesis of a parallel spread of farming technologies and a proto-Indo-European language in the Neolithic. Allele-frequencygradients among Indo-European speakers may be due either to incomplete admixture between dispersing farmers, who presumably spoke proto-Indo-European, and pre-existing hunters and gatherers (as in the traditional demic diffusion hypothesis), or to founder effects during the farmers’ dispersal. By contrast, successive migrational waves from the East, if any, do not seem to have had genetic consequences detectable by the present comparison of observed and simulated allele frequencies. 0 1995 Wiley-Liss, Inc. Wide allele-frequency clines exist a t sev- gies of Mesolithic and Neolithic settlements eral loci in Europe (Menozzi et al., 1978; indicate a westward spread of farming techSokal and Menozzi, 1982; Sokal et al., nologies from the Near East, starting ap1989a). Their extent is such that simple proximately 8,000 BC (Ammerman and Cavmodels of isolation by distance are unlikely alli-Sforza, 1984; Renfrew, 1987). The to explain them. It is generally agreed that observed gene-frequency gradients can then they result from a population expansion be explained by attributing the propagation starting in Anatolia approximately 10,000 of farming t o the dispersal of early farmers, years ago (Menozzi et al., 1978; Sokal and Menozzi, 1982; Renfrew, 1987, 1991, 1992; Cavalli-Sforza, 1988; Sokal et al., 1991), Received January 25,1994; accepted August 7,1994 most likely associated with the development Address reprint requests to Robert R. Sokal, Department of of technologies for food production (Hassan, Ecology and Evolution, State University of New York, Stony 1973; Zeven, 1980). Radiocarbon chronolo- Brook, NY 11794-5245. 0 1995 WILEY-LISS, INC. 110 G . BARBUJANI ET AL. who interbred only sparingly with the bands of hunters and gatherers whom they met in the process, and who had already colonized most of Europe (Ammerman and CavalliSforza, 1984). Such a combination of demographic growth, range expansion, and limited admixture has been termed demic diffusion (Menozzi et al., 1978). Its expected consequences include correlations between allele frequencies and the dates of onset of agriculture (Sgaramella-Zonta and CavalliSforza, 1973), which have actually been observed (Sokal et al., 1991). The European regions where early (Neolithic) agriculturalists expanded correspond approximately to the current western range of Indo-European languages. This raises the question whether the cultural process of Indo-European diffusion was also determined by the demographic processes accompanying the spread of farming (Renfrew, 1987). The traditional view holds, on the contrary, that proto-Indo-European entered Europe not earlier than 4500 BC, through three migrational waves also coming from the East, namely from the Pontic steppes (Gimbutas, 1979, 1986). Although directed westwards, like demic diffusion, these waves were not associated with population increases comparable to those caused by the introduction of farming and animal breeding. Therefore, diffusion of Indo-European in the late Neolithic would imply that languages spread more through cultural contacts (Zvelebil and Zvelebil, 1988) than by demographic processes (see Renfrew, 1989). As a corollary t o this, association between patterns of genetic and linguistic variation should be limited and occasional among contemporary Indo-European speakers. Support for the view linking the spread of proto-Indo-European in Europe with demic diffusion comes from studies showing that patterns of linguistic and genetic diversity correspond in many European populations (Sokal, 1988; Cavalli-Sforza et al., 1988, 1992; Harding and Sokal, 1988; Barbujani and Sokal, 1990, 1991; Bertranpetit and Cavalli-Sforza, 1991). The exceptions include Basques (Piazza et al., 1988; Bertranpetit and Cavalli-Sforza, 19911, Hungarians (Barbujani et al., 1990), and Uralic-speakers (Guglielmino et al., 19901, that is to say, groups whose current languages do not belong to the Indo-European phylum. Recent years have seen both a refinement of the hypotheses on Indo-European origins, and the emergence of contradictory data. Based on archaeological and linguistic evidence, Cavalli-Sforza (1988) and Renfrew (1991, 1992) argued that the common elements recognized within the so-called Nostratic linguistic macrofamily (Kaiser and Shevoroshkin, 19881, including Indo-European, Altaic, Afro-Asiatic, and Elamo-Dravidian, derive from a common biological origin of most of their current speakers. It is then possible to interpret the current distribution of most Nostratic, and not only IndoEuropean, languages as a consequence of a multidirectional spread of agriculture. Recent genetic analyses agree with this view (Cavalli-Sforza et al., 1993; Barbujani and Pilastro, 1993). On the other hand, a model incorporating the effects of the origin of agriculture and/or specifically the hypotheses of Renfrew and Gimbutas failed to explain a larger fraction of the correlations between genetic and linguistic distances than is explained by the simple effects of geographic distances (Sokal et al., 1992). Further evidence in favor of either hypothesis may be obtained by simulating their genetic consequences, and then comparing them with the patterns of genetic variation observed in the field. In the only simulation study available so far, presenttime genetic variation was demonstrated to be compatible both with a Neolithic origin of Indo-European speakers, and with later immigration from the Pontic steppes (Rendine et al., 1986). However, the two hypotheses were not contrasted in that study, but combined. A relationship between current genefrequency gradients and dispersal from the east was evident, and seems undisputable. However, when exactly, and by what type of process, proto-Indo-Europeans spread has not been convincingly ascertained. We are particularly interested in establishing if the demic diffusion of early farmers can, at least in principle, explain a large share of current genetic diversity among Indo-European speakers, or if additional processes must be included in the model to obtain a better fit with real data. Among these additional pro- INDO-EUROPEAN ORIGINS cesses, we gave a special emphasis to the migratory waves postulated by Gimbutas. In this study, we simulated microevolutionary scenarios of increasing complexity, from an unrealistically simple one to models including several archaeologically documented migrations. We then calculated correlation coefficients between the simulated and real gene frequencies (or, more precisely, between matrices of genetic distances calculated from each of them). We expected to observe an increasing agreement between real and simulated data as the models get more and more realistic. When an increase in the complexity of the model is not matched by an increase in the correlations, we conclude that the new factors included in that model do not improve our understanding of the phenomenon, and should therefore be considered unnecessary. This does not automatically imply that these factors played no evolutionary role at all. But, if they did, evidence of their effects should be sought in data other than the currently available allele frequencies. A large database of European allele frequencies (see Sokal et al., 1989a) was analyzed for this purpose. METHODS AND DATA Overview of the simulations We carried out a series of computer simulations of five microevolutionary models. All models were based on a stepping-stone population structure, consisting of a 60 X 37 regular lattice, superimposed onto the map of Europe. Each node of the lattice represents a l-degree square quadrat. Of the 2,220 nodes of the lattice, 1,512 are land areas supporting a human population. Each node (population) is characterized by its effective size (N,) and by the frequency of one allele (p).At each generation p undergoes random variation, representing the effects of genetic drift, which is a function of N,. Migration is allowed only between adjacent populations. The numbers of individuals migrating at each generation depend on population sizes, and on factors of resistance to migration, which are zero across plains, but greater than zero across mountain chains and seas. The differences among the five 111 models lie in parameters other than those described so far. Each simulation experiment consisted of 440 iterations (representing generations) of a series of population processes. In this way, assuming a 25-year generation interval and non-overlapping generations, each experiment spans 11,000 years. The simulated allele frequencies were printed at generation 440, representing present time, for the localities corresponding to those for which allelefrequency data are available in a database of European allele frequencies (Sokal et al., 1989a), from which non-Indo-European speakers had been discarded. Matrices of Prevosti’s genetic distances (Prevosti et al., 1975) were calculated on both real and simulated allele frequencies, and their degree of resemblance was evaluated by Mantel (1967) tests of matrix correlation. Five FORTRAN programs were written, each corresponding to one of five models for the origins of Indo-Europeans described in the following section, and incorporating subroutines developed by Press et al. (1986)and Manly (1991). An outline of the models IBD: Isolation by distance The first, clearly oversimplified, hypothesis, is that Indo-European-speaking populations evolved under conditions of isolation by distance (IBD model). Current patterns of genetic variation would then simply result from the interaction between random fluctuations of allele frequencies in time, i.e., genetic drift, and dispersal of individuals. Under isolation by distance, variations in population size affect only the impact of genetic drift-the larger the population, the smaller the allele-frequency fluctuations. Population growth, which occurs after generation 40, does not prompt migratory movements. Thus, the IBD model neglects all gene flow processes other than those in which movements of individuals from their birthplaces are local and random (i.e., equally likely in all directions except for migration resistance factors, see below). Under the IBD model, the demographic increase that occurred in the Neolithic (and is detailed in the section Population Growth 112 G. BARBUJANI ET A L Among Farmers) was simulated without separating hunting-gathering and farming populations, i.e., as if all hunter-gatherers turned to agriculture a t 8,000BC. OAC: Isolation by distance, plus effects the origin of agriculture, and cultural transmission of Cultural transmission from farmers to hunter-gatherers may be built into the model, yielding what we call OAC; C stands OAG: Isolation by distance plus effectsof for culture. Under this model, at all localithe origin of agriculture ties some hunter-gatherers learn how to produce food, and therefore their alleles are Isolation by distance is the null hypothetransmitted across generations with greater sis for human microevolution (Wijsman and efficiency. From the genetic standpoint, this Cavalli-Sforza, 1984). Therefore, all models is equivalent to a certain degree of admixthat follow are not alternative to IBD. ture, whereby some genes of the hunterRather, they incorporate it as a necessary, if not sufficient process, for determining the gatherers contribute to the gene pool of the farmers. As a consequence, these genes currently observed patterns of genetic variaspread at once with the genes of the farmers, tion. OAG, the simplest such model, is one and are thus carried into new localities. This which combines IBD with the likely effects is the Neolithic demic diffusion model, as of the demographic processes following the originally proposed (Menozzi et al., 1978; origin of agriculture. Under this model, popRenfrew, 1987). ulations of hunter-gatherers initially occupy Europe and evolve under isolation by distance. At a specific moment in time (8,000 BC, chosen on the basis of archaeological inATC: Isolation by distance, plus effects formation), a few populations in southern of the origin of agriculture, cultural Anatolia turn to farming. This starts a local transmission, and archaeological process of population growth in the areas time constraints where farming is being practiced, followed Under OAC, the spread of farmers from by dispersal outwards when local population densities have reached a certain threshold. Anatolia into Europe is driven by their inIn this way, migratory movements between crease in numbers at each locality, which farming communities are not necessarily causes dispersal towards areas of lower popsymmetrical, as is reasonable to assume in ulation density. Therefore, in the OAC many evolutionary scenarios (Rogers and model, the farming technologies spread a t Jorde, 1987). The rate of spread of farmers is an approximately constant rate through driven by their intrinsic growth rate; it is space (as in Ammerman and Cavalli-Sforza, constrained only by geographical factors 1971). This is known to be an approximation such as mountain chains or bodies of water. (Barker, 1985).A further refinement of OAC No cultural transmission is simulated be- considers archaeological time constraints tween the hunter-gatherers and the farmers (ATC). Under ATC, we use archaeological who immigrate into their regions. Only the information about the likely date at which farmers’ allele frequencies are eventually farming reached each specific site in Europe compared with observed matrices of genetic (see Sokal et al., 1991). In this way, the distances. In this way, the genetic conse- arrival of farmers into a new locality requences of this model are those that would flects archaeologically documented cultural be expected if hunter-gatherers were re- transformations; farmers spread at an irregplaced without admixture, i.e., became ex- ular rate, corresponding to the actual protinct. The only microevolutionary role they cess as inferred from archaeological eviplay is to serve as a source population at the dence. Incorporation of hunter-gatherers beginning of the Neolithic, for that small into each farming population occurs at the fraction in Anatolia of the total population same rates and through the same processes that develops the new farming technologies. as in OAC. INDO-EUROPEAN ORIGINS GIM: Isolation by distance, plus effects of the origin of agriculture, cultural transmission, archaeological time constraints, and late Neolithic migmtions In the OAG, OAC, and ATC models, the first farmers are also considered the first speakers of proto-Indo-European. The alternative hypothesis considers them as the Neolithic inhabitants of areas that were later invaded by the first proto-Indo-European speakers, the Kurgan people (Gimbutas, 1979, 1986). The three migrational waves postulated by Gimbutas are added to ATC in the GIM model, by simulating long-distance migratory movements between 4,250 BC and 2,900 BC. A number of successive population movements are added as well; presumably, they were independent from the spread of Indo-European, but are considered by Gimbutas (personal communication) relevant to an accurate description of human evolution in Europe. Details on the simulation parameters and algorithms The data matrix In all simulation cycles, a matrix of 60 columns by 37 rows was defined, each element in the matrix representing a square of edge length 1 degree in a Mercator projection of Europe. The data matrix covers the area between 10 degrees of longitude West and 50 degrees East, and between 72 and 35 degrees of latitude North. Iceland is not included in this simulation. 113 ing the frequency of an allele at a polymorphic locus in the hunting-gathering population, i.e., in the only type of population existing at the beginning of the simulation. From a mathematical standpoint, it does not make a difference if this locus is regarded as biallelic, or if it is considered multiallelic, since the fate of only one of its alleles is simulated in the followingphases. Allele frequencies were drawn from a gamma distribution truncated at one (Nei, 19871, whose mean was fixed either a t 0.33 or at 0.50. Initial population sizes Estimates of population densities among current hunting-gathering tribes suggested to Rendine et al. (1986) that the effective size NHG of the hunting-gathering populations of Europe should be approximately 300 in each of the 840 elementary areas of their simulation. To have the same population density in the 2,220 pixels of this simulation, NHG was fixed at 114. This corresponds to a population density of 0.04 individuals per square km,within the estimated range of population densities for hunter-gatherers in temperate climates (Hassan, 1981). In Rendine et al.’s (1986) model, the individuals were considered as haploid, whereas here they are diploid. This may have caused a certain degree of divergence between the two models, as the drift variances are affected by the levels of ploidy of a population. Genetic drift among hunter-gatherers Each of the 2,220 elements (= nodes or pixels or localities) of the data matrix contained an integer value, L, which was 1 for plains, 2 for mountains, 3 for seas, and 4 for the Black Sea. A local population was assigned to each of the 1,512 land pixels. Non-overlapping generations were simulated. At each generation, a new allele frequency was drawn, for each locality, from a normal distribution whose mean was the allele frequencyp of the same population a t the previous generation, and whose variance w a s p 0 - p), divided by twice the effective population size NHG (Nei, 1987). This represented the effect of sampling of alleles from one generation to the following, i.e., random genetic drift. Initial allele frequencies Dispersal of hunter-gatherers Under all the models tested, for each land pixel a variable PHG was defined, represent- Symmetrical dispersal occurred between adjacent pixels, once every generation fol- Geography 114 G. BARBUJANI ET AL. TABLE 1. Factors of resistance to migration (RTW And an adjacent pixel in t h e Plains Mountains Sea Black Sea Between a pixel in thePlains Mountains Sea Black Sea 0.00 0.25 0.45 1.00 0.50 0.70 1.00 0.90 1.00 1.00 lowing drift; population sizes of the two localities exchanging individuals did not change as a result of dispersal. Therefore, this study assumed a stepping-stone model (Kimura and Weiss, 19641,whose properties are discussed by Jorde (1980). We chose a dispersal rate m of 0.065 (as in Rendine et al., 1986), which means that at each generation, after reproduction, 6.5%of the resident individuals could be replaced by immigrants from the adjacent pixels. However, physical obstacles in the pixels between which dispersal occurred could reduce the number of migrants. Physical obstacles t o migration were expressed by a factor of resistance to migration, RTM, detailed in Table 1. The average value of RTM, calculated between all suitable pairs of pixels, was 0.300. This means that, on the average, only 70% of the potentially dispersing individuals actually moved from their birthplace to an adjacent locality. To compensate for this, the dispersal rate m was replaced by m' = 0.0651 0.700 = 0.0928. In other words, 9.28%of the individuals in a locality were potentially subject to migration elsewhere, and 6.5%actually migrated, on the average. The number N(AB) of individual hunter-gatherers moving from locality A t o B (and vice versa, from B to A) was N(AB) = NHGm' (1 - RTM(AB)) (1) 4 ~ where the denominator refers to the number of adjacent populations in a stepping-stone model, and RTM(AB) depends on the environmental features a t localities A and B (Table 1).WhenNHGwas fixed at 114, each pair of adjacent localities exchanged 2 effective individuals per generation. However, under the IBD model the hunting-gathering populations increase in size, starting at genera- tion 40; in this case, the number of migrants per generation increased accordingly. When, by contrast, hunter-gatherers decreased in numbers owing to the expansion of farmers, NAB) decreased as well until it reached zero. Because of the RTM factor, two localities in the plains exchanged freely onefourth of the migrants allowed at each generation, whereas dispersal was reduced between populations separated by mountains or bodies of water, and for populations a t the extremes of the simulated area. No dispersal was allowed across the Black Sea, to more carefully represent the population processes in the surrounding area (e.g., the migrations of Kurgan people), which occurred mostly by land movements. The allele frequency after migration, pfHG, was then calculated as P'HG = P N G [ ( ~- m') (1 - RTMI + rn'PHGin (2) where RTM is the average resistance to migration between the pixel of interest and the nA adjacent pixels (1 < nA < 41, and pHGin is Inception of farming in the nuclear zone Under all models except IBD, the spread of farming starts with the splitting of some populations into two groups, one practicing agriculture, and the other still living in a hunting-gathering economy. At generation 40 (i.e., 10,000 years ago), 20 individuals turn to farming at each of six pixels in Anatolia, around the village of Catal Humk, where the oldest archaeological evidence of farming activities is situated (Redrew, 1991). Each group of 20 individuals represents a random sample of the pre-existing hunting-gathering population at the same site; therefore, initially they have the same allele frequency: pF = PHG. However, from generation 41 they evolve in reproductive isolation, so that there will be two distinct populations of hunter-gatherers and farmers, HG and F, at those localities, with distinct allele frequencies. In the absence of INDO-EUROPEAN ORIGINS cultural transmission between groups, i.e., under the OAG model, the two groups coexist without any genetic exchange, at all localities where farming communities have been established. 115 pansion towards localities a t the same latitude (i.e., with the same type of climate) was presumably more common than dispersal northwards, towards more rigorous climates. This led us to subdivide the history of our simulated farming populations into four successive phases. Phase 1: Initially, Population growth among farmers the population is scarce, and simply tends to The farming populations have access to a increase logistically, without sending emiwider range of resources, and tend to in- grants, but receiving immigrants from other crease in numbers (Hassan, 1973); 50-fold farming populations a t higher density, if increases have been estimated by Ammer- any. Phase 2: When NF reaches a first man and Cavalli-Sforza (1984). We simu- threshold, T1, a few individuals begin to dislated a logistic increase, whose key parame- perse longitudinally; this often entails coloter, the growth rate, generally referred to as nization of a new site on the west, whereas r (Feller, 1940; Eisen, 1979; Keyfitz, 1977), gene flow is asymmetrical with the eastern was here called a,following Rendine et al. neighbors, whose density is still higher. (1986). In the absence of reliable informa- Phase 3: As the population size approaches tion on growth rates among early farmers, the carrying capacity of 7,560, a second we tried a preliminary set of values, and threshold, T2 is reached, and gene flow occhose a = 0.5, which gave us a rate of popu- curs also northwards and southwards, once lation increase compatible with the known again giving rise to a new population of rates of spread of farming in Europe, 1 km farmers if the adjacent pixel to the north or every year, on the average (Ammerman and south has not been colonized yet. The Cavalli-Sforza, 1971). That value was em- thresholds had to be fixed in a somewhat ployed in all the simulations presented here. arbitrary manner. It seemed realistic to alThe equation calculating, at generation t low for the first westwards dispersal two to and for each locality, the effective size NF of three generations after inception of the the farming population is farming economy, so as to roughly match the archaeologically documented rates of spread. T1 was fixed at 24 (corresponding to 1) x (1 + a (1 a census size of 72 individuals), whereas for T2 we chose 50% of the carrying capacity, where NF(t - 1)is the population size a t the i.e., 3,780. Phase 4: Finally, when adjacent former generation, and 7,560 is the carrying populations have reached their equilibrium capacity of the area, chosen for reasons anal- size of 7,560 effective individuals, the migraogous to those that led us to choose tory exchanges become symmetrical. The NHG= 114. Since the effective population general equation, expressing the number of size is approximately one-third of the cen- farmers moving, say, from locality A to adjasus size (Wright, 19691, this corresponds cent locality B, N(AB),is roughly to a farming population of 23,000 dwelling on a 1-degree-square quadrat of land, and to 20,000 farmers in each elementary area of Rendine et al.’s (1986) simulation. for NF(A) greater than the appropriate threshold, i.e., T1 for latitudinal and T2 for longitudinal movements, respectively. All Dispersal of farmers and origin of relevant quantities have been defined for farming outside the nuclear zone Equation 1. Since N,(A) is variable across Under the OAG and OAC models, the generations, N(AB) varies too. growth of farming populations prompts miIn this way, the input and output of genes, gratory movements into neighboring locali- from and to the adjacent pixels of the map, ties where farming has not yet started. Ex- could be different a t different times during G . BARBUJANI ET AL. 116 the simulations. If the number of immigrants from each locality is taken into account in the calculation of the allele frequency of the immigrants, which we called pHGinin Equation 2 and which will be called pFinfor farmers, a n analogous formula gives the allele frequency after gene flow among farmers: p'F = p F [ ( l - m ' ) (1 - RTM)I tion 200, however, the four models yield a similar pattern of land occupation. The only major exception is an area of north-western Alps, where archaeological evidence shows a delayed onset of farming activities (Sokal et al., 1991). Of course, such a delay was not predicted by the mechanism of farming expansion underlying our OAG and OAC models. + rn'PFin (5) Pixels in the sea Figure 1 is an example of allele frequencies generated under the OAG and OAC models. Allele frequencies of specific localities were, of course, different in different realizations of the same process, and between OAG and OAC. What was constant, however, was the pattern of occupation of land areas by expanding farmers, because it depended only on population growth and dispersal parameters, which were kept constant across realizations. For the ATC and GIM models, by contrast, the spread of farming followed the pattern that can be inferred from archaeological evidence. Once a farming community exists at a certain locality, the exchange of genes with neighboring farming communities occurred in the same manner as described for the OAG and OAC models (Eq. 5). The difference is in the establishment of new farming populations, which under ATC and GIM was controlled by a matrix of dates of origin of agriculture at each land pixel of the map (details on how archaeological information was processed for this purpose are in Sokal et al. (1991)). Therefore, when colonization of a new locality by the first farmers had to be simulated, eight effective founding individuals were sampled with replacement from the closest suitable locality. In a few cases, this required input of immigrants from localities that were not directly adjacent to the one in which farming was starting. This was the only violation that we tolerated of the assumptions of the stepping-stone model. Figure 2 is an example of allele frequencies generated under the ATC and GIM models. It shows that the spread of farmers is initially slower than simulated under the OAG and OAC models, especially along the northern shores of the Mediterranean Sea. By genera- A stepping-stone model does not allow for long-distance population movements, i.e., those associated with sailing, which are considered important in the colonization of Europe, and in the successive phases of agricultural dispersal (Renfrew, 1987). To simulate the effects of the movement of a few individuals across the seas, we chose to assign a pseudopopulation to each pixel located in the sea. Pseudopopulations did not undergo random fluctuation of allele frequencies, and did not increase in numbers. They included only the individuals dispersing from neighboring pixels (their number was determined according to Equation 1)) whose descendants had the same allele frequencies, and a t each generation proceeded one step forward in the dispersal process. In this way, the movement of a few individuals across the sea was simulated. This had little importance for the allele frequencies of Fig. 1. Spread of farmers under the OAG and OAC models. The localities where farming is being practiced are indicated by letters representingallele frequency in farmers, at generations 100 (6500 BC), 200 (4000 BC), and 240 (3000 BC). Eight allele-frequencyclasses are defined, from a to h, each corresponding to an interval equal to 0.125 (a, p F < 0.125; b, 0.125 <p,0.250; c, 0.250 < p F < 0.375; etc.). Hyphens represent areas inhabited only by hunter-gatherers.The nuclear zone is delimited by a solid square. While the pattern of land occupation shown was constant for all the realizations of the models, the allele frequencies depicted are those of a single run of OAC. Fig. 2. Spread of farmers under the ATC and GIM models. The localities where farming is being practiced, based on archaeological information, are indicated by figures representing allele frequency of farmers, at generations 100 (6500 BC), 200 (4000 BC), and 240 (3000 BC). Allele-frequencyclasses are as in Figure 1. The pattern ofland occupationwas again the same in all realizations of ATC and GIM, but the allele frequency shown is that of a single realization ofATC. t 1 118 G. BARBUJANI ET AL. hunter-gatherers and for those of farmers once the spread of farming through Europe was completed; however, it had an effect on the establishment of farming communities under the OAG and OAC models, as a few individuals could quickly reach distant localities by sea. The resulting pattern of occupation of coastal regions corresponds well with the archaeological evidence (see Figs. 1 and 2). Cultural contacts and admixture Rendine et al. (1986), the value of y was adjusted so that resultant values of S had approximately the same magnitude as in Equation 6. However, we found, over a range of test conditions, that the results were only trivially affected. Therefore, we report results for Expression 6 without redoing the analysis for every test condition. Disappearance of hunting-gathering populations In the IBD model, hunter-gatherers adopt Under the OAC, ATC, and GIM models, at farming at generation 40, and thus all genes each generation a certain number of hunterof Indo-European speakers come from the gatherers adopted farming, if a farming community already existed a t the locality genetic pool of the hunting-gathering comwhere they lived. The likelihood of this cul- munities. In the OAG model, conversely, the tural shift depended on the probability of hunter-gatherers go extinct, and thus all contacts between farmers and hunter-gath- genes of Indo-European speakers derive erers, and on a coefficient of acculturation from the genes of the few first farmers of which was called y by Rendine et al. (1986). Southern Anatolia. These are the two exThe number S of individuals shifting to treme models, as far as the origins of Indofarming at each generation is related to the Europeans are concerned. Under the other probability of contacts between farmers and models, a certain degree of admixture is simhunter-gatherers. If farmers are NF at a ulated between the two communities at each given locality and a t a given moment in locality, reflecting a widespread view of hutime, their probability of meeting one of the man evolution in Europe (Cavalli-Sforza NHG hunter-gatherers will represent a frac- and Piazza, 1993). Under the OAC, ATC, and GIM models, tion equal to 2(NF x NHG) of all the the hunters and gatherers are considered to (NF+ NHG)' possible contacts. The probadisappear from a certain locality when their bility y that such a contact will result in number is such that S is less than 1. Beacculturation has been estimated at 0.00024 starts after a phase of cause acculturation (Rendine et al., 1986). Therefore, at each generation, the NF farmers will transmit population buildup for farmers, the extinctheir technologies t o a number S of the tion of the hunting-gathering communities also proceeds as a wave, from southeast to hunter-gatherers estimated as northwest, spreading in parallel with the farming economy, but several generations later. Long-range migratory movements where all parameters have already been defined. It can be argued that the change in a particular quadrat of the number of huntergatherers per generation might better be modelled as proportional to the product of the number of hunter-gatherers and the number of farmers, that is In the GIM model, three major migratory waves of Kurgan people are supposed to have introduced Indo-European languages into Europe, around 4,250 BC, 3,400 BC, and 2,900 BC, respectively (Gimbutas, 1979, 1986). In Gimbutas' (1979) view, the westward migrations of Kurgan people in Europe inS = YNHGNF (7) troduced a new patriarchal culture, characterized by horse-riding and new warfare where y = 1.56250 x In this expres- techniques. These cultural changes were not sion, closely resembling a formula used by associated with major innovations of the INDO-EUROPEAN ORIGINS subsistence techniques. Most of the populations of Europe by then were farmers. It is therefore highly unlikely that concomitant population growth could occur. We chose to represent these waves as a flow of genes from the purported source area, north of the Black Sea and west of the Caspian Sea. For the sake of simplicity, these movements were concentrated in one generation’s time (at generations 190,224, and 244), although each of them probably lasted two centuries (Gimbutas, 1979). For each movement, we simulated replacement of 20% of the genes in the target area, with genes coming from the source area. Probably this overemphasizes the genetic consequences of the simulated migratory movements. The invading Kurgan people were not the entire population of the area between the Black and Caspian sea moving en masse; rather, they were groups of individuals belonging to semi-nomadic tribes (Gimbutas, 1979). Accordingly, we chose to simulate their contribution to the genetic pool of the ”invaded” populations as if they were coming from several populations in the appropriate zone. The location of four such populations was chosen a t random, and then held constant in all 2,600 simulation cycles of the GIM model. The allele frequencies of the recipient populations (Fig. 3) were then recalculated as if 20% of the pre-existing individuals had been replaced by immigrant Kurgan people. In addition, Gimbutas (personal communication) pointed out to us 12 other directional and potentially migratory processes that may have been important in determining the current linguistic population structure of Europe. These processes are summarized in Figure 4, and were incorporated in the GIM model. Once again, for each of them we simulated replacement of 20% of the individuals of the “target” area by individuals whose allele frequency was the average allele frequency in the “source”area. 119 parameters being constant. Gene flow is reduced at generation 265, i.e., 3,375 years ago, by which time all suitable regions had been colonized by early agriculturalists. The parameters of the simulation are summarized in Table 2. Gene-frequency data The simulated sets of allele frequencies were compared with a database of allele frequencies, which had been analyzed in various studies on Europe (Sokal et al., 1988, 1989a,b, 1990, 1991, 1992; Harding and Sokal, 1988; Barbujani and Sokal, 1990; Sokal, 19911, and had been continuously updated. The data corresponding to populations speaking languages other than IndoEuropean (Basque, Finnish, Estonian, Lapp, Hungarian, Turkish: Ruhlen, 1987)were discarded. Twenty-six genetic systems were considered. Most of them corresponded to independent loci; exceptions are ABO, MN, and Rh, for which two (or three, for Rh) systems were independently considered, each resulting from typing of alleles by different sets of antisera. This convention has long been followed in studies on human variation (e.g., see Lewontin, 1972). Each system is indicated by a letter code, preceded by a number referring to Mourant’s coding system (Mourant et al., 1976), except for 100HLA-A, 101/2HLA-B, 200GM, and 201Kh4, whose numerical codes were assigned in our laboratory. Overall, 3,481 records, and 93 alleles or haplotypes were considered. The number of samples available for the 26 systems varied widely, ranging from a minimum of 27 (for 5-1 LUTHERAN), to a maximum of 762 (for 1-1ABO).Genetic differences between localities were summarized by 26 matrices of Prevosti’s distances (Prevosti et al., 19751, separately calculated for each system. We shall refer to these matrices as observed distance matrices, as opposed to the simulated ones, generated by Decrease of population mobility after the computer under one of the five models establishment of a farming economy tested. To properly compare the two sets of Once farming populations have reached data, prior to calculating genetic distances the maximum size allowed by the programs, we pooled the observed frequencies of all ala reduction of mobility is simulated by sim- leles except the one whose average freply halving the migration rate m’, all other quency was closest to 0.5. 120 G. BARBUJANI ET AL. Fig. 3. Migratory movements simulated under the GIM model. The four localities whose allele frequencies are averaged, to represent the allele frequencies of the migrating population, are marked by asterisks. The regions affected by the three migratory waves proposed by Gimbutas are surrounded by solid lines; the zone where Basque is currently spoken is not supposed to have been affected by these migratory episodes. Figures refer to the generation at which each migratory wave was simulated. Hypothesis testing For every one of the 26 genetic systems, each of the five simulation programs was run 100 times, yielding 13,000 simulation cycles (or realizations) in all, 2,600 for each model. Every realization resulted from deterministic movements of individuals across the map of Europe, and random allele-frequency fluctuations occurring during initialization and from genetic drift each generation. The latter were dictated by a random number generator, and gave rise to the various replicates at generation 440. A particular realization of the OAG process adds OAG movements to those already required by the IBD model, but random allelefrequency fluctuations are the same. This is also true for the other models, each incorporating the previous ones. For all models, then, random change in allele frequencies was the same, for each pixel and each generation, under all five models. It was not the same, however, for the hunting-gathering and for the farming populations of the same pixel. Thus, the 500 cycles of simulation for each locus fall naturally into 100 groups, each with five matched runs, in ascending model order. Because of the matching, we INDO-EUROPEAN ORIGINS 121 0 Fig. 4. Twelve potentially important migratory processes considered by Gimbutas (personal communication) to have been relevant in European ethnohistory, each one represented by an arrow from a “source”region to a “target”region. Figures refer to the generation at which each population movement was simulated. could use paired statistical tests to compare models. This provided a considerable increase in statistical power over unpaired comparisons. After 440 generations in each computer cycle, simulated allele frequencies were sampled from localities chosen so as to match the locations of the samples of the observed allele-frequency database. Matrices of Prevosti’s distances were computed from the simulated data, so as to obtain 100 simulated matrices for each matrix of observed genetic distances (Prevosti et al., 1975). This measure of genetic distance was chosen for consistency with previous studies (Sokal, 1988; Sokal et al., 1993). Simulated and observed matrices were then compared pairwise by means of Mantel’s test of matrix association (Mantel, 1967; Smouse et al., 1986). This test computes the equivalent of a correlation coefficient between matrices, and evaluates its significance by constructing a null distribution of the test statistic. A Monte Carlo procedure is employed for this purpose; rows and columns of one matrix are repeatedly permuted at random, while the other matrix is kept constant, and the test statistic is re- 122 G. BARBUJANI ET AL. TABLE 2. Parameters defined in the simulation L NHG NF PHG PF a Y m RTMAB) RTM Environment: 1 = plains; 2 = mountains; 3 = seas; 4 = Black Sea Effective population size of hunter-gatherers (114in all models but IBD, where it is allowed to increase up to 7560. In the OAC, ATC, and GIM models it is then reduced as an effect of the cultural transmission of farming technologies) Effective population size of farmers (Initially equal to 0, then allowed to increase up to 7560 in all models but IBD) Frequency of one allele among hunter-gatherers (initially sampled from a gamma distribution) Frequency of one allele among farmers for all models except IBD (Initially undefined. For the six pixels of the nuclear zone, pF = pHG a t generation 40. In the other pixels, the initial p F value reflects the proportion of immigrating farmers and of hunter-gatherers who were incorporated into the farming population, under the different models) Intrinsic growth rate of the farming populations Acculturation rate, i.e., rate of assimilation of hunter-gatherers by farmers Migration rate Resistance to migration between localities A and B (see Table 1) Average resistance to migration between one pixel and its adjoining pixels calculated each time, so a s to yield the desired null distribution. Because each observed matrix was compared with 100 simulated matrices for each model tested, a procedure was needed to combine all this information. We chose to compute average Mantel correlation coefficients, and to calculate Fisher’s combined probabilities (Sokal and Rohlf, 1995) from the 100 individual probabilities for each model. Since we are looking for positive association of observed and simulated data, and a negative correlation would have no biological meaning, all tests of significance were one-tailed. However, a s a further control, we also counted the number of occurrences of negative correlations that would be significant if the test had been two-tailed. This allowed us to identify the models generating allele-frequency distributions departing widely from the observed distributions. The main purpose of this study was to compare competing hypotheses on the origin of Indo-Europeans. Because the hypotheses can be ranked, by increasing complexity, from IBD through OAG, OAC, ATC, and GIM, four painvise tests of goodness of fit were carried out, OAG versus IBD, OAC versus OAG, ATC versus OAC, and GIM versus ATC. This was done taking advantage of the paired design based on the same random seeds in the simulations. We employed two different procedures: 1) a paired-comparisons t-test, where the test statistic was the difference between the average Mantel cor- relation coefficients, and only positive differences (indicating a n improvement of the fit for the more complex hypothesis) were considered significant; 2) Wilcoxon’s signed rank test (Sokal and Rohlf, 19951, a nonparametric paired-comparisons test, once again considering significant only the cases in which the fit improved for the more complex hypothesis. RESULTS The average Mantel correlations for each genetic system (Table 3) yield numerous significant agreements between observed and simulated matrices of genetic distances, for all models, although fewest with IBD. The number of significant ( P < 0.05) positive average correlations is maximal for the ATC model (20/26), but it is not substantially lower for OAG, OAC, and GIM (respectively, 18, 17, and 16 systems). For IBD it is only 10/26. The numerical values of the correlations are low despite their high level of statistical significance. This is characteristic for genetic distances and is because the Mantel correlations were constrained to be linear. Next we examine the number of cases in which individual simulation realizations gave genetic distance matrices that would appear negatively associated (P s 0.05) with the observed one, if the test had been two-tailed. A substantial number of disagreements, between observed and simulated data is evident for IBD (6 at system 4-13 RHESUS), for ATC (8 in the 1-1 ABO INDO-EUROPEAN ORIGINS 123 TABLE 3. A) Average Mantel correlations of observed with simulated genetic distances and B) significance levels based on ont-tailed probabilities for positive correlation combined by Fisher's method' System IBD OAG OAC An: GIM ~ A. Mantel correlations 1-1-mo 1-2x30 2-5-MN 2-7-MN 3-1-P 4-1RHESU 4-13RHES 4-19RHES 5-1LUTH 6-1-KELL 6-3-KELL 7-1ABHSE 8-1DUFFY 36-1-HP 37-1-TF 38-1-GC 50-1-1AP 52-PGD 53-PGM1 56-AK 63-ADA 65-TASTE 100HLA-A 101-102 200-GM 201-KM B. Significance levels 1-1-ABO 1-2-ABO 2-5-MN 2-7-MN 3-14' 4-1RHESU 4-13RHES 4-19RHES 5-1-LUTH 6-1-KELL 6-3-KELL 7-1ABHSE 8-1DWFY 36-1-HP 37-1-TF 38-1-GC 50-1-1AP 52-PGD 53-PGM1 56-AK 63-ADA 65-TASTE 100HLA-A 101-102 200-GM 201-KM Overall probability 0.01294 0.05699 0.00324 -0.02886 -0.03558 0.01311 -0.02547 -0.04258 -0.00590 0.01789 -0.04254 0.01781 0.00290 0,11991 0.00592 -0.01547 -0.00965 0.04775 -0.00845 -0.00696 0.00667 0.02802 0.01417 -0.00854 0.01455 -0.02816 0.14800 0.10831 -0.05627 -0.02236 -0.04556 -0.01457 0.06920 -0.03158 -0.05392 0.07960 0.06145 0.01064 0.04017 0.15415 0.08658 -0.05739 0.12963 -0.03417 0.13174 0.06676 0.27423 0.17960 0.13654 0.24013 0.34928 0.07813 0.15233 0.09718 -0.05352 -0.02091 -0.03411 -0.01428 0.06120 -0.02899 -0.04768 0.08378 0.01510 0.02805 0.05362 0.17941 0.09142 -0.05843 0.11903 -0.03457 0.14000 0.10858 0.26493 0.16922 0.15443 0.22784 0.33330 0.09279 0.13137 0.00411 -0.04100 -0.03127 0.02409 0.00674 0.03431 -0.01635 0.00777 0.03392 -0.01550 0.05623 0.01197 0.07459 0.08117 -0.01637 0.10520 -0.04371 0.10789 0.05081 0.07537 0.07373 .0.08977 0.12228 0.14102 0.05959 0.01887 0.03532 -0.03231 -0.03912 0.01112 0.03130 0.02375 -0.08314 0.02033 0.03330 -0.06665 0.05420 -0.01077 0.08002 0.00461 0.05178 0.06548 0.01603 0.06152 -0.01301 0.03099 0.04040 0.01311 0.09136 0.03480 -0.03732 0.00000 0.00000 0.00339 1.00000 1.00000 0.00000 0.99975 0.99996 0.52240 0.47350 1.00000 0.00942 0.94188 0.00000 0.82795 0.98856 0.94062 0.00554 0.99819 0.97077 0.05673 0.00000 0.00107 0.97945 0.04825 0.99886 0.00000 0.00000 0.00000 1.00000 1.ooooo 1.Ooooo 1.00000 0.00000 0.99980 1.00000 0.00000 0.00000 1.00000 1.00000 1.00000 1.00000 0.00000 0.99913 1.00000 0.00000 0.00489 1.00000 1.00000 0.o0000 0.00198 0.00000 0.00001 0.22014 0.00001 0.94677 0.00000 0.01885 0.00000 0.00000 0.51374 0.00000 0.99974 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 1.00000 1.00000 0.00003 0.00000 0.00000 1.00000 0.04162 0.00385 1.00000 0.00000 0.99931 0.00000 0.79836 0.00000 0.00000 0.68047 0.00000 0.99442 0.00132 0.00000 0.06326 0.00000 0.00115 0.99918 0.00000 0.0oooo o.ooooo 0.00000 0.03488 0.00000 0.00000 0.00000 1.00000 0.00000 1.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.0oooo 0.00000 0.00000 0.00000 0.09492 0.00002 0.00000 0.00000 0.00000 1.00000 0.00000 1.00000 0.00000 0.00000 0.00000 0.00000 0.00000 o.ooooo 0.00000 0.00000 0.00000 0.0oooo 0.00000 0.00000 0.00000 'Values below 0.05 show significant positive correlation between simulated and observed genetic distances and 11 in the 2-5 MN systems), and for GIM (26 in the 1-1 ABO, and 8 in the 4-19 RHESUS systems). The painvise comparisons by t-tests and by Wilcoxon's signed ranks test agree in their means and significances that the most significant increase in the resemblance of observed and simulated genetic distances occurs between IBD and OAG. In the t-tests shown in Table 4 , 1 7 systems show a significant increase of correlation,whereas in only 7 cases does the similarity decrease. These G. BARBUJANI ET AL 124 TABLE 4. Results of paired comparisons t-tests for differences between 5 Indo-European simulation hypotheses: P ( H I ) - P (H2)’ ~ ~ System 1-1-ABO 1-2-ABO 2-5-MN 2-7-MN 3-1-P 4-1RHESU 4-13RHES 4-19RHES 5-1-LUTH 6-1-KELL 6-3-KELL 7-1ABHSE 8-1DUFFY 36-1-HP 37-1-TF 38-1-GC 50-1-1AP 52-PGD 53-PGM1 56-AK 63-ADA 65-TASTE 100HLA-A 101-102 200-GM 201-KM Mean difference OAG - IBD 0.13506*** 0.05133*** -0.05951 0.00650 -0.00998 -0.02767 0.09467*** 0,01099 -0.04802 0.06171*** 0.10399*** -0.00717 0.03727*** 0.03425** 0.08065*** -0.04192 0.13928*** -0.08192 0.14019*** 0.07372*** 0.26756*** 0.15158*** 0.12237*** 0.24867*** 0.33473*** 0.10630*** 0.07402 OAC - OAG 0.00432 -0.01113 0.00275*** 0.00145 0.01145*** 0.00029 -0.00800 0.00259 0.00624 0.00419 -0.04634 0.01741*** 0.01345*** 0.02525** 0.00484 -0.00104 -0.01061 -0.00040 0.00826 0.04182*** -0.00930 -0.01038 0.01789*** -0.01229 -0.01597 0.01466 0.00198 ‘Asterisks indicate significant positive differences as follows: *0.05 3 P 2 ATC - OAC -0.02095 - 0.09307 0.01252** -0.01037 0.05819*** 0.02102*** -0.02689 0.01265 0.05545*** -0.04987 - 0.03060 0.02818* -0.04164 - 0.10482 -0.01026 0.04206*** -0.01383 -0.00914 -0.03212 -0.05777 -0.18956 -0.09549 -0.06466 -0.10556 -0.19229 -0.03320 -0.03662 GIM - ATC -0.11250 0.03121*** 0.00869* -0.00785 -0.01296 0.02456*** -0.01056 -0.06679 0.01256 -0.00061 -0.05115 -0.00203 -0.02275 0.00543 -0.07655 0.06816*** -0.03972 0.05974*** -0.04637 -0.06382 -0.04438 -0,03333 -0.07666 -0.03092 -0.10622 -0.09691 -0.02660 0.01,**0.01P P > 0.001,***0.001 P. figures compare with 7 systems showing sig- although ATC shows a significant positive nificantly improved correspondence and 10 correlation for the largest number of sysshowing decreased correspondence for OAC tems, on the average, correlations between versus OAG, 6 and 19, and 5 and 19, respec- observed and simulated genetic distance are tively, for ATC versus OAC, and GIM versus higher for OAC and OAG. ATC. These results are reflected in the To test the plausibility of our simulations mean differences shown at the bottom of Ta- we also calculated FsTvalues (Wright, ble 4. 1978) for both our observed gene-frequency Using Wilcoxon’s criterion, the results do surfaces and the simulated surfaces. The not change much, so we do not feature them median results over all genetic systems are as a separate table. The numbers of systems 0.011780 for the observed surfaces, and showing significance increased and any de- 0.098273, 0.10399, 0.096562, 0.056609, and creased resemblance between observed and 0.003211, respectively, for the simulated simulated data are, respectively, 17 and 7 surfaces of models IBD, OAG, OAC, ATC, for OAG versus IBD; 10 and 9 for OAC ver- and GIM. Although the FsTvalues of the sus OAG; 5 and 19 for ATC versus OAC; and observed surfaces overlapped only slightly 5 and 18 for GIM versus ATC. those of the simulated surfaces, the median We conclude, as a result of all the tests, of the observed data falls within the boundthat similarity between observed and simu- aries of the medians described by the modlated genetic distances increases from IBD els. The latter fall into 3 groups by magnito OAG, and, to a lesser extent, from OAG to tude of FsT.These are 1)IBD, OAG, OAC; 2) OAC. On the contrary, it decreases as mod- ATC; and 3) GIM. Thus, the clear superiorels are tested in which the spread of farmers ity of OAG and OAC over IBD cannot be is constrained by archaeological time data shown by FST since various patterns of local(ATC), or demographic processes occurring ity differentiation can yield the same F,, in the late Neolithic are added (GIM). Thus value. INDO-EUROPEAN ORIGINS DISCUSSION Which model matches observed data best? The IBD model assumes that, in the Neolithic, groups of hunter-gatherers and farmers were not separated. The former gradually turned to farming, so that there was a genetic continuity between pre- and postNeolithic populations in Europe, and IndoEuropean languages spread only by cultural transmission. Allele frequency patterns generated under this model resemble poorly the patterns of genetic variation observed in contemporary populations, showing that this evolutionary hypothesis does not fit with the available genetic evidence. Resemblance between observed and simulated patterns is much greater for the other four models, in which farmers evolve separately from hunter-gatherers, and processes of population expansion are important. The levels of resemblance, however, do not differ much among these four models. For instance, the GIM model, including several population processes occurring in the last 5,000 years, does not give higher correlations, or significant correlations at a higher number of loci, than the OAG model, where the demographic changes prompted by the origin of agriculture are simulated in a much rougher manner. Actually, various results of this simulation study indicate that the fit of simple models, such OAG and OAC, is better than that of more complex models. The Mantel correlations between observed and simulated genetic distances would be negative and significant in only 21 of the 2,600 cases for OAG, and in 14 cases for OAG, had tests been two-tailed. These figures compare with 89 and 105 negative significant correlations for ATC and GIM, respectively. It seems, therefore, that nothing is added to our understanding of the phenomena, if we add archaeological time data to constrain the spread of Neolithic farmers, and even less so if we simulate population movements in the late Neolithic. The models where farmers disperse into new areas simply because of their numbers, which increase logistically, yield patterns showing a better agreement with the observed data. 125 Similarly, the pairwise comparison of models show a substantial increase of fit of the OAG over the IBD model (Table 41, whereas the elements included in the ATC and GIM simulations cause a slight but evident departure from the patterns observed, making them poorer fits than OAG. A first conclusion one may draw from the results of this simulation study is that two models account best for many aspects of the contemporary genetic structure of IndoEuropean-speaking populations of Europe. One is the demic diffusion model, as originally put forward by Menozzi et al. (1978), and associated with linguistic evidence by Renfrew (1987). Under this model, here called OAC, the two forces driving microevolution in Europe were population growth determined by farming, and dispersal accompanied by limited population admixture between early agriculturalists (possibly proto-Indo-European speakers) and preexisting hunters and gatherers. The other model, OAG, is a simplified version of the demic diffusion model, in which dispersal of farmers does not lead to any degree of admixture with hunter-gatherers. How plausible is the OAG model? While the OAC model has already received support from studies focussing on its genetic (Sokal et al., 1991; Cavalli-Sforza et al., 19931, as well as linguistic and archaeological, aspects (reviewed in Renfrew, 1992), what we called OAG here has not been analyzed in detail so far. An apparent problem with it is, how can a model not involving admixture account for the continent-wide clines observed in Europe? Inspection of gene-frequency maps generated in this study, at various moments in time, shows that founder effects are common while farmers disperse. Founder effects are due to the limited numbers of individuals who start the farming communities in new localities. In the OAG model as we simulated it, most farming communities start with 8 effective individuals; but even if this number were larger, the probability for the allele in question to be lost or fixed would be substantial. Loss of genetic variation through repeated founder effects has been invoked as the likely cause of clines in sev- 126 G. BARBUJANI ET AL. era1 studies on natural populations of toads in Australia (Easteal, 1988) and aquatic invertebrates in Canada (Boileau et al., 1992). Theoretical work on the genetic effects of colonization of previously unoccupied localities (Wade and McCauley, 1988) agrees with this view. An additional factor, increasing the likelihood of clines even in the absence of admixture between farmers and hunter-gatherers, is the Black Sea. Archaeological evidence (e.g., see Renfrew, 1991) indicates that two waves of early farmers dispersed westwards and northwards from the Near East, with the Black Sea separating them (this is why we did not allow movement of individuals through it, but only along its coasts). The two waves later converged in eastern Europe, after a period of independent evolution. If the same allele had been lost, or fixed, in both groups of farmers, no particular pattern would result; but if founder effects had had opposite consequences in the two groups, the successive admixture would initially determine a steep cline, and successive gene flow would smooth it, resulting in a wide gradient (Endler, 1977). Even under OAG, therefore, a certain role of admixture is important. But admixture, under OAG, is between different groups of farmers, who were geographically separated in part of their evolutionary history, rather than between farmers and hunter-gatherers of the same area. This interpretation emphasizes the role both of geographical factors, such as distance between regions, and of cultural barriers between sympatric communities of farmers and hunters-gatherers. Indeed, physical barriers are often associated with genetic and linguistic change, even between Indo-European speakers (Barbujani and Sokal, 1990, 1991), although other evolutionary mechanisms may also account for that association (Barbujani, 1991). Genetics and Kurgan waves Introducing the three migratory waves postulated by Gimbutas (GIM model) into the simulation, not only does not increase the correlations, but somewhat reduces them. This means that the current patterns of allele frequencies among Indo-Europeans can be explained without resorting to the migrations of Kurgan people. This study cannot establish whether or not these migration events really occurred, but, if they occurred, they did not leave a significant mark on the allele frequencies of current populations. Renfrew (1987) argued that the cultural transformations that led Gimbutas to hypothesize late-Neolithic migration waves could be due to cultural contacts instead, and equated the first Indo-Europeans with the first farmers. The extensive changes in ceramics, architecture, and metallurgy occurring in the late Neolithic are then attributed to trading and imitation; long-distance migratory movements, if any, may have been marginal. Although not proved by our simulation, this view is fully compatible with it. This study, therefore, agrees with the main views expressed by Menozzi et al. (19781, Rendine et al. (19861, Piazza (19931, and Cavalli-Sforza et al. (1993). By contrast, the emphasis laid by the same authors on late Neolithic migrations from the Pontic steppes (Cavalli-Sforza et al., 1993) does not find support in our simulations. Among the possible causes of this discrepancy, it may be that Mantel's correlations are not sensitive enough to recognize the effects of minor processes of gene flow, such as those presumably occurring in the late Neolithic. Alternatively, however, or in addition, one should consider the possibility that principal components associated with low eigenvalues reflect, at least in part, artificial gradients due to data interpolation. This may be the case for areas where population samples are sparse, such as most of eastern Europe. For example, the Caucasus seems to show clinal variation in the first and third principal components of Cavalli-Sforza et al. (1993), but a detailed genetic study shows that clines are very uncommon there (Barbujani et al., 1994). Our evaluation of the Gimbutas model should be revised if evidence could be provided that the spread of the Kurgan people was accompanied by an increase in population sizes larger than that simulated by us. A certain level of ambiguity exists about this, as the movement of people from the Pontic steppes that Gimbutas (1979) hy- INDO-EUROPEAN ORIGINS pothesized is called a “population expansion” by Cavalli-Sforza et al. (1993). These authors seem to suggest that, because of the warfare technologies associated with it, larger populations could be supported in the regions affected. This aspect remains to be explored, and we do not have evidence for or against this view. However, even if this had been the case, the increases in population sizes prompted by the beginning of food production seem to have been much larger than those associated with new war technologies (see Ammerman and Cavalli-Sforza, 1984). Unless European populations increased dramatically in size between 6,000 and 5,000 years ago, as they did with the arrival of the new farming technologies, we conclude that the long-distance migrations postulated by Gimbutas remain an unnecessary element in the evolution of Indo-European-speaking populations, as reconstructed from the comparison of theoretical models and gene-frequency data. Besides, early farmers expanded into areas of low population density, where few immigrants could substantially modify the genetic build up of local populations; but this was not the case for late Neolithic groups, who invaded regions already occupied by large farming communities. Simulations of genetic processes based on the coalescent approach (see Hudson, 1990) show that patterns of genetic variation do not tend to change much after a demographic expansion (Harpending, 1994; Rogers and Jorde, 1995). Successive population movements can smooth out the gradients and blur some patterns, but are unlikely to leave a significant mark on allele frequencies. Are parasites responsible for clines? Recent evolutionary models (reviewed in Ladle, 1992) indicate that new genotypes entering an area could be resistant to the parasites that are already adapted to the common resident genotypes. The new genotypes would then increase in frequency, until the parasites adapt to them. A selective mechanism of this type, combined with gene flow, might have been important in determining the European clines of allele frequencies; models may be envisaged whereby a form of frequency-dependent selection, 127 rather than limited admixture or founder effects, leads to clinal variation of gene frequencies (e.g., see Hedrick, 1986). It is intriguing to note that the hypothesis of demic diffusion from the Near East was initially developed to account for clines a t the histocompatibility loci, HLA-A and HLA-B (Menozzi et al., 1978; Sokal and Menozzi, 19821, and that the most significant evidence for clines spanning Eurasia has been found at the glyoxalase locus (Barbujani, 19871, which is linked with HLA on chromosome 6, in a region of extensive linkage disequilibrium (Hedrick et al., 1986). However, this view, although compatible with the gradients existing a t the HLA and linked loci, can hardly account for the patterns of variation observed among Indo-European speakers at other, independently inherited, loci. Had the resistance to parasites been the main cause of clines in Europe, one would expect isolation by distance patterns at most loci not involved in tissue recognition, which is not the case (Sokal et al., 1989a; and this study). On the contrary, the nearly parallel gradients observed for many independent alleles suggest that an evolutionary pressure affecting the entire genome and not merely part of it, i.e., gene flow, played a major evolutionary role (Slatkin, 1985,1987). Relation to other work Diakonov (1984; cited in Redrew, 1987) listed what he called the essential questions concerning the origins of Indo-European speakers: Who migrated? Why? How many of them were there? Was it actually a migration of people, or rather the transfer of a language from one population to another? The present study may contribute to answering some of these questions. Our results show that migrations in the late Neolithic, which have been inferred from changes in the material culture of eastern and central Europe (Gimbutas, 19791, are not reflected in the current genetic structure of Indo-European-speaking populations. Conversely, the correlations of observed and simulated data are positive and significant only if we simulate dispersal of farmers from the Levant by demic diffusion. The results of this study are, therefore, compatible with the 128 G. BARBUJANI ET AL. view of identifying proto-Indo-European speakers with the first Neolithic farmers (Renfrew, 1987). These findings appear at first glance to contradict the results of Sokal et al. (1992) who use the same genetic dataset. These authors concluded that geographic proximity explained a substantial amount of the observed correlation between genetic and IndoEuropean linguistic distances. However, after allowing for geographic distances, statistically significant partial correlations remain, which are not explained by distances describing the origin of agriculture by demic diffusion, Renfrew’s hypothesis (as described by his postulated transitions subsequent to the origin of agriculture), or Gimbutas’ hypothesis. But note that the study by Sokal et al. (1992) was based on the spatial pattern of correlations between genetic and linguistic distances, whereas this study examines genetic variation patterns only. The simulations reported here do not consider the patterns of linguistic diversity observed in Europe today. Our findings are therefore compatible as well with a simpler model in which the observed genetic patterns reflect the process of demic diffusion accompanying the origin and spread of agriculture in Europe, but these populations are not the proto-Indo-Europeans. Note that the new findings provide further support for the demic diffusion hypothesis of Ammerman and Cavalli-Sforza (1984). Statistical tests of this hypothesis were carried out by Sokal et al. (1991) using origin-of-agriculture distances constructed from observed dates of the onset of the Neolithic. In the present study these distances were constructed from the simulation results using simple models €or the spread of farming populations. It is reassuring that these models yield results in agreement with observed genetic patterns as had already been noted by Rendine et al. (1986). Our work also offers an answer to Diakonov’s second question; presumably, protoIndo-European speakers dispersed because their increase in numbers forced them to look for new suitable land. A study of genetic variation such as this cannot provide reliable estimates of population sizes in the re- mote past, and even less so of numbers of migrants. However, even a very small number of dispersing individuals may yield patterns that correlate with the observed ones, as seen for the OAG model. The results of this study are compatible both with a complete replacement of pre-existing huntergatherers by Near Eastern farmers (the OAG model), and with the more conventional view that this replacement was only partial, and that hunter-gatherers contributed to some extent to the genetic pool of Indo-European speaking populations (the OAC model). But the view whereby language replacement was largely independent of population movements (Zvelebil and Zvelebil, 1988) fails to account for the largescale clinal patterns matching the direction of the spread of agriculture observed in Europe, and therefore does not seem easy to reconcile with the available genetic evidence. In principle, one could also envisage a scenario whereby expanding farmers determined the main genetic characteristics of European populations, whereas Indo-European languages spread in a later moment, and mainly by a cultural process. Although this cannot be ruled out, it does not seem the best explanation available for the current patterns of genetic and linguistic variation. The model of Neolithic demic diffusion proposed by Renfrew (1991) predicts the existence of clines in three linguistic groups which are supposed to have expanded together with Indo-European. Many such clines have actually been observed among speakers of Altaic and Elamo-Dravidian languages (Barbujani and Pilastro, 1993; Barbujani et al., 1994). Moreover, some of these clines disappear if different linguistic groups are jointly analyzed (Barbujani and Pilastro, 1993). Although not a proof, these findings suggest that linguistic affiliation is the key to deciphering gene-frequency patterns in much of Eurasia. In the areas where Indo-European, Elamo-Dravidian, and Altaic languages are spoken, linguistic, genetic, and archaeological evidence can jointly be accounted for by a demic expansion from the Near East. Clearly, some languages may also have changed by cultural INDO-EUROPEAN ORIGINS contact (some examples are well documented: Renfrew, 1991); however, the overlap between large clines and linguistic areas suggests that cultural transmission had a lesser impact than demographic expansions. 129 among Indo-European speakers. These models are not mutually exclusive, and it may well be that both phenomena were important, a t different localities. Conversely, there is no evidence for a major evolutionary role of migratory phenomena occurring in the late Neolithic. These phenomena may have affected population sizes and allele frequencies on a local scale, but the large-scale structure of Indo-European speaking populations seems basically to reflect Neolithic demic diffusion. New archaeological evidence will certainly be valuable for describing in detail times and modes of Neolithic expansions, on which our ideas are certainly simplistic at the moment. On the genetic side, new allele frequency data on previously neglected populations are unlikely to substantially alter the picture, since, for most loci, the data sets already include hundreds of samples. Rather, collection and analysis of mtDNA may offer a new perspective on human evolution in Europe, as it has already done for other areas, including Oceania (Stoneking et al., 1990) and the Americas (Ward et al., 1991,1993; Torroni et al., 1992; Wallace and Torroni, 1992). CONCLUSIONS AND FUTURE STUDIES The replacement of a food-collectingeconomy by one of food production has been a complex process. In the Mediterranean area, for instance, farming replaced huntinggathering very rapidly in certain regions, but gradually in other regions where the two economies coexisted for centuries (Barker, 1988). However, this does not seem to have deeply affected gene-frequency patterns, since the ATC model, where farmers spread at an irregular rate, did not result in a greater correspondence of observed and simulated genetic distances than OAG and OAC. This may mean that the variable rate of spread of Neolithic agriculture depended on factors in the physical environment. Once these factors are incorporated in the model (OAG or OAC), simulated and observed allele frequency patterns resemble each other. At any rate, the models put forward and tested in our study should doubtless be regarded as approximate. Archaeological, linACKNOWLEDGMENTS guistic, and genetic studies of individual This is contribution No. 913 in Ecology populations will certainly add details to our reconstruction of European history; complex and Evolution from the State University of models of the Indo-European expansion, New York a t Stony Brook. We thank Prof. such as the one outlined by Sherratt (1988), Marija Gimbutas and Lord Renfrew for their may be reformulated in such a way as to collegial cooperation in this work. We are become comparable with genetic data. Nev- indebted to Barbara A. Thomson for techniertheless, this study indicates that not all cal assistance. Jeff Walker computed the hypotheses on the origins of Indo-Europeans 3’-statistics and Donna DiGiovanni wordaccount equally well for the available ge- processed the manuscript. Part of the comnetic evidence. putation was carried out on the Cornell The hypotheses whereby Indo-Europeans National Supercomputer Facility. This entered Europe as the first farmers show research was supported by National Science the best fit. There seems to be no cogent Foundation grant BNS 9117350. This paper reason to think that the farmers’ spread was was prepared while Robert R. Sokal was a due to factors other than their tendency to Fellow at the Center for Advanced Study in grow in numbers, thanks to the increased the Behavioral Sciences at Stanford, Caliresources available. Both incomplete admix- fornia. He is grateful for financial support ture with hunter-gatherers, and founder ef- provided by the National Science Foundafects occurring in the expansion of farmers, tion grant SES-9022192, and by sabbatical seem to account satisfactorily for the ob- funds from his home institution. Guido Barserved patterns of genetic differentiation bujani wishes to acknowledge fruitful dis- G.BARBUJANI ET AL. 130 cussions with Professors Luca Cavalli- Diakonov IM (1984)On the original home of the speakers~of Indo-European. 23: Sforza, H~~~ ~ ~ ~ ~R ~~ ~ d ~ i ~ Soviet ~ ~Anthropol. ~ , Archaeol. , 5-87. Italo Scardovi, and Michael Turelli. Easteal S (1988)Range expansion and its genetic consequences in populations of the giant toad, Bufo rnarinus. Evol. Biol. 23:4%84. Eisen MM (1979)Mathematical Models in Cell Biology Ammerman AJ, and Cavdli-Sforza LL (1971)Measurand cancer Chemotherapy,~ ~ springer, ~ l i ~ : ing the rate Of spread Of in Europe' Man Endler JA (1977)Geographic Variation, Speciation, and 6:674-688. Clines. Princeton, New Jersey: Princeton University Ammerman AJ, and Cavalli-Sforza LL (1984)The Neopress. lithic Transition and the Genetics of Populations in Feller (1940)On the logistic law of growth and its Europe. Princeton, New Jersey: Princeton University empirical verifications in biology, Ada Biotheor, 5: Press. 51-66. Barbujani G (1987)Diversity of Some gene frequencies Gimbutas M (1979)The three of KurKan people in European and Asian populations. 111. Spatial correinto Old Europe, 4500-2500 B,C, Arch, Suisses Anlogram analysis. Ann. Hum. Genet. 51:345-353. thropol. Gen. 43~113-137. us about BarbuJani (1991)What do languages Gimbutas M (1986)Remarks on the ethnogenesis of the human microevolution? Trends Ecol. Evol. 6:151-156. Indo-Europeans in Europe. In Bernhard and A Barbujani G, Jacquez GM, and Ligi L (1990)Diversity of Kandler-P&son (eds.): Ethnogenese Europaischer Somegene frequencies in European and Volker, Stuttgart: Gustav Fischer Verlag, pp. 5-20. tions. V. Steep multilocus clines. Am. J. Hum. Genet. Guglielmino CR, Piazza A, Menozzi p, and Cavalli47:a67-875. Sforza LL (1990)Uralic genes in Europe. Am. J . Phys. Barbujani G, Nasidze IS, and Whitehead GN (1994)Ge83:57-68. netic diversity in the Caucasus. Hum. Biol. 66:639Harding RM, and Sokal RR (1988)Classification of the 668. European language families by genetic distance. Proc. Barbujani G, and Pilastro A (1993)Genetic evidence on Natl, Acad. sci, u, s,A. 85,.937c9372, origin and dispersal of human populations speaking languages of the Nostratic macrofamily, pro,., Natl, Harpending HC (1994)Signature Of ancient population growth in a low resolution mitochondria1 DNA misAcad. Sci. U. S. A. 90:4670-4673. match distribution. Hum. Biol. 66:591-600. Barbujani G, and Sokal RR (1990)Zones of sharp geHaMan FA (1973)On the mechanisms Of population netic change in Europe are also language boundaries, growth during the Neolithic. Curr. Anthropol. 14: Proc. Natl. Acad. Sci. U. S. A. 87:1816-1819. 535-543. Barbujani G, and Sokal RR (1991)Genetic population FA (lg81) New structure of Italy. XI. Physical and cultural barriers to York: Academic Press. gene flow. Am. J . Hum. Genet. 48:398-411. Barker G (1985)Prehistoric Farming in Europe. Cam- Hedrick pw (1986)Genetic P b o V h i s m in heterogeneous environments: A decade later. Annu. Rev. Ecol. bridge: Cambridge University Press. 'yst. 17t535-566. Barker G (1988)Comment on "Archaeology and LanHedrick PW, Thomson G, and Klitz W (1986)Evolutionguage," by C. Renfrew. Cum. h t h r o p o l . 29:44H49. system. In s Karary genetics: HJA as an Bertranpetit J, and Cavalli-Sforza LL (1991)A genetic lin and E Nevo (eds.): Evolutionary Processes and reconstruction of the history of the population of the Theory. Orlando, Florida: Academic Press, pp. 503Iberian peninsula. Ann. Hum. Genet. 5551-67. '06. Boileau MG, Hebert PDN, and Schwartz SS (1992)Nonequilibrium gene frequency divergence: Persistent Hudson RR (1990)Gene genealogies and the coalescent Process. In FutuWa and J AntonovicS (eds.): OXfounder effects in natural populations. J. Evol. Biol. ford Surveys in Evolutionary Biology, Vol. 7.Oxford, 5:25-39. U K Oxford University Press, pp. Cavalli-Sforza LL (1988)The Basque population and Jorde LB (1980)The genetic structure of subdivided huancient migrations in Europe. Munibe 6:12%137. man Populations. A review. In JH Mieke and MH Cavalli-Sforza LL, Menozzi p , and Piazza A (1993) Crawford (eds.): Current Developments in Anthropo~~~i~ and human evolution, science 259: logical Genetics, Theory and Methods. New York: Ple639446. numt pp' 135-208' Cavalli-Sforza LL, Minch E, and Mountain J L (1992) Coevolution of genes and languages revisited, P ~ c . Kaiser M, and Shevoroshkin V (1988)Nostratic. Annu. Rev, Anthropol. 17:30%329. Natl. Acad. Sci. U. S. A. 89:5620-5624. Cavalli-Sforza LL, and Piazza A (1993)Human genomic Keyfitz N (1977)Introduction to the Mathematics of Populations with Revisions. Reading, UK: Addisondiversity in Europe: A summaw of recent research Wesley. and prospects for the future. Eur. J . Hum. Genet. Kimura M, and Weiss GH (1964)The stepping stone l:3-18. model of Population structure and the decrease of geCavalli-Sforza LL, Piazza A, Menozzi P, and Mountain J netic correlation with distance. Genetics 49561-576. (1988)Reconstruction of human evolution: Bringing together genetic, archaeological and linguistic data. Ladle RJ (1992)Parasites and sex: Catching the red Proc. Natl. Acad. Sci. U. S. A. 85:6002-6006. queen. Trends Ecol. Evol. 7:405408. LITERATURE CITED INDO-EUROPEAN ORIGINS Lewontin RC (1972)The apportionment of human diversity. Evol. Biol. 6:381-398. Manly BFJ (1991) Randomization and Monte Carlo Methods in Biology. London: Chapman and Hall. Mantel N (1967) The detection of disease clustering and a generalized regression approach. Cancer Res. 27: 209-220. Menozzi P, Piazza A, and Cavalli-Sfona LL (1978) Synthetic maps of human gene frequencies in Europeans. Science 201:786-792. Mourant AE, Kopec AC, and Domaniewska-Sobczak K (1976) The Distribution of the Human Blood Groups and Other Polymorphisms. Oxford Oxford University Press. Nei M (1987) Molecular Evolutionary Genetics. New York Columbia University Press. Piazza A (1993) Who are the Europeans? Science 260: 1767-1769. Piazza A, Cappello N, Olivetti E, and Rendine S (1988) The Basques in Europe: A genetic analysis. Munibe 6:168-176. Press WH, Flannery BF, Teukolsky SA, and Vetterling WT (1986) Numerical Recipes. Cambridge: Cambridge University Press. Prevosti A, Ocana J , and Alonso G (1975) Distances between populations of Drosophila suboscura based on chromosome arrangement frequencies. Theor. Appl. Genet. 45:231-241. Rendine S, Piazza A, and Cavalli-Sforza LL (1986) Simulation and separation by principal components of multiple demic expansions in Europe. Am. Nat. 128: 681-706. Renfrew C (1987) Archaeology and Language. London: Jonathan Cape. Renfrew C (1989) Models of change in language and archaeology. Trans. Philol. SOC.87t103-155. Renfrew C (1991) Before Babel: Speculations on the origins of linguistic diversity. Cambridge Arch. J . 1 : s 23. Renfrew C (1992) Archaeology, genetics and linguistic diversity. Man N.S. 27:445478. Rogers AR, and Jorde LB (1987) The effect of non-random migration on genetic differences between populations. Ann. Hum. Genet. 51:169-176. Rogers AR, and Jorde LB (1995) Genetic evidence on modern human origins. Hum. Biol., in press. Ruhlen M (1987)A Guide to the World’s Languages. Vol. 1: Classification. London: Edward Arnold. Sgaramella-Zonta L, and Cavalli-Sforza LL (1973) A method for the detection of a demic cline. In NE Morton (ed.): Genetic Structure of Populations. Honolulu: University of Hawaii Press, pp. 12%135. Sherratt A (1988) Comment on -Archaeology and Language,” by C. Renfrew. Curr. Anthropol. 29:45-63. Slatkin M (1985) Gene flow in natural populations. Annu. Rev. Ecol. Syst. 16:393430. Slatkin M (1987) Gene flow and the geographic structure of natural populations. Science 236:787-792. Smouse PE, Long JC, and Sokal RR (1986) Multiple regression and correlation extensions of the Mantel test of matrix correspondence. Syst. Zool. 35:627432. 131 Sokal RR (1988) Genetic, geographic, and linguistic distances in Europe. Proc. Natl. Acad. Sci. U. S. A. 85: 1722-1726. Sokal RR (1991) Ancient movement patterns determine modern genetic variances in Europe. Hum. Biol. 63: 589-606. Sokal RR, Harding RM, and Oden NL (1989a) Spatial patterns of human gene frequencies in Europe. Am. J . Phys. Anthropol. 80:267-294. Sokal RR, Jacquez GM, Oden NL, DiGiovanni D, Falsetti AB, McGee E, and Thomson BA (1993) Genetic relationships of European populations reflect their ethnohistorical affinities. Am. J. Phys. Anthropol. 91: 55-70. Sokal RR, and Menozzi P (1982) Spatial autocorrelation of HLA frequencies in Europe support demic diffusion of early farmers. Am. Nat. 119:l-17. Sokal RR, Oden NL, Legendre P, Fortin M-J, Kim J, Thomson BA, Vaudor A, Harding RM, and Barbujani G (1990) Genetics and language in European populations. Am. Nat. 135:157-175. Sokal RR, Oden NL, Legendre P, Fortin M-J, Kim J , and Vaudor A (1989b) Genetic differences among language families in Europe. Am. J . Phys. Anthropol. 79:489-502. Sokal RR, Oden NL, and Thomson BA (1988) Genetic changes across language boundaries in Europe. Am. J. Phys. Anthropol. 76:337-361. Sokal RR, Oden NL,and Thomson BA (1992) Origins of the Indo-Europeans: Genetic evidence. Proc. Natl. Acad. Sci. U. S. A. 89:7669-7673. Sokal RR, Oden NL, and Wilson C (1991) Genetic evidence for the spread of agriculture in Europe by demic diffusion. Nature 351:143-145. Sokal RR, and Rohlf FJ (1995) Biometry, 3rd ed. New Bhatia K, and Wilson AC (1990) Geographic variation in human mitochondrial DNA from Papua New Guinea. Genetics 124:717-733. Torroni A, Schurr TG, Yang C-C, Szathmary EJE, Williams RC, Schanfield MS, Troup GA, Knowler WC, Lawrence DN, Weiss KM, and Wallace DC (1992) Native American mitochondrial DNA analysis indicates that the American and NaDene populations were founded by two independent migrations. Genetics 130r153-162. Wade MJ, and McCauley DE (1988) Extinction and recolonization: Their effect on the genetic differentiation of populations. Evolution 42:995-1005. Wallace DC, and Torroni A (1992) American Indian prehistory as written in the mitochondrial DNA A review. Hum. Biol. 64t403-416. Ward RH, Frazier BL, Dew-Jager K, and Paabo S (1991) Extensive mitochondrial diversity within a single Amerindian tribe. Proc. Natl. Acad. Sci. U. S. A. 88:872& 8724. Ward RH, Redd A, Valencia D, Frazier BL, and Paabo S (1993) Genetic and linguistic differentiation in the Americas. Proc. Natl. Acad. Sci. U. S. A. 90: 1066% 10667. Wijsman EM, and Cavalli-Sforza LL (1984) Migration and genetic population structure with special reference to humans. Annu. Rev. Ecol. Syst. 15:279-301. 132 G. BARBUJANI ET AL. Wright S (1969) Evolution and the Genetics of Populations. Vol. 2. The Theory of Gene Frequencies. Chicago: University of Chicago Press. Wright S (1978) Evolution and the Genetics of Populations. Vol. 4. Variability within and among Natural Populations. Chicago: University of Chicago Press. Zeven AC (1980) The spread of bread wheat over the old world since the Neolithicum as indicated by its genotype for hybrid necrosis. J. d’Agric. Trad. Bota. Appl. 27:25-53. Zvelebil M, and Zvelebil KV (1988) Agricultural transition and Indo-European dispersal. Antiquity 62574583.