вход по аккаунту


Indo-European origins A computer-simulation test of five hypotheses.

код для вставкиСкачать
Indo-European Origins: A Computer-Simulation Test of
Five Hypotheses
Dipartimento di Scienze Statistiche, Universita di Bologna, I-40126
Bologna, Italy (G.B.); Department of Ecology and Evolution, State
University of New York, Stony Brook, New York 11 794-5245 (R.R.S.,
G.B.); The EMMES Corporation, Potomac, Maryland 20854 (N.L.O.)
Genetic variation, Demic diffusion, Language,
Computer simulation
Allele frequency distributions were generated by computer
simulation of five models of microevolution in European populations. Genetic
distances calculated from these distributions were compared with observed
genetic distances among Indo-European speakers. The simulated models differ in complexity, but all incorporate random genetic drift and short-range
gene flow (isolation by distance). The best correlations between observed and
simulated data were obtained for two models where dispersal of Neolithic
farmers from the Near East depends only on population growth. More complex models, where the timing of the farmers’ expansion is constrained by
archaeological time data, fail to account for a larger fraction of the observed
genetic variation; this is also the case for a model including late Neolithic
migrations from the Pontic steppes. The genetic structure of current populations speaking Indo-European languages seems therefore to largely reflect a
Neolithic expansion. This is consistent with the hypothesis of a parallel
spread of farming technologies and a proto-Indo-European language in the
Neolithic. Allele-frequencygradients among Indo-European speakers may be
due either to incomplete admixture between dispersing farmers, who presumably spoke proto-Indo-European, and pre-existing hunters and gatherers (as
in the traditional demic diffusion hypothesis), or to founder effects during the
farmers’ dispersal. By contrast, successive migrational waves from the East, if
any, do not seem to have had genetic consequences detectable by the present
comparison of observed and simulated allele frequencies.
0 1995 Wiley-Liss, Inc.
Wide allele-frequency clines exist a t sev- gies of Mesolithic and Neolithic settlements
eral loci in Europe (Menozzi et al., 1978; indicate a westward spread of farming techSokal and Menozzi, 1982; Sokal et al., nologies from the Near East, starting ap1989a). Their extent is such that simple proximately 8,000 BC (Ammerman and Cavmodels of isolation by distance are unlikely alli-Sforza, 1984; Renfrew, 1987). The
to explain them. It is generally agreed that observed gene-frequency gradients can then
they result from a population expansion be explained by attributing the propagation
starting in Anatolia approximately 10,000 of farming t o the dispersal of early farmers,
years ago (Menozzi et al., 1978; Sokal and
Menozzi, 1982; Renfrew, 1987, 1991, 1992;
Cavalli-Sforza, 1988; Sokal et al., 1991),
Received January 25,1994; accepted August 7,1994
most likely associated with the development
Address reprint requests to Robert R. Sokal, Department of
of technologies for food production (Hassan, Ecology
and Evolution, State University of New York, Stony
1973; Zeven, 1980). Radiocarbon chronolo- Brook, NY 11794-5245.
who interbred only sparingly with the bands
of hunters and gatherers whom they met in
the process, and who had already colonized
most of Europe (Ammerman and CavalliSforza, 1984). Such a combination of demographic growth, range expansion, and limited admixture has been termed demic
diffusion (Menozzi et al., 1978). Its expected
consequences include correlations between
allele frequencies and the dates of onset of
agriculture (Sgaramella-Zonta and CavalliSforza, 1973), which have actually been observed (Sokal et al., 1991).
The European regions where early (Neolithic) agriculturalists expanded correspond
approximately to the current western range
of Indo-European languages. This raises the
question whether the cultural process of
Indo-European diffusion was also determined by the demographic processes accompanying the spread of farming (Renfrew,
1987). The traditional view holds, on the
contrary, that proto-Indo-European entered
Europe not earlier than 4500 BC, through
three migrational waves also coming from
the East, namely from the Pontic steppes
(Gimbutas, 1979, 1986). Although directed
westwards, like demic diffusion, these
waves were not associated with population
increases comparable to those caused by the
introduction of farming and animal breeding. Therefore, diffusion of Indo-European
in the late Neolithic would imply that languages spread more through cultural contacts (Zvelebil and Zvelebil, 1988) than by
demographic processes (see Renfrew, 1989).
As a corollary t o this, association between
patterns of genetic and linguistic variation
should be limited and occasional among contemporary Indo-European speakers.
Support for the view linking the spread of
proto-Indo-European in Europe with demic
diffusion comes from studies showing that
patterns of linguistic and genetic diversity
correspond in many European populations
(Sokal, 1988; Cavalli-Sforza et al., 1988,
1992; Harding and Sokal, 1988; Barbujani
and Sokal, 1990, 1991; Bertranpetit and
Cavalli-Sforza, 1991). The exceptions include Basques (Piazza et al., 1988; Bertranpetit and Cavalli-Sforza, 19911, Hungarians
(Barbujani et al., 1990), and Uralic-speakers
(Guglielmino et al., 19901, that is to say,
groups whose current languages do not belong to the Indo-European phylum.
Recent years have seen both a refinement
of the hypotheses on Indo-European origins,
and the emergence of contradictory data.
Based on archaeological and linguistic evidence, Cavalli-Sforza (1988) and Renfrew
(1991, 1992) argued that the common elements recognized within the so-called Nostratic linguistic macrofamily (Kaiser and
Shevoroshkin, 19881, including Indo-European, Altaic, Afro-Asiatic, and Elamo-Dravidian, derive from a common biological origin of most of their current speakers. It is
then possible to interpret the current distribution of most Nostratic, and not only IndoEuropean, languages as a consequence of a
multidirectional spread of agriculture. Recent genetic analyses agree with this view
(Cavalli-Sforza et al., 1993; Barbujani and
Pilastro, 1993). On the other hand, a model
incorporating the effects of the origin of agriculture and/or specifically the hypotheses of
Renfrew and Gimbutas failed to explain a
larger fraction of the correlations between
genetic and linguistic distances than is explained by the simple effects of geographic
distances (Sokal et al., 1992).
Further evidence in favor of either hypothesis may be obtained by simulating
their genetic consequences, and then comparing them with the patterns of genetic
variation observed in the field. In the only
simulation study available so far, presenttime genetic variation was demonstrated to
be compatible both with a Neolithic origin of
Indo-European speakers, and with later immigration from the Pontic steppes (Rendine
et al., 1986). However, the two hypotheses
were not contrasted in that study, but combined. A relationship between current genefrequency gradients and dispersal from the
east was evident, and seems undisputable.
However, when exactly, and by what type of
process, proto-Indo-Europeans spread has
not been convincingly ascertained. We are
particularly interested in establishing if the
demic diffusion of early farmers can, at least
in principle, explain a large share of current
genetic diversity among Indo-European
speakers, or if additional processes must be
included in the model to obtain a better fit
with real data. Among these additional pro-
cesses, we gave a special emphasis to the
migratory waves postulated by Gimbutas.
In this study, we simulated microevolutionary scenarios of increasing complexity,
from an unrealistically simple one to models
including several archaeologically documented migrations. We then calculated correlation coefficients between the simulated
and real gene frequencies (or, more precisely, between matrices of genetic distances
calculated from each of them). We expected
to observe an increasing agreement between
real and simulated data as the models get
more and more realistic. When an increase
in the complexity of the model is not
matched by an increase in the correlations,
we conclude that the new factors included in
that model do not improve our understanding of the phenomenon, and should therefore be considered unnecessary. This does
not automatically imply that these factors
played no evolutionary role at all. But, if
they did, evidence of their effects should be
sought in data other than the currently
available allele frequencies. A large database of European allele frequencies (see
Sokal et al., 1989a) was analyzed for this
Overview of the simulations
We carried out a series of computer simulations of five microevolutionary models. All
models were based on a stepping-stone population structure, consisting of a 60 X 37
regular lattice, superimposed onto the map
of Europe. Each node of the lattice represents a l-degree square quadrat. Of the
2,220 nodes of the lattice, 1,512 are land
areas supporting a human population. Each
node (population) is characterized by its effective size (N,) and by the frequency of one
allele (p).At each generation p undergoes
random variation, representing the effects
of genetic drift, which is a function of N,.
Migration is allowed only between adjacent
populations. The numbers of individuals migrating at each generation depend on population sizes, and on factors of resistance to
migration, which are zero across plains, but
greater than zero across mountain chains
and seas. The differences among the five
models lie in parameters other than those
described so far.
Each simulation experiment consisted of
440 iterations (representing generations) of
a series of population processes. In this way,
assuming a 25-year generation interval and
non-overlapping generations, each experiment spans 11,000 years. The simulated allele frequencies were printed at generation
440, representing present time, for the localities corresponding to those for which allelefrequency data are available in a database of
European allele frequencies (Sokal et al.,
1989a), from which non-Indo-European
speakers had been discarded. Matrices of
Prevosti’s genetic distances (Prevosti et al.,
1975) were calculated on both real and simulated allele frequencies, and their degree of
resemblance was evaluated by Mantel
(1967) tests of matrix correlation.
Five FORTRAN programs were written,
each corresponding to one of five models for
the origins of Indo-Europeans described in
the following section, and incorporating subroutines developed by Press et al. (1986)and
Manly (1991).
An outline of the models
IBD: Isolation by distance
The first, clearly oversimplified, hypothesis, is that Indo-European-speaking populations evolved under conditions of isolation
by distance (IBD model). Current patterns
of genetic variation would then simply result from the interaction between random
fluctuations of allele frequencies in time,
i.e., genetic drift, and dispersal of individuals. Under isolation by distance, variations
in population size affect only the impact of
genetic drift-the larger the population, the
smaller the allele-frequency fluctuations.
Population growth, which occurs after generation 40, does not prompt migratory movements. Thus, the IBD model neglects all
gene flow processes other than those in
which movements of individuals from their
birthplaces are local and random (i.e.,
equally likely in all directions except for migration resistance factors, see below).
Under the IBD model, the demographic
increase that occurred in the Neolithic (and
is detailed in the section Population Growth
Among Farmers) was simulated without
separating hunting-gathering and farming
populations, i.e., as if all hunter-gatherers
turned to agriculture a t 8,000BC.
OAC: Isolation by distance, plus effects
the origin of agriculture, and cultural
Cultural transmission from farmers to
hunter-gatherers may be built into the
model, yielding what we call OAC; C stands
OAG: Isolation by distance plus effectsof for culture. Under this model, at all localithe origin of agriculture
ties some hunter-gatherers learn how to
produce food, and therefore their alleles are
Isolation by distance is the null hypothetransmitted across generations with greater
sis for human microevolution (Wijsman and
efficiency. From the genetic standpoint, this
Cavalli-Sforza, 1984). Therefore, all models
is equivalent to a certain degree of admixthat follow are not alternative to IBD.
ture, whereby some genes of the hunterRather, they incorporate it as a necessary, if
not sufficient process, for determining the gatherers contribute to the gene pool of the
farmers. As a consequence, these genes
currently observed patterns of genetic variaspread at once with the genes of the farmers,
tion. OAG, the simplest such model, is one
and are thus carried into new localities. This
which combines IBD with the likely effects
is the Neolithic demic diffusion model, as
of the demographic processes following the
originally proposed (Menozzi et al., 1978;
origin of agriculture. Under this model, popRenfrew, 1987).
ulations of hunter-gatherers initially occupy
Europe and evolve under isolation by distance. At a specific moment in time (8,000
BC, chosen on the basis of archaeological inATC: Isolation by distance, plus effects
formation), a few populations in southern of the origin of agriculture, cultural
Anatolia turn to farming. This starts a local transmission, and archaeological
process of population growth in the areas time constraints
where farming is being practiced, followed
Under OAC, the spread of farmers from
by dispersal outwards when local population
densities have reached a certain threshold. Anatolia into Europe is driven by their inIn this way, migratory movements between crease in numbers at each locality, which
farming communities are not necessarily causes dispersal towards areas of lower popsymmetrical, as is reasonable to assume in ulation density. Therefore, in the OAC
many evolutionary scenarios (Rogers and model, the farming technologies spread a t
Jorde, 1987). The rate of spread of farmers is an approximately constant rate through
driven by their intrinsic growth rate; it is space (as in Ammerman and Cavalli-Sforza,
constrained only by geographical factors 1971). This is known to be an approximation
such as mountain chains or bodies of water. (Barker, 1985).A further refinement of OAC
No cultural transmission is simulated be- considers archaeological time constraints
tween the hunter-gatherers and the farmers (ATC). Under ATC, we use archaeological
who immigrate into their regions. Only the information about the likely date at which
farmers’ allele frequencies are eventually farming reached each specific site in Europe
compared with observed matrices of genetic (see Sokal et al., 1991). In this way, the
distances. In this way, the genetic conse- arrival of farmers into a new locality requences of this model are those that would flects archaeologically documented cultural
be expected if hunter-gatherers were re- transformations; farmers spread at an irregplaced without admixture, i.e., became ex- ular rate, corresponding to the actual protinct. The only microevolutionary role they cess as inferred from archaeological eviplay is to serve as a source population at the dence. Incorporation of hunter-gatherers
beginning of the Neolithic, for that small into each farming population occurs at the
fraction in Anatolia of the total population same rates and through the same processes
that develops the new farming technologies. as in OAC.
GIM: Isolation by distance, plus effects of
the origin of agriculture, cultural
transmission, archaeological time
constraints, and late Neolithic
In the OAG, OAC, and ATC models, the
first farmers are also considered the first
speakers of proto-Indo-European. The alternative hypothesis considers them as the Neolithic inhabitants of areas that were later
invaded by the first proto-Indo-European
speakers, the Kurgan people (Gimbutas,
1979, 1986). The three migrational waves
postulated by Gimbutas are added to ATC in
the GIM model, by simulating long-distance
migratory movements between 4,250 BC and
2,900 BC. A number of successive population
movements are added as well; presumably,
they were independent from the spread of
Indo-European, but are considered by Gimbutas (personal communication) relevant to
an accurate description of human evolution
in Europe.
Details on the simulation parameters
and algorithms
The data matrix
In all simulation cycles, a matrix of 60
columns by 37 rows was defined, each element in the matrix representing a square of
edge length 1 degree in a Mercator projection of Europe. The data matrix covers the
area between 10 degrees of longitude West
and 50 degrees East, and between 72 and 35
degrees of latitude North. Iceland is not included in this simulation.
ing the frequency of an allele at a polymorphic locus in the hunting-gathering population, i.e., in the only type of population
existing at the beginning of the simulation.
From a mathematical standpoint, it does not
make a difference if this locus is regarded as
biallelic, or if it is considered multiallelic,
since the fate of only one of its alleles is
simulated in the followingphases. Allele frequencies were drawn from a gamma distribution truncated at one (Nei, 19871, whose
mean was fixed either a t 0.33 or at 0.50.
Initial population sizes
Estimates of population densities among
current hunting-gathering tribes suggested
to Rendine et al. (1986) that the effective
size NHG of the hunting-gathering populations of Europe should be approximately 300
in each of the 840 elementary areas of their
simulation. To have the same population
density in the 2,220 pixels of this simulation, NHG was fixed at 114. This corresponds
to a population density of 0.04 individuals
per square km,within the estimated range
of population densities for hunter-gatherers
in temperate climates (Hassan, 1981). In
Rendine et al.’s (1986) model, the individuals were considered as haploid, whereas
here they are diploid. This may have caused
a certain degree of divergence between the
two models, as the drift variances are affected by the levels of ploidy of a population.
Genetic drift among hunter-gatherers
Each of the 2,220 elements (= nodes or
pixels or localities) of the data matrix contained an integer value, L, which was 1 for
plains, 2 for mountains, 3 for seas, and 4 for
the Black Sea. A local population was assigned to each of the 1,512 land pixels.
Non-overlapping generations were simulated. At each generation, a new allele frequency was drawn, for each locality, from a
normal distribution whose mean was the allele frequencyp of the same population a t
the previous generation, and whose variance w a s p 0 - p), divided by twice the effective population size NHG (Nei, 1987). This
represented the effect of sampling of alleles
from one generation to the following, i.e.,
random genetic drift.
Initial allele frequencies
Dispersal of hunter-gatherers
Under all the models tested, for each land
pixel a variable PHG was defined, represent-
Symmetrical dispersal occurred between
adjacent pixels, once every generation fol-
TABLE 1. Factors of resistance to migration (RTW
And an adjacent
pixel in t h e
Black Sea
Between a pixel in thePlains Mountains Sea Black Sea
lowing drift; population sizes of the two localities exchanging individuals did not
change as a result of dispersal. Therefore,
this study assumed a stepping-stone model
(Kimura and Weiss, 19641,whose properties
are discussed by Jorde (1980). We chose a
dispersal rate m of 0.065 (as in Rendine et
al., 1986), which means that at each generation, after reproduction, 6.5%of the resident
individuals could be replaced by immigrants
from the adjacent pixels. However, physical
obstacles in the pixels between which dispersal occurred could reduce the number of
migrants. Physical obstacles t o migration
were expressed by a factor of resistance to
migration, RTM, detailed in Table 1. The
average value of RTM, calculated between
all suitable pairs of pixels, was 0.300. This
means that, on the average, only 70% of the
potentially dispersing individuals actually
moved from their birthplace to an adjacent
locality. To compensate for this, the dispersal rate m was replaced by m' = 0.0651
0.700 = 0.0928. In other words, 9.28%of the
individuals in a locality were potentially
subject to migration elsewhere, and 6.5%actually migrated, on the average. The number N(AB) of individual hunter-gatherers
moving from locality A t o B (and vice versa,
from B to A) was
N(AB) = NHGm' (1 - RTM(AB)) (1)
where the denominator refers to the number
of adjacent populations in a stepping-stone
model, and RTM(AB) depends on the environmental features a t localities A and B (Table 1).WhenNHGwas fixed at 114, each pair
of adjacent localities exchanged 2 effective
individuals per generation. However, under
the IBD model the hunting-gathering populations increase in size, starting at genera-
tion 40; in this case, the number of migrants
per generation increased accordingly. When,
by contrast, hunter-gatherers decreased in
numbers owing to the expansion of farmers,
NAB) decreased as well until it reached
zero. Because of the RTM factor, two localities in the plains exchanged freely onefourth of the migrants allowed at each generation, whereas dispersal was reduced
between populations separated by mountains or bodies of water, and for populations
a t the extremes of the simulated area. No
dispersal was allowed across the Black Sea,
to more carefully represent the population
processes in the surrounding area (e.g., the
migrations of Kurgan people), which occurred mostly by land movements.
The allele frequency after migration,
pfHG, was then calculated as
P'HG = P N G [ ( ~-
m') (1 - RTMI
+ rn'PHGin (2)
where RTM is the average resistance to migration between the pixel of interest and the
nA adjacent pixels (1 < nA < 41, and pHGin
Inception of farming in the nuclear zone
Under all models except IBD, the spread
of farming starts with the splitting of some
populations into two groups, one practicing
agriculture, and the other still living in a
hunting-gathering economy. At generation
40 (i.e., 10,000 years ago), 20 individuals
turn to farming at each of six pixels in Anatolia, around the village of Catal Humk,
where the oldest archaeological evidence of
farming activities is situated (Redrew,
1991). Each group of 20 individuals represents a random sample of the pre-existing
hunting-gathering population at the same
site; therefore, initially they have the same
allele frequency: pF = PHG. However, from
generation 41 they evolve in reproductive
isolation, so that there will be two distinct
populations of hunter-gatherers and farmers, HG and F, at those localities, with distinct allele frequencies. In the absence of
cultural transmission between groups, i.e.,
under the OAG model, the two groups coexist without any genetic exchange, at all localities where farming communities have
been established.
pansion towards localities a t the same
latitude (i.e., with the same type of climate)
was presumably more common than dispersal northwards, towards more rigorous
climates. This led us to subdivide the history
of our simulated farming populations into
successive phases. Phase 1: Initially,
Population growth among farmers
the population is scarce, and simply tends to
The farming populations have access to a increase logistically, without sending emiwider range of resources, and tend to in- grants, but receiving immigrants from other
crease in numbers (Hassan, 1973); 50-fold farming populations a t higher density, if
increases have been estimated by Ammer- any. Phase 2: When NF reaches a first
man and Cavalli-Sforza (1984). We simu- threshold, T1, a few individuals begin to dislated a logistic increase, whose key parame- perse longitudinally; this often entails coloter, the growth rate, generally referred to as nization of a new site on the west, whereas
r (Feller, 1940; Eisen, 1979; Keyfitz, 1977), gene flow is asymmetrical with the eastern
was here called a,following Rendine et al. neighbors, whose density is still higher.
(1986). In the absence of reliable informa- Phase 3: As the population size approaches
tion on growth rates among early farmers, the carrying capacity of 7,560, a second
we tried a preliminary set of values, and threshold, T2 is reached, and gene flow occhose a = 0.5, which gave us a rate of popu- curs also northwards and southwards, once
lation increase compatible with the known again giving rise to a new population of
rates of spread of farming in Europe, 1 km farmers if the adjacent pixel to the north or
every year, on the average (Ammerman and south has not been colonized yet. The
Cavalli-Sforza, 1971). That value was em- thresholds had to be fixed in a somewhat
ployed in all the simulations presented here. arbitrary manner. It seemed realistic to alThe equation calculating, at generation t low for the first westwards dispersal two to
and for each locality, the effective size NF of three generations after inception of the
the farming population is
farming economy, so as to roughly match the
archaeologically documented rates of
spread. T1 was fixed at 24 (corresponding to
1) x (1 + a (1 a census size of 72 individuals), whereas for
T2 we chose 50% of the carrying capacity,
where NF(t - 1)is the population size a t the i.e., 3,780. Phase 4: Finally, when adjacent
former generation, and 7,560 is the carrying populations have reached their equilibrium
capacity of the area, chosen for reasons anal- size of 7,560 effective individuals, the migraogous to those that led us to choose tory exchanges become symmetrical. The
NHG= 114. Since the effective population general equation, expressing the number of
size is approximately one-third of the cen- farmers moving, say, from locality A to adjasus size (Wright, 19691, this corresponds cent locality B, N(AB),is
roughly to a farming population of 23,000
dwelling on a 1-degree-square quadrat of
land, and to 20,000 farmers in each elementary area of Rendine et al.’s (1986) simulation.
for NF(A) greater than the appropriate
threshold, i.e., T1 for latitudinal and T2 for
longitudinal movements, respectively. All
Dispersal of farmers and origin of
relevant quantities have been defined for
farming outside the nuclear zone
Equation 1. Since N,(A) is variable across
Under the OAG and OAC models, the generations, N(AB) varies too.
growth of farming populations prompts miIn this way, the input and output of genes,
gratory movements into neighboring locali- from and to the adjacent pixels of the map,
ties where farming has not yet started. Ex- could be different a t different times during
the simulations. If the number of immigrants from each locality is taken into account in the calculation of the allele frequency of the immigrants, which we called
pHGinin Equation 2 and which will be called
pFinfor farmers, a n analogous formula gives
the allele frequency after gene flow among
p'F = p F [ ( l
m ' ) (1 - RTM)I
tion 200, however, the four models yield a
similar pattern of land occupation. The only
major exception is an area of north-western
Alps, where archaeological evidence shows a
delayed onset of farming activities (Sokal et
al., 1991). Of course, such a delay was not
predicted by the mechanism of farming
expansion underlying our OAG and OAC
+ rn'PFin (5) Pixels in the sea
Figure 1 is an example of allele frequencies
generated under the OAG and OAC models.
Allele frequencies of specific localities were,
of course, different in different realizations
of the same process, and between OAG and
OAC. What was constant, however, was the
pattern of occupation of land areas by expanding farmers, because it depended only
on population growth and dispersal parameters, which were kept constant across realizations.
For the ATC and GIM models, by contrast,
the spread of farming followed the pattern
that can be inferred from archaeological evidence. Once a farming community exists at
a certain locality, the exchange of genes with
neighboring farming communities occurred
in the same manner as described for the
OAG and OAC models (Eq. 5). The difference is in the establishment of new farming
populations, which under ATC and GIM was
controlled by a matrix of dates of origin of
agriculture at each land pixel of the map
(details on how archaeological information
was processed for this purpose are in Sokal
et al. (1991)). Therefore, when colonization
of a new locality by the first farmers had
to be simulated, eight effective founding
individuals were sampled with replacement from the closest suitable locality. In a
few cases, this required input of immigrants
from localities that were not directly adjacent to the one in which farming was starting. This was the only violation that we
tolerated of the assumptions of the stepping-stone model. Figure 2 is an example
of allele frequencies generated under the
ATC and GIM models. It shows that the
spread of farmers is initially slower than
simulated under the OAG and OAC models,
especially along the northern shores of
the Mediterranean Sea. By genera-
A stepping-stone model does not allow for
long-distance population movements, i.e.,
those associated with sailing, which are considered important in the colonization of Europe, and in the successive phases of agricultural dispersal (Renfrew, 1987). To simulate
the effects of the movement of a few individuals across the seas, we chose to assign a
pseudopopulation to each pixel located in
the sea. Pseudopopulations did not undergo
random fluctuation of allele frequencies,
and did not increase in numbers. They included only the individuals dispersing from
neighboring pixels (their number was determined according to Equation 1)) whose descendants had the same allele frequencies,
and a t each generation proceeded one step
forward in the dispersal process. In this
way, the movement of a few individuals
across the sea was simulated. This had little
importance for the allele frequencies of
Fig. 1. Spread of farmers under the OAG and OAC
models. The localities where farming is being practiced
are indicated by letters representingallele frequency in
farmers, at generations 100 (6500 BC), 200 (4000 BC), and
240 (3000 BC). Eight allele-frequencyclasses are defined,
from a to h, each corresponding to an interval equal to
0.125 (a, p F < 0.125; b, 0.125 <p,0.250; c, 0.250
< p F < 0.375; etc.). Hyphens represent areas inhabited
only by hunter-gatherers.The nuclear zone is delimited
by a solid square. While the pattern of land occupation
shown was constant for all the realizations of the models, the allele frequencies depicted are those of a single
run of OAC.
Fig. 2. Spread of farmers under the ATC and GIM
models. The localities where farming is being practiced,
based on archaeological information, are indicated by
figures representing allele frequency of farmers, at generations 100 (6500 BC), 200 (4000 BC), and 240 (3000 BC).
Allele-frequencyclasses are as in Figure 1. The pattern
ofland occupationwas again the same in all realizations
of ATC and GIM, but the allele frequency shown is that
of a single realization ofATC.
hunter-gatherers and for those of farmers
once the spread of farming through Europe
was completed; however, it had an effect on
the establishment of farming communities
under the OAG and OAC models, as a few
individuals could quickly reach distant localities by sea. The resulting pattern of occupation of coastal regions corresponds well
with the archaeological evidence (see Figs. 1
and 2).
Cultural contacts and admixture
Rendine et al. (1986), the value of y was
adjusted so that resultant values of S had
approximately the same magnitude as in
Equation 6. However, we found, over a
range of test conditions, that the results
were only trivially affected. Therefore, we
report results for Expression 6 without redoing the analysis for every test condition.
Disappearance of hunting-gathering
In the IBD model, hunter-gatherers adopt
Under the OAC, ATC, and GIM models, at
at generation 40, and thus all genes
each generation a certain number of hunterof
speakers come from the
gatherers adopted farming, if a farming
community already existed a t the locality genetic pool of the hunting-gathering comwhere they lived. The likelihood of this cul- munities. In the OAG model, conversely, the
tural shift depended on the probability of hunter-gatherers go extinct, and thus all
contacts between farmers and hunter-gath- genes of Indo-European speakers derive
erers, and on a coefficient of acculturation from the genes of the few first farmers of
which was called y by Rendine et al. (1986). Southern Anatolia. These are the two exThe number S of individuals shifting to treme models, as far as the origins of Indofarming at each generation is related to the Europeans are concerned. Under the other
probability of contacts between farmers and models, a certain degree of admixture is simhunter-gatherers. If farmers are NF at a ulated between the two communities at each
given locality and a t a given moment in locality, reflecting a widespread view of hutime, their probability of meeting one of the man evolution in Europe (Cavalli-Sforza
NHG hunter-gatherers will represent a frac- and Piazza, 1993).
Under the OAC, ATC, and GIM models,
tion equal to 2(NF x NHG) of all the
hunters and gatherers are considered to
(NF+ NHG)' possible contacts. The probadisappear
from a certain locality when their
bility y that such a contact will result in
such that S is less than 1. Beacculturation has been estimated at 0.00024
starts after a phase of
(Rendine et al., 1986). Therefore, at each
generation, the NF farmers will transmit population buildup for farmers, the extinctheir technologies t o a number S of the tion of the hunting-gathering communities
also proceeds as a wave, from southeast to
hunter-gatherers estimated as
northwest, spreading in parallel with the
farming economy, but several generations
Long-range migratory movements
where all parameters have already been defined.
It can be argued that the change in a particular quadrat of the number of huntergatherers per generation might better be
modelled as proportional to the product of
the number of hunter-gatherers and the
number of farmers, that is
In the GIM model, three major migratory
waves of Kurgan people are supposed to
have introduced Indo-European languages
into Europe, around 4,250 BC, 3,400 BC, and
2,900 BC, respectively (Gimbutas, 1979,
In Gimbutas' (1979) view, the westward
migrations of Kurgan people in Europe inS = YNHGNF
(7) troduced a new patriarchal culture, characterized by horse-riding and new warfare
where y = 1.56250 x
In this expres- techniques. These cultural changes were not
sion, closely resembling a formula used by associated with major innovations of the
subsistence techniques. Most of the populations of Europe by then were farmers. It is
therefore highly unlikely that concomitant
population growth could occur. We chose to
represent these waves as a flow of genes
from the purported source area, north of the
Black Sea and west of the Caspian Sea. For
the sake of simplicity, these movements
were concentrated in one generation’s time
(at generations 190,224, and 244), although
each of them probably lasted two centuries
(Gimbutas, 1979). For each movement, we
simulated replacement of 20% of the genes
in the target area, with genes coming from
the source area. Probably this overemphasizes the genetic consequences of the simulated migratory movements.
The invading Kurgan people were not the
entire population of the area between the
Black and Caspian sea moving en masse;
rather, they were groups of individuals belonging to semi-nomadic tribes (Gimbutas,
1979). Accordingly, we chose to simulate
their contribution to the genetic pool of the
”invaded” populations as if they were coming from several populations in the appropriate zone. The location of four such populations was chosen a t random, and then held
constant in all 2,600 simulation cycles of the
GIM model. The allele frequencies of the recipient populations (Fig. 3) were then recalculated as if 20% of the pre-existing individuals had been replaced by immigrant Kurgan people.
In addition, Gimbutas (personal communication) pointed out to us 12 other directional and potentially migratory processes
that may have been important in determining the current linguistic population structure of Europe. These processes are summarized in Figure 4, and were incorporated in
the GIM model. Once again, for each of them
we simulated replacement of 20% of the individuals of the “target” area by individuals
whose allele frequency was the average allele frequency in the “source”area.
parameters being constant. Gene flow is reduced at generation 265, i.e., 3,375 years
ago, by which time all suitable regions had
been colonized by early agriculturalists. The
parameters of the simulation are summarized in Table 2.
Gene-frequency data
The simulated sets of allele frequencies
were compared with a database of allele frequencies, which had been analyzed in various studies on Europe (Sokal et al., 1988,
1989a,b, 1990, 1991, 1992; Harding and
Sokal, 1988; Barbujani and Sokal, 1990;
Sokal, 19911, and had been continuously updated. The data corresponding to populations speaking languages other than IndoEuropean (Basque, Finnish, Estonian, Lapp,
Hungarian, Turkish: Ruhlen, 1987)were discarded.
Twenty-six genetic systems were considered. Most of them corresponded to independent loci; exceptions are ABO, MN, and Rh,
for which two (or three, for Rh) systems were
independently considered, each resulting
from typing of alleles by different sets of antisera. This convention has long been followed in studies on human variation (e.g.,
see Lewontin, 1972). Each system is indicated by a letter code, preceded by a number
referring to Mourant’s coding system (Mourant et al., 1976), except for 100HLA-A,
101/2HLA-B, 200GM, and 201Kh4, whose
numerical codes were assigned in our laboratory. Overall, 3,481 records, and 93 alleles or haplotypes were considered.
The number of samples available for the
26 systems varied widely, ranging from a
minimum of 27 (for 5-1 LUTHERAN), to a
maximum of 762 (for 1-1ABO).Genetic differences between localities were summarized by 26 matrices of Prevosti’s distances
(Prevosti et al., 19751, separately calculated
for each system. We shall refer to these matrices as observed distance matrices, as opposed to the simulated ones, generated by
Decrease of population mobility after the computer under one of the five models
establishment of a farming economy
tested. To properly compare the two sets of
Once farming populations have reached data, prior to calculating genetic distances
the maximum size allowed by the programs, we pooled the observed frequencies of all ala reduction of mobility is simulated by sim- leles except the one whose average freply halving the migration rate m’, all other quency was closest to 0.5.
Fig. 3. Migratory movements simulated under the
GIM model. The four localities whose allele frequencies
are averaged, to represent the allele frequencies of the
migrating population, are marked by asterisks. The regions affected by the three migratory waves proposed by
Gimbutas are surrounded by solid lines; the zone where
Basque is currently spoken is not supposed to have been
affected by these migratory episodes. Figures refer to
the generation at which each migratory wave was simulated.
Hypothesis testing
For every one of the 26 genetic systems,
each of the five simulation programs was
run 100 times, yielding 13,000 simulation
cycles (or realizations) in all, 2,600 for each
model. Every realization resulted from deterministic movements of individuals across
the map of Europe, and random allele-frequency fluctuations occurring during initialization and from genetic drift each generation. The latter were dictated by a
random number generator, and gave rise to
the various replicates at generation 440. A
particular realization of the OAG process
adds OAG movements to those already required by the IBD model, but random allelefrequency fluctuations are the same. This is
also true for the other models, each incorporating the previous ones. For all models,
then, random change in allele frequencies
was the same, for each pixel and each generation, under all five models. It was not the
same, however, for the hunting-gathering
and for the farming populations of the same
pixel. Thus, the 500 cycles of simulation for
each locus fall naturally into 100 groups,
each with five matched runs, in ascending
model order. Because of the matching, we
Fig. 4. Twelve potentially important migratory processes considered by Gimbutas (personal communication) to have been relevant in European ethnohistory, each one represented by an arrow from a
“source”region to a “target”region. Figures refer to the generation at which each population movement
was simulated.
could use paired statistical tests to compare
models. This provided a considerable increase in statistical power over unpaired
After 440 generations in each computer
cycle, simulated allele frequencies were
sampled from localities chosen so as to
match the locations of the samples of the
observed allele-frequency database. Matrices of Prevosti’s distances were computed
from the simulated data, so as to obtain 100
simulated matrices for each matrix of observed genetic distances (Prevosti et al.,
1975). This measure of genetic distance was
chosen for consistency with previous studies
(Sokal, 1988; Sokal et al., 1993).
Simulated and observed matrices were
then compared pairwise by means of Mantel’s test of matrix association (Mantel,
1967; Smouse et al., 1986). This test computes the equivalent of a correlation coefficient between matrices, and evaluates its
significance by constructing a null distribution of the test statistic. A Monte Carlo procedure is employed for this purpose; rows
and columns of one matrix are repeatedly
permuted at random, while the other matrix
is kept constant, and the test statistic is re-
TABLE 2. Parameters defined in the simulation
Environment: 1 = plains; 2 = mountains; 3 = seas; 4 = Black Sea
Effective population size of hunter-gatherers (114in all models but IBD, where it is allowed to
increase up to 7560. In the OAC, ATC, and GIM models it is then reduced as an effect of the
cultural transmission of farming technologies)
Effective population size of farmers (Initially equal to 0, then allowed to increase up to 7560 in all
models but IBD)
Frequency of one allele among hunter-gatherers (initially sampled from a gamma distribution)
Frequency of one allele among farmers for all models except IBD (Initially undefined. For the six
pixels of the nuclear zone, pF = pHG a t generation 40. In the other pixels, the initial p F value
reflects the proportion of immigrating farmers and of hunter-gatherers who were incorporated into
the farming population, under the different models)
Intrinsic growth rate of the farming populations
Acculturation rate, i.e., rate of assimilation of hunter-gatherers by farmers
Migration rate
Resistance to migration between localities A and B (see Table 1)
Average resistance to migration between one pixel and its adjoining pixels
calculated each time, so a s to yield the desired null distribution.
Because each observed matrix was compared with 100 simulated matrices for each
model tested, a procedure was needed to
combine all this information. We chose to
compute average Mantel correlation coefficients, and to calculate Fisher’s combined
probabilities (Sokal and Rohlf, 1995) from
the 100 individual probabilities for each
Since we are looking for positive association of observed and simulated data, and a
negative correlation would have no biological meaning, all tests of significance were
one-tailed. However, a s a further control, we
also counted the number of occurrences of
negative correlations that would be significant if the test had been two-tailed. This
allowed us to identify the models generating
allele-frequency distributions departing
widely from the observed distributions.
The main purpose of this study was to
compare competing hypotheses on the origin
of Indo-Europeans. Because the hypotheses
can be ranked, by increasing complexity,
from IBD through OAG, OAC, ATC, and
GIM, four painvise tests of goodness of fit
were carried out, OAG versus IBD, OAC versus OAG, ATC versus OAC, and GIM versus
ATC. This was done taking advantage of the
paired design based on the same random
seeds in the simulations. We employed two
different procedures: 1) a paired-comparisons t-test, where the test statistic was the
difference between the average Mantel cor-
relation coefficients, and only positive differences (indicating a n improvement of the
fit for the more complex hypothesis) were
considered significant; 2) Wilcoxon’s signed
rank test (Sokal and Rohlf, 19951, a nonparametric paired-comparisons test, once
again considering significant only the cases
in which the fit improved for the more complex hypothesis.
The average Mantel correlations for each
genetic system (Table 3) yield numerous significant agreements between observed and
simulated matrices of genetic distances, for
all models, although fewest with IBD. The
number of significant ( P < 0.05) positive average correlations is maximal for the ATC
model (20/26), but it is not substantially
lower for OAG, OAC, and GIM (respectively,
18, 17, and 16 systems). For IBD it is only
10/26. The numerical values of the correlations are low despite their high level of statistical significance. This is characteristic
for genetic distances and is because the
Mantel correlations were constrained to be
Next we examine the number of cases in
which individual simulation realizations
gave genetic distance matrices that would
appear negatively associated (P s 0.05)
with the observed one, if the test had been
two-tailed. A substantial number of disagreements, between observed and simulated data is evident for IBD (6 at system
4-13 RHESUS), for ATC (8 in the 1-1 ABO
TABLE 3. A) Average Mantel correlations of observed with simulated genetic distances and B) significance levels based
on ont-tailed probabilities for positive correlation combined by Fisher's method'
A. Mantel correlations
B. Significance levels
Overall probability
'Values below 0.05 show significant positive correlation between simulated and observed genetic distances
and 11 in the 2-5 MN systems), and for GIM
(26 in the 1-1 ABO, and 8 in the 4-19
RHESUS systems).
The painvise comparisons by t-tests and
by Wilcoxon's signed ranks test agree in
their means and significances that the most
significant increase in the resemblance of
observed and simulated genetic distances
occurs between IBD and OAG. In the t-tests
shown in Table 4 , 1 7 systems show a significant increase of correlation,whereas in only
7 cases does the similarity decrease. These
TABLE 4. Results of paired comparisons t-tests for differences between 5 Indo-European simulation hypotheses:
P ( H I ) - P (H2)’
Mean difference
‘Asterisks indicate significant positive differences as follows: *0.05
- 0.09307
- 0.03060
- 0.10482
0.01,**0.01P P > 0.001,***0.001 P.
figures compare with 7 systems showing sig- although ATC shows a significant positive
nificantly improved correspondence and 10 correlation for the largest number of sysshowing decreased correspondence for OAC tems, on the average, correlations between
versus OAG, 6 and 19, and 5 and 19, respec- observed and simulated genetic distance are
tively, for ATC versus OAC, and GIM versus higher for OAC and OAG.
ATC. These results are reflected in the
To test the plausibility of our simulations
mean differences shown at the bottom of Ta- we also calculated FsTvalues (Wright,
ble 4.
1978) for both our observed gene-frequency
Using Wilcoxon’s criterion, the results do surfaces and the simulated surfaces. The
not change much, so we do not feature them median results over all genetic systems are
as a separate table. The numbers of systems 0.011780 for the observed surfaces, and
showing significance increased and any de- 0.098273, 0.10399, 0.096562, 0.056609, and
creased resemblance between observed and 0.003211, respectively, for the simulated
simulated data are, respectively, 17 and 7 surfaces of models IBD, OAG, OAC, ATC,
for OAG versus IBD; 10 and 9 for OAC ver- and GIM. Although the FsTvalues of the
sus OAG; 5 and 19 for ATC versus OAC; and observed surfaces overlapped only slightly
5 and 18 for GIM versus ATC.
those of the simulated surfaces, the median
We conclude, as a result of all the tests, of the observed data falls within the boundthat similarity between observed and simu- aries of the medians described by the modlated genetic distances increases from IBD els. The latter fall into 3 groups by magnito OAG, and, to a lesser extent, from OAG to tude of FsT.These are 1)IBD, OAG, OAC; 2)
OAC. On the contrary, it decreases as mod- ATC; and 3) GIM. Thus, the clear superiorels are tested in which the spread of farmers ity of OAG and OAC over IBD cannot be
is constrained by archaeological time data shown by FST since various patterns of local(ATC), or demographic processes occurring ity differentiation can yield the same F,,
in the late Neolithic are added (GIM). Thus value.
Which model matches observed
data best?
The IBD model assumes that, in the Neolithic, groups of hunter-gatherers and farmers were not separated. The former gradually turned to farming, so that there was a
genetic continuity between pre- and postNeolithic populations in Europe, and IndoEuropean languages spread only by cultural
transmission. Allele frequency patterns generated under this model resemble poorly the
patterns of genetic variation observed in
contemporary populations, showing that
this evolutionary hypothesis does not fit
with the available genetic evidence.
Resemblance between observed and simulated patterns is much greater for the other
four models, in which farmers evolve separately from hunter-gatherers, and processes
of population expansion are important. The
levels of resemblance, however, do not differ
much among these four models. For instance, the GIM model, including several
population processes occurring in the last
5,000 years, does not give higher correlations, or significant correlations at a higher
number of loci, than the OAG model, where
the demographic changes prompted by the
origin of agriculture are simulated in a
much rougher manner.
Actually, various results of this simulation study indicate that the fit of simple
models, such OAG and OAC, is better than
that of more complex models. The Mantel
correlations between observed and simulated genetic distances would be negative
and significant in only 21 of the 2,600 cases
for OAG, and in 14 cases for OAG, had tests
been two-tailed. These figures compare with
89 and 105 negative significant correlations
for ATC and GIM, respectively. It seems,
therefore, that nothing is added to our understanding of the phenomena, if we add archaeological time data to constrain the
spread of Neolithic farmers, and even less so
if we simulate population movements in the
late Neolithic. The models where farmers
disperse into new areas simply because of
their numbers, which increase logistically,
yield patterns showing a better agreement
with the observed data.
Similarly, the pairwise comparison of
models show a substantial increase of fit of
the OAG over the IBD model (Table 41,
whereas the elements included in the ATC
and GIM simulations cause a slight but evident departure from the patterns observed,
making them poorer fits than OAG.
A first conclusion one may draw from the
results of this simulation study is that two
models account best for many aspects of the
contemporary genetic structure of IndoEuropean-speaking populations of Europe.
One is the demic diffusion model, as originally put forward by Menozzi et al. (1978),
and associated with linguistic evidence by
Renfrew (1987). Under this model, here
called OAC, the two forces driving microevolution in Europe were population growth determined by farming, and dispersal accompanied by limited population admixture
between early agriculturalists (possibly proto-Indo-European speakers) and preexisting
hunters and gatherers. The other model,
OAG, is a simplified version of the demic
diffusion model, in which dispersal of farmers does not lead to any degree of admixture
with hunter-gatherers.
How plausible is the OAG model?
While the OAC model has already received support from studies focussing on its
genetic (Sokal et al., 1991; Cavalli-Sforza et
al., 19931, as well as linguistic and archaeological, aspects (reviewed in Renfrew, 1992),
what we called OAG here has not been analyzed in detail so far. An apparent problem
with it is, how can a model not involving
admixture account for the continent-wide
clines observed in Europe?
Inspection of gene-frequency maps generated in this study, at various moments in
time, shows that founder effects are common
while farmers disperse. Founder effects are
due to the limited numbers of individuals
who start the farming communities in new
localities. In the OAG model as we simulated it, most farming communities start
with 8 effective individuals; but even if this
number were larger, the probability for the
allele in question to be lost or fixed would be
substantial. Loss of genetic variation
through repeated founder effects has been
invoked as the likely cause of clines in sev-
era1 studies on natural populations of toads
in Australia (Easteal, 1988) and aquatic invertebrates in Canada (Boileau et al., 1992).
Theoretical work on the genetic effects of
colonization of previously unoccupied localities (Wade and McCauley, 1988) agrees with
this view.
An additional factor, increasing the likelihood of clines even in the absence of admixture between farmers and hunter-gatherers,
is the Black Sea. Archaeological evidence
(e.g., see Renfrew, 1991) indicates that two
waves of early farmers dispersed westwards
and northwards from the Near East, with
the Black Sea separating them (this is why
we did not allow movement of individuals
through it, but only along its coasts). The
two waves later converged in eastern Europe, after a period of independent evolution. If the same allele had been lost, or
fixed, in both groups of farmers, no particular pattern would result; but if founder effects had had opposite consequences in the
two groups, the successive admixture would
initially determine a steep cline, and successive gene flow would smooth it, resulting in
a wide gradient (Endler, 1977).
Even under OAG, therefore, a certain role
of admixture is important. But admixture,
under OAG, is between different groups of
farmers, who were geographically separated
in part of their evolutionary history, rather
than between farmers and hunter-gatherers
of the same area. This interpretation emphasizes the role both of geographical factors, such as distance between regions, and
of cultural barriers between sympatric communities of farmers and hunters-gatherers.
Indeed, physical barriers are often associated with genetic and linguistic change,
even between Indo-European speakers (Barbujani and Sokal, 1990, 1991), although
other evolutionary mechanisms may also account for that association (Barbujani, 1991).
Genetics and Kurgan waves
Introducing the three migratory waves
postulated by Gimbutas (GIM model) into
the simulation, not only does not increase
the correlations, but somewhat reduces
them. This means that the current patterns
of allele frequencies among Indo-Europeans
can be explained without resorting to the
migrations of Kurgan people. This study
cannot establish whether or not these migration events really occurred, but, if they
occurred, they did not leave a significant
mark on the allele frequencies of current
Renfrew (1987) argued that the cultural
transformations that led Gimbutas to hypothesize late-Neolithic migration waves
could be due to cultural contacts instead,
and equated the first Indo-Europeans with
the first farmers. The extensive changes in
ceramics, architecture, and metallurgy occurring in the late Neolithic are then attributed to trading and imitation; long-distance
migratory movements, if any, may have
been marginal. Although not proved by our
simulation, this view is fully compatible
with it.
This study, therefore, agrees with the
main views expressed by Menozzi et al.
(19781, Rendine et al. (19861, Piazza (19931,
and Cavalli-Sforza et al. (1993). By contrast,
the emphasis laid by the same authors on
late Neolithic migrations from the Pontic
steppes (Cavalli-Sforza et al., 1993) does not
find support in our simulations. Among the
possible causes of this discrepancy, it may be
that Mantel's correlations are not sensitive
enough to recognize the effects of minor processes of gene flow, such as those presumably occurring in the late Neolithic. Alternatively, however, or in addition, one should
consider the possibility that principal components associated with low eigenvalues reflect, at least in part, artificial gradients due
to data interpolation. This may be the case
for areas where population samples are
sparse, such as most of eastern Europe. For
example, the Caucasus seems to show clinal
variation in the first and third principal
components of Cavalli-Sforza et al. (1993),
but a detailed genetic study shows that
clines are very uncommon there (Barbujani
et al., 1994).
Our evaluation of the Gimbutas model
should be revised if evidence could be provided that the spread of the Kurgan people
was accompanied by an increase in population sizes larger than that simulated by us.
A certain level of ambiguity exists about
this, as the movement of people from the
Pontic steppes that Gimbutas (1979) hy-
pothesized is called a “population expansion” by Cavalli-Sforza et al. (1993). These
authors seem to suggest that, because of the
warfare technologies associated with it,
larger populations could be supported in the
regions affected. This aspect remains to be
explored, and we do not have evidence for or
against this view. However, even if this had
been the case, the increases in population
sizes prompted by the beginning of food production seem to have been much larger than
those associated with new war technologies
(see Ammerman and Cavalli-Sforza, 1984).
Unless European populations increased dramatically in size between 6,000 and 5,000
years ago, as they did with the arrival of the
new farming technologies, we conclude that
the long-distance migrations postulated by
Gimbutas remain an unnecessary element
in the evolution of Indo-European-speaking
populations, as reconstructed from the comparison of theoretical models and gene-frequency data.
Besides, early farmers expanded into areas of low population density, where few immigrants could substantially modify the genetic build up of local populations; but this
was not the case for late Neolithic groups,
who invaded regions already occupied by
large farming communities. Simulations of
genetic processes based on the coalescent
approach (see Hudson, 1990) show that patterns of genetic variation do not tend to
change much after a demographic expansion
(Harpending, 1994; Rogers and Jorde,
1995). Successive population movements
can smooth out the gradients and blur some
patterns, but are unlikely to leave a significant mark on allele frequencies.
Are parasites responsible for clines?
Recent evolutionary models (reviewed in
Ladle, 1992) indicate that new genotypes
entering an area could be resistant to the
parasites that are already adapted to the
common resident genotypes. The new genotypes would then increase in frequency, until the parasites adapt to them. A selective
mechanism of this type, combined with gene
flow, might have been important in determining the European clines of allele frequencies; models may be envisaged whereby
a form of frequency-dependent selection,
rather than limited admixture or founder
effects, leads to clinal variation of gene frequencies (e.g., see Hedrick, 1986). It is intriguing to note that the hypothesis of demic
diffusion from the Near East was initially
developed to account for clines a t the histocompatibility loci, HLA-A and HLA-B
(Menozzi et al., 1978; Sokal and Menozzi,
19821, and that the most significant evidence for clines spanning Eurasia has been
found at the glyoxalase locus (Barbujani,
19871, which is linked with HLA on chromosome 6, in a region of extensive linkage disequilibrium (Hedrick et al., 1986).
However, this view, although compatible
with the gradients existing a t the HLA and
linked loci, can hardly account for the patterns of variation observed among Indo-European speakers at other, independently inherited, loci. Had the resistance to parasites
been the main cause of clines in Europe, one
would expect isolation by distance patterns
at most loci not involved in tissue recognition, which is not the case (Sokal et al.,
1989a; and this study). On the contrary, the
nearly parallel gradients observed for many
independent alleles suggest that an evolutionary pressure affecting the entire genome
and not merely part of it, i.e., gene flow,
played a major evolutionary role (Slatkin,
Relation to other work
Diakonov (1984; cited in Redrew, 1987)
listed what he called the essential questions
concerning the origins of Indo-European
speakers: Who migrated? Why? How many
of them were there? Was it actually a migration of people, or rather the transfer of a
language from one population to another?
The present study may contribute to answering some of these questions. Our results
show that migrations in the late Neolithic,
which have been inferred from changes in
the material culture of eastern and central
Europe (Gimbutas, 19791, are not reflected
in the current genetic structure of Indo-European-speaking populations. Conversely,
the correlations of observed and simulated
data are positive and significant only if we
simulate dispersal of farmers from the Levant by demic diffusion. The results of this
study are, therefore, compatible with the
view of identifying proto-Indo-European
speakers with the first Neolithic farmers
(Renfrew, 1987).
These findings appear at first glance to
contradict the results of Sokal et al. (1992)
who use the same genetic dataset. These authors concluded that geographic proximity
explained a substantial amount of the observed correlation between genetic and IndoEuropean linguistic distances. However,
after allowing for geographic distances, statistically significant partial correlations remain, which are not explained by distances
describing the origin of agriculture by demic
diffusion, Renfrew’s hypothesis (as described by his postulated transitions subsequent to the origin of agriculture), or Gimbutas’ hypothesis. But note that the study
by Sokal et al. (1992) was based on the spatial pattern of correlations between genetic
and linguistic distances, whereas this study
examines genetic variation patterns only.
The simulations reported here do not consider the patterns of linguistic diversity observed in Europe today. Our findings are
therefore compatible as well with a simpler
model in which the observed genetic patterns reflect the process of demic diffusion
accompanying the origin and spread of agriculture in Europe, but these populations are
not the proto-Indo-Europeans. Note that the
new findings provide further support for the
demic diffusion hypothesis of Ammerman
and Cavalli-Sforza (1984). Statistical tests
of this hypothesis were carried out by Sokal
et al. (1991) using origin-of-agriculture distances constructed from observed dates of
the onset of the Neolithic. In the present
study these distances were constructed from
the simulation results using simple models
€or the spread of farming populations. It is
reassuring that these models yield results in
agreement with observed genetic patterns
as had already been noted by Rendine et al.
Our work also offers an answer to Diakonov’s second question; presumably, protoIndo-European speakers dispersed because
their increase in numbers forced them to
look for new suitable land. A study of genetic
variation such as this cannot provide reliable estimates of population sizes in the re-
mote past, and even less so of numbers of
migrants. However, even a very small number of dispersing individuals may yield patterns that correlate with the observed ones,
as seen for the OAG model. The results of
this study are compatible both with a complete replacement of pre-existing huntergatherers by Near Eastern farmers (the
OAG model), and with the more conventional view that this replacement was only
partial, and that hunter-gatherers contributed to some extent to the genetic pool of
Indo-European speaking populations (the
OAC model). But the view whereby language replacement was largely independent
of population movements (Zvelebil and
Zvelebil, 1988) fails to account for the largescale clinal patterns matching the direction
of the spread of agriculture observed in Europe, and therefore does not seem easy to
reconcile with the available genetic evidence.
In principle, one could also envisage a scenario whereby expanding farmers determined the main genetic characteristics of
European populations, whereas Indo-European languages spread in a later moment,
and mainly by a cultural process. Although
this cannot be ruled out, it does not seem the
best explanation available for the current
patterns of genetic and linguistic variation.
The model of Neolithic demic diffusion proposed by Renfrew (1991) predicts the existence of clines in three linguistic groups
which are supposed to have expanded together with Indo-European. Many such
clines have actually been observed among
speakers of Altaic and Elamo-Dravidian
languages (Barbujani and Pilastro, 1993;
Barbujani et al., 1994). Moreover, some of
these clines disappear if different linguistic
groups are jointly analyzed (Barbujani and
Pilastro, 1993). Although not a proof, these
findings suggest that linguistic affiliation is
the key to deciphering gene-frequency patterns in much of Eurasia. In the areas where
Indo-European, Elamo-Dravidian, and Altaic languages are spoken, linguistic, genetic, and archaeological evidence can
jointly be accounted for by a demic expansion from the Near East. Clearly, some languages may also have changed by cultural
contact (some examples are well documented: Renfrew, 1991); however, the overlap between large clines and linguistic areas
suggests that cultural transmission had a
lesser impact than demographic expansions.
among Indo-European speakers. These
models are not mutually exclusive, and it
may well be that both phenomena were important, a t different localities. Conversely,
there is no evidence for a major evolutionary
role of migratory phenomena occurring in
the late Neolithic. These phenomena may
have affected population sizes and allele frequencies on a local scale, but the large-scale
structure of Indo-European speaking populations seems basically to reflect Neolithic
demic diffusion.
New archaeological evidence will certainly be valuable for describing in detail
times and modes of Neolithic expansions, on
which our ideas are certainly simplistic at
the moment. On the genetic side, new allele
frequency data on previously neglected populations are unlikely to substantially alter
the picture, since, for most loci, the data sets
already include hundreds of samples.
Rather, collection and analysis of mtDNA
may offer a new perspective on human evolution in Europe, as it has already done for
other areas, including Oceania (Stoneking
et al., 1990) and the Americas (Ward et al.,
1991,1993; Torroni et al., 1992; Wallace and
Torroni, 1992).
The replacement of a food-collectingeconomy by one of food production has been a
complex process. In the Mediterranean area,
for instance, farming replaced huntinggathering very rapidly in certain regions,
but gradually in other regions where the two
economies coexisted for centuries (Barker,
1988). However, this does not seem to have
deeply affected gene-frequency patterns,
since the ATC model, where farmers spread
at an irregular rate, did not result in a
greater correspondence of observed and simulated genetic distances than OAG and
OAC. This may mean that the variable rate
of spread of Neolithic agriculture depended
on factors in the physical environment. Once
these factors are incorporated in the model
(OAG or OAC), simulated and observed allele frequency patterns resemble each other.
At any rate, the models put forward and
tested in our study should doubtless be regarded as approximate. Archaeological, linACKNOWLEDGMENTS
guistic, and genetic studies of individual
This is contribution No. 913 in Ecology
populations will certainly add details to our
reconstruction of European history; complex and Evolution from the State University of
models of the Indo-European expansion, New York a t Stony Brook. We thank Prof.
such as the one outlined by Sherratt (1988), Marija Gimbutas and Lord Renfrew for their
may be reformulated in such a way as to collegial cooperation in this work. We are
become comparable with genetic data. Nev- indebted to Barbara A. Thomson for techniertheless, this study indicates that not all cal assistance. Jeff Walker computed the
hypotheses on the origins of Indo-Europeans 3’-statistics and Donna DiGiovanni wordaccount equally well for the available ge- processed the manuscript. Part of the comnetic evidence.
putation was carried out on the Cornell
The hypotheses whereby Indo-Europeans National Supercomputer Facility. This
entered Europe as the first farmers show research was supported by National Science
the best fit. There seems to be no cogent Foundation grant BNS 9117350. This paper
reason to think that the farmers’ spread was was prepared while Robert R. Sokal was a
due to factors other than their tendency to Fellow at the Center for Advanced Study in
grow in numbers, thanks to the increased the Behavioral Sciences at Stanford, Caliresources available. Both incomplete admix- fornia. He is grateful for financial support
ture with hunter-gatherers, and founder ef- provided by the National Science Foundafects occurring in the expansion of farmers, tion grant SES-9022192, and by sabbatical
seem to account satisfactorily for the ob- funds from his home institution. Guido Barserved patterns of genetic differentiation bujani wishes to acknowledge fruitful dis-
cussions with Professors Luca Cavalli- Diakonov IM (1984)On the original home of the speakers~of Indo-European.
Sforza, H~~~ ~
d ~ i ~ Soviet
~ ~Anthropol.
~ , Archaeol.
Italo Scardovi, and Michael Turelli.
Easteal S (1988)Range expansion and its genetic consequences in populations of the giant toad, Bufo rnarinus. Evol. Biol. 23:4%84.
Eisen MM (1979)Mathematical Models in Cell Biology
Ammerman AJ, and Cavdli-Sforza LL (1971)Measurand cancer
Chemotherapy,~ ~ springer,
~ l i ~ :
ing the rate Of spread Of
in Europe' Man
Endler JA (1977)Geographic Variation, Speciation, and
Clines. Princeton, New Jersey: Princeton University
Ammerman AJ, and Cavalli-Sforza LL (1984)The Neopress.
lithic Transition and the Genetics of Populations in Feller (1940)On the logistic law of growth and its
Europe. Princeton, New Jersey: Princeton University
empirical verifications in biology, Ada Biotheor, 5:
Barbujani G (1987)Diversity of Some gene frequencies Gimbutas M (1979)The three
of KurKan people
in European and Asian populations. 111. Spatial correinto Old Europe, 4500-2500 B,C, Arch, Suisses Anlogram analysis. Ann. Hum. Genet. 51:345-353.
thropol. Gen. 43~113-137.
us about
(1991)What do languages
Gimbutas M (1986)Remarks on the ethnogenesis of the
human microevolution? Trends Ecol. Evol. 6:151-156.
Indo-Europeans in Europe. In
Bernhard and A
Barbujani G, Jacquez GM, and Ligi L (1990)Diversity of
Kandler-P&son (eds.): Ethnogenese Europaischer
Somegene frequencies in European and
Volker, Stuttgart: Gustav Fischer Verlag, pp. 5-20.
tions. V. Steep multilocus clines. Am. J. Hum. Genet. Guglielmino CR, Piazza A, Menozzi p, and Cavalli47:a67-875.
Sforza LL (1990)Uralic genes in Europe. Am. J . Phys.
Barbujani G, Nasidze IS, and Whitehead GN (1994)Ge83:57-68.
netic diversity in the Caucasus. Hum. Biol. 66:639Harding RM, and Sokal RR (1988)Classification of the
European language families by genetic distance. Proc.
Barbujani G, and Pilastro A (1993)Genetic evidence on
Natl, Acad. sci, u, s,A. 85,.937c9372,
origin and dispersal of human populations speaking
languages of the Nostratic macrofamily, pro,., Natl, Harpending HC (1994)Signature Of ancient population
growth in a low resolution mitochondria1 DNA misAcad. Sci. U. S. A. 90:4670-4673.
match distribution. Hum. Biol. 66:591-600.
Barbujani G, and Sokal RR (1990)Zones of sharp geHaMan FA (1973)On the mechanisms Of population
netic change in Europe are also language boundaries,
growth during the Neolithic. Curr. Anthropol. 14:
Proc. Natl. Acad. Sci. U. S. A. 87:1816-1819.
Barbujani G, and Sokal RR (1991)Genetic population
FA (lg81)
structure of Italy. XI. Physical and cultural barriers to
York: Academic Press.
gene flow. Am. J . Hum. Genet. 48:398-411.
Barker G (1985)Prehistoric Farming in Europe. Cam- Hedrick pw (1986)Genetic P b o V h i s m in heterogeneous environments: A decade later. Annu. Rev. Ecol.
bridge: Cambridge University Press.
'yst. 17t535-566.
Barker G (1988)Comment on "Archaeology and LanHedrick PW, Thomson G, and Klitz W (1986)Evolutionguage," by C. Renfrew. Cum. h t h r o p o l . 29:44H49.
system. In s Karary genetics: HJA as an
Bertranpetit J, and Cavalli-Sforza LL (1991)A genetic
lin and E Nevo (eds.): Evolutionary Processes and
reconstruction of the history of the population of the
Theory. Orlando, Florida: Academic Press, pp. 503Iberian peninsula. Ann. Hum. Genet. 5551-67.
Boileau MG, Hebert PDN, and Schwartz SS (1992)Nonequilibrium gene frequency divergence: Persistent Hudson RR (1990)Gene genealogies and the coalescent
Process. In FutuWa and J AntonovicS (eds.): OXfounder effects in natural populations. J. Evol. Biol.
ford Surveys in Evolutionary Biology, Vol. 7.Oxford,
U K Oxford University Press, pp.
Cavalli-Sforza LL (1988)The Basque population and
Jorde LB (1980)The genetic structure of subdivided huancient migrations in Europe. Munibe 6:12%137.
man Populations. A review. In JH Mieke and MH
Cavalli-Sforza LL, Menozzi p , and Piazza A (1993)
Crawford (eds.): Current Developments in Anthropo~~~i~
and human evolution, science 259:
logical Genetics, Theory and Methods. New York: Ple639446.
numt pp' 135-208'
Cavalli-Sforza LL, Minch E, and Mountain J L (1992)
Coevolution of genes and languages revisited, P ~ c . Kaiser M, and Shevoroshkin V (1988)Nostratic. Annu.
Rev, Anthropol. 17:30%329.
Natl. Acad. Sci. U. S. A. 89:5620-5624.
Cavalli-Sforza LL, and Piazza A (1993)Human genomic Keyfitz N (1977)Introduction to the Mathematics of
Populations with Revisions. Reading, UK: Addisondiversity in Europe: A summaw of recent research
and prospects for the future. Eur. J . Hum. Genet.
Kimura M, and Weiss GH (1964)The stepping stone
model of Population structure and the decrease of geCavalli-Sforza LL, Piazza A, Menozzi P, and Mountain J
netic correlation with distance. Genetics 49561-576.
(1988)Reconstruction of human evolution: Bringing
together genetic, archaeological and linguistic data. Ladle RJ (1992)Parasites and sex: Catching the red
Proc. Natl. Acad. Sci. U. S. A. 85:6002-6006.
queen. Trends Ecol. Evol. 7:405408.
Lewontin RC (1972)The apportionment of human diversity. Evol. Biol. 6:381-398.
Manly BFJ (1991) Randomization and Monte Carlo
Methods in Biology. London: Chapman and Hall.
Mantel N (1967) The detection of disease clustering and
a generalized regression approach. Cancer Res. 27:
Menozzi P, Piazza A, and Cavalli-Sfona LL (1978) Synthetic maps of human gene frequencies in Europeans.
Science 201:786-792.
Mourant AE, Kopec AC, and Domaniewska-Sobczak K
(1976) The Distribution of the Human Blood Groups
and Other Polymorphisms. Oxford Oxford University
Nei M (1987) Molecular Evolutionary Genetics. New
York Columbia University Press.
Piazza A (1993) Who are the Europeans? Science 260:
Piazza A, Cappello N, Olivetti E, and Rendine S (1988)
The Basques in Europe: A genetic analysis. Munibe
Press WH, Flannery BF, Teukolsky SA, and Vetterling
WT (1986) Numerical Recipes. Cambridge: Cambridge University Press.
Prevosti A, Ocana J , and Alonso G (1975) Distances
between populations of Drosophila suboscura based
on chromosome arrangement frequencies. Theor.
Appl. Genet. 45:231-241.
Rendine S, Piazza A, and Cavalli-Sforza LL (1986) Simulation and separation by principal components of
multiple demic expansions in Europe. Am. Nat. 128:
Renfrew C (1987) Archaeology and Language. London:
Jonathan Cape.
Renfrew C (1989) Models of change in language and
archaeology. Trans. Philol. SOC.87t103-155.
Renfrew C (1991) Before Babel: Speculations on the origins of linguistic diversity. Cambridge Arch. J . 1 : s
Renfrew C (1992) Archaeology, genetics and linguistic
diversity. Man N.S. 27:445478.
Rogers AR, and Jorde LB (1987) The effect of non-random migration on genetic differences between populations. Ann. Hum. Genet. 51:169-176.
Rogers AR, and Jorde LB (1995) Genetic evidence on
modern human origins. Hum. Biol., in press.
Ruhlen M (1987)A Guide to the World’s Languages. Vol.
1: Classification. London: Edward Arnold.
Sgaramella-Zonta L, and Cavalli-Sforza LL (1973) A
method for the detection of a demic cline. In NE Morton (ed.): Genetic Structure of Populations. Honolulu:
University of Hawaii Press, pp. 12%135.
Sherratt A (1988) Comment on -Archaeology and Language,” by C. Renfrew. Curr. Anthropol. 29:45-63.
Slatkin M (1985) Gene flow in natural populations.
Annu. Rev. Ecol. Syst. 16:393430.
Slatkin M (1987) Gene flow and the geographic structure of natural populations. Science 236:787-792.
Smouse PE, Long JC, and Sokal RR (1986) Multiple
regression and correlation extensions of the Mantel
test of matrix correspondence. Syst. Zool. 35:627432.
Sokal RR (1988) Genetic, geographic, and linguistic distances in Europe. Proc. Natl. Acad. Sci. U. S. A. 85:
Sokal RR (1991) Ancient movement patterns determine
modern genetic variances in Europe. Hum. Biol. 63:
Sokal RR, Harding RM, and Oden NL (1989a) Spatial
patterns of human gene frequencies in Europe. Am. J .
Phys. Anthropol. 80:267-294.
Sokal RR, Jacquez GM, Oden NL, DiGiovanni D, Falsetti AB, McGee E, and Thomson BA (1993) Genetic
relationships of European populations reflect their
ethnohistorical affinities. Am. J. Phys. Anthropol. 91:
Sokal RR, and Menozzi P (1982) Spatial autocorrelation
of HLA frequencies in Europe support demic diffusion
of early farmers. Am. Nat. 119:l-17.
Sokal RR, Oden NL, Legendre P, Fortin M-J, Kim J,
Thomson BA, Vaudor A, Harding RM, and Barbujani
G (1990) Genetics and language in European populations. Am. Nat. 135:157-175.
Sokal RR, Oden NL, Legendre P, Fortin M-J, Kim J , and
Vaudor A (1989b) Genetic differences among language families in Europe. Am. J . Phys. Anthropol.
Sokal RR, Oden NL, and Thomson BA (1988) Genetic
changes across language boundaries in Europe. Am.
J. Phys. Anthropol. 76:337-361.
Sokal RR, Oden NL,and Thomson BA (1992) Origins of
the Indo-Europeans: Genetic evidence. Proc. Natl.
Acad. Sci. U. S. A. 89:7669-7673.
Sokal RR, Oden NL, and Wilson C (1991) Genetic evidence for the spread of agriculture in Europe by demic
diffusion. Nature 351:143-145.
Sokal RR, and Rohlf FJ (1995) Biometry, 3rd ed. New
Bhatia K, and Wilson AC (1990)
Geographic variation in human mitochondrial DNA
from Papua New Guinea. Genetics 124:717-733.
Torroni A, Schurr TG, Yang C-C, Szathmary EJE,
Williams RC, Schanfield MS, Troup GA, Knowler WC,
Lawrence DN, Weiss KM, and Wallace DC (1992) Native American mitochondrial DNA analysis indicates
that the American and NaDene populations were
founded by two independent migrations. Genetics
Wade MJ, and McCauley DE (1988) Extinction and recolonization: Their effect on the genetic differentiation of populations. Evolution 42:995-1005.
Wallace DC, and Torroni A (1992) American Indian prehistory as written in the mitochondrial DNA A review. Hum. Biol. 64t403-416.
Ward RH, Frazier BL, Dew-Jager K, and Paabo S (1991)
Extensive mitochondrial diversity within a single Amerindian tribe. Proc. Natl. Acad. Sci. U. S. A. 88:872&
Ward RH, Redd A, Valencia D, Frazier BL, and Paabo S
(1993) Genetic and linguistic differentiation in the
Americas. Proc. Natl. Acad. Sci. U. S. A. 90: 1066%
Wijsman EM, and Cavalli-Sforza LL (1984) Migration
and genetic population structure with special reference to humans. Annu. Rev. Ecol. Syst. 15:279-301.
Wright S (1969) Evolution and the Genetics of Populations. Vol. 2. The Theory of Gene Frequencies. Chicago: University of Chicago Press.
Wright S (1978) Evolution and the Genetics of Populations. Vol. 4. Variability within and among Natural
Populations. Chicago: University of Chicago Press.
Zeven AC (1980) The spread of bread wheat over the old
world since the Neolithicum as indicated by its genotype for hybrid necrosis. J. d’Agric. Trad. Bota. Appl.
Zvelebil M, and Zvelebil KV (1988) Agricultural transition and Indo-European dispersal. Antiquity 62574583.
Без категории
Размер файла
2 012 Кб
test, simulation, hypothesis, origin, five, indo, computer, european
Пожаловаться на содержимое документа