AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY 77:435-449 (1988)
Three Components of Genetic Drift in Subdivided Populations
ALAN R. ROGERS
Department of Anthropology, University of Utah, Salt Lake City, Utah 84112
KEY WORDS
Drift, Migration, Subdivided populations, Effective population size
ABSTRACT
Wright’s metaphor of sampling is extended to consider three
components of genetic drift: those occurring before, during, and after migration. To the extent that drift at each stage behaves like an independent random
sample, the order of events does not matter. When sampling is not random,
the order does matter, and the effect of population size is confounded with that
of mobility. The widely cited result that genetic differentiation of local groups
depends only on the product of group size and migration rate holds only when
nonrandom sampling does not occur prior to migration in the life cycle.
The term genetic drift is used by evolutionists to refer to the effects of a variety of factors that cause stochastic changes in allele frequencies within finite populations. Sewall Wright (1931) is responsible both for the term drift and for the metaphor of sampling that is generally used to discuss it. The statistical effect of drift is likened to that of drawing a sample of gametes, independently and at random, from the hypothetical infinite population of gametes that the parents could have produced. The ideal of independent random sampling is seldom achieved in natural populations, and departures from it are accommodated by adjusting the effective population size. This simple approach has been enormously successful. By absorbing many complexities into this single parameter, it has allowed geneticists to keep their models simple.
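As an illustration of the sampling metaphor (not part of the original analysis), the following minimal sketch draws 2N gametes independently and at random and compares the observed variance of the sampled allele frequency with the binomial value p(1 - p)/2N. The function names, the parameter values, and the number of replicates are assumptions chosen only for the example.

```python
import random

def sample_allele_frequency(p, n_individuals, rng):
    """Draw 2N gametes independently at random; return the sampled allele frequency."""
    n_genes = 2 * n_individuals
    copies = sum(1 for _ in range(n_genes) if rng.random() < p)
    return copies / n_genes

def drift_variance(p, n_individuals, replicates=5000, seed=1):
    """Empirical variance of the sampled allele frequency over many replicates."""
    rng = random.Random(seed)
    freqs = [sample_allele_frequency(p, n_individuals, rng) for _ in range(replicates)]
    mean = sum(freqs) / replicates
    return sum((q - mean) ** 2 for q in freqs) / (replicates - 1)

# Hypothetical values: parental frequency 0.3, 50 diploid offspring.
p, n = 0.3, 50
observed = drift_variance(p, n)
expected = p * (1 - p) / (2 * n)  # variance under ideal random sampling
print(observed, expected)
```

Under ideal random sampling the two numbers should agree closely; nonrandom sampling is what the effective population size is meant to absorb.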
The value of this approach is reduced, however, when interest focuses on the complexities themselves. For example, I have been
interested in the effects of human patterns of
social behavior on the evolutionary forces affecting genetic population differences. It does
not seem possible to describe these effects by
adjusting the effective population size, or any
other single parameter. Accordingly, I introduce here a more detailed model of the action
of genetic drift in subdivided populations.
The identification of genetic drift with sampling encourages us to regard it as an event that occurs at a single point in the life cycle, but this is not the case. For example, a) mated adults are a sample of the reproductively mature adults, b) offspring genes are a sample of the genes of mated adults, c) emigrants from each local group are a sample of the individuals born there, and d) reproductively mature adults are a sample of the population at each earlier stage. Each of these episodes of sampling results in genetic drift, but previous models have generally failed to consider them all, or to distinguish between them.
© 1988 ALAN R. LISS, INC.
A more detailed model will be useful for
several reasons. First, it is possible to estimate some of the components of genetic drift
even when insufficient data are available for
estimation of all components. Second, the
components of drift associated with different
episodes of sampling are of interest in their
own right. We may ask, for example, which
episodes of sampling have the largest effects,
and whether sampling that occurs before migration has the same effect as sampling after.
We may also wonder whether the effects of
sampling at different stages interact, enhancing or canceling each other.
In previous papers (Rogers and Harpending, 1986; Rogers, 1987), Henry Harpending
and I have developed models in which sampling precedes migration. In this work, however, we assumed that effective and actual
group sizes were equal. The present paper
extends that work by incorporating a detailed treatment of effective population size,
and by allowing sampling before, during, and
after migration. In the companion paper
(Rogers, 1988a), statistical methods are introduced for estimating the parameters of the
model from genetic data.
Received May 12, 1988; revision accepted August 24, 1988.
MODEL
As Rogers and Harpending (1986) show, most previous theories of evolution in subdivided populations assume a life cycle in which the numbers of individuals migrating are very large. The life cycle begins with an effectively infinite group of newborns who then migrate. Following migration, density-dependent mortality reduces each group to some fixed number of reproductive adults. Genetic drift occurs just once in this life cycle, during population regulation, and the effect of migration on allele frequencies is deterministic.
An alternative life cycle, in which mortality precedes migration, seems more relevant to humans and other species in which mortality is relatively high among newborns, and low during the ages of migration and reproduction. This life cycle has been studied by Rogers and Harpending (1986) and Rogers (1987), who ignored the distinction between actual and effective population sizes during theoretical analysis, but then substituted effective for actual sizes during data analysis. As we shall see, this procedure is unjustified.
The model developed here incorporates a more general life cycle that includes mortality both before migration and after. The two life cycles discussed above are thus special cases of the one considered here. Before describing it in detail, it will be helpful to define some terms and notation.
Notation
Let us refer to the set of individuals that were born in the $i$th local group and later breed in the $j$th as the $ij$th migrant set. The $ii$th migrant set thus consists of individuals who were born in group $i$ and do not emigrate. When both the birthplace and adult residence of each individual in a genetic sample are known, it is a simple matter to estimate the allele frequencies of migrant sets.
Let $q_{ij}$ denote the frequency of allele $A_1$ within the $ij$th migrant set, and $N_{ij}$ the size of this migrant set. The subscripts "+" and "·" will denote summation and averaging, respectively. For example, $N_{i+} = \sum_j N_{ij}$, the size of group $i$ before local migration, $q_{i\cdot} = N_{i+}^{-1}\sum_j N_{ij}q_{ij}$, the allele frequency in group $i$ before local migration, $q_{\cdot j} = N_{+j}^{-1}\sum_i N_{ij}q_{ij}$, the allele frequency in group $j$ after local migration, and so forth. In surveys of human genetic variation within subdivided populations, it is routine to record not only the residence and genetic phenotype, but also the birthplace of each individual. With such data, all of the quantities above are readily estimated. For convenience, all the notation used in this paper is summarized in Appendix A.
The life cycle
The life cycle assumed in this paper is summarized in Table 1. The first events are reproduction and density-dependent mortality, which reduce the initial infinite gamete pool of group $j$ to a population of $N_{j+}$ newborns. Next, local migration redistributes these newborns among local groups, leaving group $j$ with $N_{+j}$ juveniles, and then a second round of mortality occurs, leaving $n_j$ adults.
The ratio, $g = n_j/N_{+j}$, is assumed equal in all groups, and it is further assumed that migration does not change local population sizes so that $N_{+j} = N_{j+}$. Consequently, $g$ is also equal to $n_i/N_{i+}$, the ratio of numbers of parents to numbers of newborn offspring. Note that $g$ measures the relative importance of early mortality. When there is mortality before migration but not after, $N_{+j} = n_j$, which makes $g = 1$. On the other hand, when mortality does not occur until after migration, $N_{+j} \to \infty$. For finite $n_j$, this implies that $g \to 0$. Thus, $g \to 1$ when most mortality precedes migration, and $g \to 0$ when it follows.
In the final stage of the life cycle, migrants are exchanged with an external “continent” with unchanging allele frequency $\pi$. This scheme, having two rounds of population reduction, is similar to one studied by Nagylaki (1979).

TABLE 1. The life cycle (notation for group j)
Stage          Process leading to the next stage     Group size    Allele frequency
Gamete pool    Reproduction and early mortality      $\infty$      $\tilde p_j$
Newborn        Local migration                       $N_{j+}$      $q_{j\cdot}$
Juvenile       Late mortality                        $N_{+j}$      $q_{\cdot j}$
Adult          External migration                    $n_j$
Adult                                                $n_j$         $p_j$

There are three episodes of sampling in this life cycle: first, during reproduction and early mortality, second, during local migration, and third, during late mortality. Deviations from the ideal of random sampling during each episode are measured by three parameters, $\alpha_E$, $\alpha_M$, and $\alpha_L$, each of which is defined by an equation of the form
Sampling variance = (1 + $\alpha$) × (Random sampling variance).
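The notation for migrant sets can be made concrete with a small sketch. It is only an illustration under assumed inputs: the two-group table of migrant-set sizes and allele frequencies below is hypothetical, and the function name is not taken from the paper.

```python
def group_summaries(N, q):
    """Summaries of migrant sets.

    N[i][j]: size of the ij-th migrant set (born in group i, breeding in group j).
    q[i][j]: allele frequency within that migrant set.
    Returns (N_{i+}, N_{+j}, q_{i.}, q_{.j}) as four lists.
    """
    K = len(N)
    n_before = [sum(N[i][j] for j in range(K)) for i in range(K)]   # N_{i+}
    n_after = [sum(N[i][j] for i in range(K)) for j in range(K)]    # N_{+j}
    q_before = [sum(N[i][j] * q[i][j] for j in range(K)) / n_before[i] for i in range(K)]
    q_after = [sum(N[i][j] * q[i][j] for i in range(K)) / n_after[j] for j in range(K)]
    return n_before, n_after, q_before, q_after

# Hypothetical data for two local groups.
N = [[80, 20],
     [20, 80]]
q = [[0.50, 0.60],
     [0.20, 0.30]]
n_before, n_after, q_before, q_after = group_summaries(N, q)

# Hypothetical number of adults surviving late mortality in each group.
n_adults = 25
g = n_adults / n_after[0]   # g = n_j / N_{+j}
print(n_before, n_after, q_before, q_after, g)
```

In this example migration leaves group sizes unchanged, as the model assumes, so the row and column totals agree.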
Let us now consider each stage in detail.
Reproduction and early mortality
The gamete pool of each group is a hypothetical infinite sample, drawn at random with replacement from the genes of the parents. Hence, its allele frequency exactly equals $\tilde p_j$, the allele frequency of the parental population. (The superposed tilde here and elsewhere is used to distinguish quantities referring to the parental generation.) Reproduction and early mortality together select a possibly nonrandom sample of these gametes to form the newborn stage of the life cycle. In the ideal case of random sampling, this adds an increment to the allele frequency with expectation zero and variance $\tilde p_j(1-\tilde p_j)/2N_{j+}$ (Wright, 1931). In the general case, where sampling may not be random, it is conventional to express this variance as
$$\mathrm{Var}\{q_{j\cdot} \mid \tilde p_j\} = \frac{\tilde p_j(1-\tilde p_j)}{2N^e_{j+}}, \qquad (1)$$
where $N^e_{j+}$ is the variance effective size (Crow and Denniston, 1988) of group $j$. The difference between $N^e_{j+}$ and $N_{j+}$ is a measure of the extent to which sampling at this stage of the life cycle is nonrandom. The formulation to be used here is different. Let us rewrite Equation 1 as
$$\mathrm{Var}\{q_{j\cdot} \mid \tilde p_j\} = (1+\alpha_E)\,\frac{\tilde p_j(1-\tilde p_j)}{2N_{j+}}, \qquad (2)$$
where $\alpha_E = N_{j+}/N^e_{j+} - 1$. This formulation decomposes the variance due to drift into a “random sampling component,” and a “nonrandom” component that is proportional to $\alpha_E$. Thus, $\alpha_E$ measures the extent to which sampling is nonrandom.
Before incorporating this equation into the theoretical argument developed below, let us consider how $\alpha_E$ can be calculated, and what its value means. It will be helpful to re-express it in terms of the inbreeding effective size of the parental generation (defined below). In populations with sexual reproduction, formulae for the inbreeding effective size generally involve properties of the grandparental as well as the parental generation (Crow and Denniston, 1988). Such formulae are not, to my knowledge, available for subdivided populations with incompletely isolated subdivisions, and would be unwieldy in any event, depending on the migration pattern as well as the mating systems of the various groups. To avoid this complexity, I introduce here an expression for the inbreeding effective size that involves properties of the parental generation only. There is no magic here: the effect of the grandparental generation is still present. Its effect has simply been absorbed into $f_i$, a quantity that is defined below.
Unfortunately, the inbreeding effective size has not been defined consistently in the literature of population genetics. Wright (1969, pp. 211-212) defines it in terms of the rate at which heterozygosity decays; Ewens (1979, p. 105) defines it in terms of the probability that two distinct, random genes are copies of the same parental gene; and Crow and Denniston (1988, p. 484) define it in terms of the probability that two uniting gametes are identical by descent at some locus. Ewens’ usage is followed here.
Let us assign to each gene a genic value, which equals unity if the gene is a copy of allele $A_1$, and zero otherwise. Let $x_j$ (where $j = 1, 2, \ldots, 2N_{i+}$) denote the genic value of the $j$th gene among newborns in group $i$. Then
$$\mathrm{Var}\{q_{i\cdot} \mid \tilde p_i\} = \frac{\mathrm{Var}\{x_j \mid \tilde p_i\}}{2N_{i+}}\,\bigl[1 + (2N_{i+}-1)\,v\bigr], \qquad (3)$$
where $\mathrm{Var}\{x_j \mid \tilde p_i\} = \tilde p_i(1-\tilde p_i)$, and $v$ is the average correlation between distinct newborn genes, or equivalently, the correlation between two newborn genes drawn at random without replacement.
Define the inbreeding effective size, $n^e_i$, as the reciprocal of the probability that two random newborn genes, drawn without replacement, were derived from the same individual in the previous generation. Two such genes are equally likely to be copies of the same
gene or of distinct genes in their common parent. Thus,
$$v = \frac{1+f_i}{2n^e_i} + \left(1-\frac{1}{n^e_i}\right)c_i,$$
where $f_i$ is the correlation of two genes from the same parent, and $c_i$ the correlation of genes from different parents. Since these are correlations of distinct parental genes relative to their own generation, they both equal $-(2n_i - 1)^{-1}$ at Hardy-Weinberg equilibrium. To re-express $c_i$ in terms of $f_i$, note that the correlation between two genes drawn at random with replacement from the parental generation is zero, but can also be written as
$$0 = \frac{1+f_i}{2n_i} + \left(1-\frac{1}{n_i}\right)c_i.$$
Substitution into Equation 3 now leads to
$$\mathrm{Var}\{q_{j\cdot} \mid \tilde p_j\} = \frac{\tilde p_j(1-\tilde p_j)}{2N_{j+}}\left[1 + N_{j+}(1+f_j)\left(\frac{1}{n^e_j}-\frac{1}{n_j}\right)\right],$$
ignoring terms of order $(n^e_j n_j)^{-1}$. The inbreeding and variance effective sizes are thus related by
$$\frac{1}{N^e_{j+}} = \frac{1}{N_{j+}} + (1+f_j)\left(\frac{1}{n^e_j}-\frac{1}{n_j}\right)$$
to a close approximation.
Let $\alpha_I = (1 + f_i)(n_i/n^e_i - 1)$. Then
$$\mathrm{Var}\{q_{j\cdot} \mid \tilde p_j\} = \left(1+\frac{\alpha_I}{g}\right)\frac{\tilde p_j(1-\tilde p_j)}{2N_{j+}}, \qquad (6)$$
where $\alpha_I$ depends both on non-random sampling during reproduction and early mortality, and on departures from Hardy-Weinberg equilibrium in the parental generation. In general, $\alpha_I$ may vary among the local groups, but is assumed here to have the same value in all groups. Comparing Equation 6 with Equations 1 and 2 shows that $\alpha_E = \alpha_I/g$.
These formulae are unusual in that they express the inbreeding and variance effective sizes of a sexual population without explicit reference to the grandparental generation. This is possible only because the effect of inbreeding in that generation has been absorbed into the parameter $f_i$. The advantage of this approach is that it avoids the complexity introduced by immigration from other local groups. The disadvantage is that the demographic parameters of the population are not sufficient to specify the sampling variance at this stage of the life cycle; an estimate of $f_i$ is also needed. I take $f_i$ as a given and will not try to evaluate it here.
To clarify the meaning of these formulae, let us apply them to a hypothetical sexually reproducing population in which each local group contains 5 male parents and 45 female parents, who mate at random to produce 100 offspring. The reciprocal of the inbreeding effective size is the probability that two offspring genes drawn at random without replacement are derived from the same parent. The two genes have probability $1/(2N_{i+} - 1) = 1/199$ of coming from the same offspring, in which case they cannot have come from the same parent. With probability $1 - 1/199$, they are from different offspring. Given that they are from different offspring, they are both from female parents with conditional probability 1/4, and both from the same female parent with conditional probability $1/(4 \times 45)$. Similarly, the conditional probability that they are both from the same male parent is $1/(4 \times 5)$, since there are five male parents. Thus, the probability that both genes derive from the same parent is
$$\frac{1}{n^e_i} = \left(1-\frac{1}{199}\right)\times\left(\frac{1}{4\times 45}+\frac{1}{4\times 5}\right) = 0.0553,$$
and the inbreeding effective size is $n^e_i = 1/0.0553 = 18.091$. The parameter $\alpha_I$ is now calculated as $(1 + f_i)(n_i/n^e_i - 1)$. Let us assume that the parents are in Hardy-Weinberg equilibrium, so that $f_i = -1/(2n_i - 1) = -1/99$. This gives $\alpha_I = (1 - 1/99) \times (50/18.091 - 1) = 1.746$. In general, the quantity $1 + \alpha_I$ is approximately equal to the ratio of the number of parents in a local group to its inbreeding effective size. In human populations, this is often between two and five (Wood, 1987), so $\alpha_I$ should often fall between unity and four.
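The arithmetic of the worked example (5 male parents, 45 female parents, 100 offspring) can be retraced with a short sketch. The function names are assumptions introduced for illustration; the formulas follow the verbal argument in the text, with the parents taken to be in Hardy-Weinberg equilibrium unless a value of f is supplied.

```python
def inbreeding_effective_size(n_males, n_females, n_offspring):
    """Reciprocal of the probability that two offspring genes, drawn at random
    without replacement, derive from the same parent (random mating assumed)."""
    n_genes = 2 * n_offspring
    p_different_offspring = 1 - 1 / (n_genes - 1)
    # Given different offspring: both genes maternal (1/4) and same mother,
    # or both paternal (1/4) and same father.
    p_same_parent = 1 / (4 * n_females) + 1 / (4 * n_males)
    return 1 / (p_different_offspring * p_same_parent)

def alpha_i(n_parents, n_e, f=None):
    """alpha_I = (1 + f)(n / n_e - 1); f defaults to its Hardy-Weinberg value."""
    if f is None:
        f = -1 / (2 * n_parents - 1)
    return (1 + f) * (n_parents / n_e - 1)

n_e = inbreeding_effective_size(n_males=5, n_females=45, n_offspring=100)
a_i = alpha_i(n_parents=50, n_e=n_e)
print(round(n_e, 3), round(a_i, 3))
```

The strongly unequal sex ratio is what drives the inbreeding effective size so far below the 50 actual parents.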
Local migration
The pattern of migration among $K$ local groups is described by the migration matrix, $M$, whose $ij$th element is $m_{ij} = N_{ij}/N_{+j}$, the fraction of group $j$ derived from group $i$ each generation. I assume that no subset of groups is completely isolated from the others, and that there are at least a few individuals that do not migrate in each local group. These assumptions guarantee that $M$ will have exactly one eigenvalue equal to unity, and that the rest will be smaller in absolute value. The eigenvalues of $M$ are denoted by $\lambda_i$, and are indexed such that $1 = \lambda_1 > \lambda_2 \ge \cdots \ge \lambda_K$.
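For a concrete picture of the migration matrix, the sketch below builds M from a hypothetical table of migrant-set sizes and, for the two-group case only, obtains the eigenvalues from the characteristic polynomial. The data and function names are assumptions made for the example; a general K would need a numerical eigenvalue routine, which is not shown.

```python
import math

def migration_matrix(N):
    """m_ij = N_ij / N_{+j}: fraction of group j derived from group i."""
    K = len(N)
    col_totals = [sum(N[i][j] for i in range(K)) for j in range(K)]  # N_{+j}
    return [[N[i][j] / col_totals[j] for j in range(K)] for i in range(K)]

def eigenvalues_2x2(M):
    """Eigenvalues of a 2 x 2 matrix with real eigenvalues, largest first."""
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = math.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2

# Hypothetical migrant-set sizes for two groups.
N = [[80, 20],
     [20, 80]]
M = migration_matrix(N)
column_sums = [M[0][j] + M[1][j] for j in range(2)]   # each column sums to one
lam1, lam2 = eigenvalues_2x2(M)
print(M, column_sums, lam1, lam2)
```

Here the leading eigenvalue is unity and the second is smaller in absolute value, as the assumptions on M require.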
Since each migrant set is a sample of the offspring born in that local group, drift also occurs during local migration and adds to each allele frequency an increment with expectation zero. The variance of this increment, and also its covariance with increments from different groups, will depend on the statistical properties of the migration pattern. In particular, if biological relatives tend to migrate together, migration is said to be kin-structured (Fix, 1978), and the sampling variance due to migration is increased. To measure this effect, a model of kin structure is required; the one employed here is based on that of Rogers (1987).
The simplest form of kin-structured migration (KSM) is that in which groups of individuals migrate as units, which are all of one size, say $y$. These groups will often be families of some sort, and Rogers (1987) used the term family to refer to them. This has led to some confusion, since these groups need not be families in the usual sense of the word. Indeed, the theory also holds if these groups are less related than random individuals. Here, I use the more general term migrule, which is defined by most dictionaries of natural history as a “unit of migration.” In the present context, it refers to a group of individuals that migrate together, and independently of other migrules. Each migrant set contains one or more migrules, and each migrule contains one or more individuals. The model used here is not an accurate description of any real population because of the unrealistic assumption that all migrules are of the same size. Nonetheless, it will be useful as a baseline against which real populations can be compared.
The task now before us is to find a measure of the non-randomness of sampling during migration. Ideally, this measure should be independent of non-random sampling at other points in the life cycle, yet neither of the existing formulations of kin-structured migration (Fix, 1978; Rogers, 1987) achieves this. To appreciate the problem, consider the important special case of KSM in which siblings migrate together. In this case, each migrule is the progeny of a single pair of parents. Variation in progeny size will then affect population differences in two distinct ways. First, it affects the effective population size, which influences the variance of $q_{i\cdot}$ about $\tilde p_i$ (Crow and Denniston, 1988). Second, it affects the variance of migrules about $q_{i\cdot}$. Only the second of these effects is attributable to KSM, for the first would exist even if migration were independent and at random.
Rogers and Harpending (1986) and Rogers (1987) sidestep this issue by assuming it away: they assume effective and actual population sizes to be equal, thus precluding any consideration of non-random sampling during reproduction and early mortality. Fix (1978) also avoids the issue. He varies the strength of kin structure by adjusting the variance of migrule (i.e., progeny) allele frequencies about the parental allele frequency. Since each of his migrules is uncorrelated with the others, this adjustment changes the variance effective size ($N^e_{j+}$) of the cohort of newborns as well as the strength of kin structure. The formulation introduced here makes it possible to separate these effects.
Consider the statistical model
$$x_{irs} = q_{i\cdot} + \gamma_{ir} + \epsilon_{irs},$$
where $x_{irs}$ is the allele frequency (0, 1/2, or 1) of the $s$th individual in the $r$th migrule born in group $i$, and where $\gamma_{ir}$ and $\epsilon_{irs}$ are migrule and individual effects, respectively. Since the numbers of individuals per migrule, and of migrules per birthplace are small, the variance components associated with this model must be derived using theory appropriate for
finite populations. The terms of this model can be defined so that
$$\sum_s \epsilon_{irs} = 0, \text{ for all } i \text{ and } r, \quad\text{and}\quad \sum_r \gamma_{ir} = 0, \text{ for all } i.$$
Noting that the number of migrules from group $i$ is $N_{i+}/y$, we can write the mean squares within and between the migrules of the $i$th birthplace as
[equations not recoverable from this copy]
respectively.
Ultimately, we will be interested in the variances and covariances of the allele frequencies of local groups. These can be expressed in terms of the expected squares and products of allele frequencies of individuals. There are three cases to consider: the square of an individual’s allele frequency, the product between the allele frequencies of migrule members, and the analogous product for individuals from different migrules. Searle and Fawcett’s (1970) results can be used to justify the following expressions:
[equations not recoverable from this copy]
where $r \ne r'$ and $s \ne s'$.
Some way is needed of summarizing the extent to which migration is kin-structured. Rogers (1987) proposed a measure, $\theta$, that is not easy to estimate with genetic data. Here, I introduce another measure that is nearly identical in value, but much easier to manipulate and estimate. Let $\sigma^2_i = \sigma^2_\gamma + \sigma^2_\epsilon$ denote the variance of individual allele frequencies about $q_{i\cdot}$. If migrules were formed by sampling at random with replacement, the variance of their allele frequencies would be $\sigma^2_i/y$, since each is of size $y$. Under sampling without replacement, this variance would become
$$\frac{\sigma^2_i}{y}\left(1 - \frac{y-1}{N_{i+}-1}\right)$$
by analogy with the variance of the hypergeometric distribution. Thus, we can define a measure, $\alpha_M$, of the extent to which sampling during migration is non-random by
[Equation 10 not recoverable from this copy]
In an earlier paper (Rogers, 1987), I showed that $\theta \approx (y - 1)\kappa$, where $y$ is the size of each migrule and $\kappa$ the correlation within migrules. This approximation was then used to estimate $\theta$ from behavioral data. The same approximation can also be justified for the new measure, $\alpha_M$. Equation 10 implies that
[Equation 13 not recoverable from this copy]
This correlation is relative to the current generation, rather than to the infinite founding stock referred to by the classical formulae of Wright (1969) and Malecot (1969). However, it can be shown that these formulae are approximately correct for first-degree relatives if $N_{i+} > 100$, and for second-degree relatives if $N_{i+} > 1000$ (Rogers, 1986). Thus, if full sibs migrate together, $\kappa = 0.5$ in all but the smallest populations. By rearranging Equation 13, it can be shown that
$$\alpha_M \approx (y-1)\kappa,$$
ignoring terms of order $y/N_{i+}$. Thus, if migrules are small compared to local groups, the product $(y - 1)\kappa$ provides a useful approximation of $\alpha_M$. This relation also holds approximately for $\theta$ (Rogers, 1987), which shows that $\alpha_M$ and $\theta$ are nearly equivalent. This relation also allows $\alpha_M$ to be estimated from information about migrule size and the correlations within migrules. If $y$ and $\kappa$ vary little between local groups, $\alpha_M$ will also be invariant or nearly so. For simplicity, I will assume that $\alpha_M$ has the same value in all groups.
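The approximation that the kin-structure measure is close to the product of (migrule size minus one) and the within-migrule correlation is easy to tabulate. The sketch below is illustrative only: the function names and the migrule sizes are hypothetical, and the value 0.5 for full sibs is the one quoted in the text.

```python
def approx_alpha_m(migrule_size, kappa):
    """Approximate alpha_M as (y - 1) * kappa, valid when migrules are small
    relative to local groups (terms of order y / N are ignored)."""
    return (migrule_size - 1) * kappa

def neglected_order(migrule_size, group_size):
    """Rough size of the ignored terms, y / N_{i+}."""
    return migrule_size / group_size

# Full sibs migrating together (kappa = 0.5); the migrule sizes are hypothetical.
kappa = 0.5
table = {y: approx_alpha_m(y, kappa) for y in (1, 2, 4, 7)}
print(table)
print(neglected_order(4, 200))
```

With migrules of size one the measure is zero, which corresponds to individuals migrating independently.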
Late mortality
The final component of drift occurs during late mortality, which removes a portion of the population prior to breeding. This process adds an increment, $\eta_j$, with mean zero and with a variance that depends on the extent to which sampling is nonrandom. Under random sampling, the variance is
[equation not recoverable from this copy]
by analogy with the variance of the hypergeometric distribution. Thus, we can define a measure, $\alpha_L$, of non-random sampling during late mortality by
[equation not recoverable from this copy]
The covariances, $E\{\eta_j\eta_k\}$, are zero for $j \ne k$.
Systematic pressure
I assume that some form of systematic pressure (mutation, migration, or weak selection) tends to move each local population toward some intermediate allele frequency. The strength of this force is measured by the parameter $s$, which, for concreteness, is here taken to be the fraction of each local group replaced per generation by emigrants from an external continent with unchanging allele frequency $\pi$. For simplicity, external migration is treated deterministically, although local migration is stochastic. This simplification has little effect on the answers provided that external migration is relatively weak.
The equation describing evolutionary change
Putting all this together, we have
$$p = (1-s)\,M^T\tilde p + s\pi\mathbf{1} + \epsilon, \qquad (15)$$
where $p$ is a column $K$-vector whose $i$th element is $p_i$; $\mathbf{1}$, a column $K$-vector each element of which is unity; and $\epsilon$, a column $K$-vector of deviations due to genetic drift. Equation 15 differs from Rogers and Harpending’s (1986) Equation 15 only in the definition of $\epsilon$. For the present model, the $j$th entry of $\epsilon$ is
$$\epsilon_j = (1-s)\bigl(q_{\cdot j} - E\{q_{\cdot j} \mid \tilde p\} + \eta_j\bigr). \qquad (16)$$
Equations 15 and 16 summarize the evolutionary dynamics of the model, and will be used to predict the equilibrium values of various measures of genetic variation.
MEASURING VARIATION AMONG GROUPS
Genetic variation among groups can be measured in a variety of ways. The measure adopted here is a variant of Wright’s F-statistics, and is defined in terms of the $K \times K$ genetic correlation matrix, $R$, whose $ij$th entry is
[equation not recoverable from this copy]
Let $w_i = N_{+i}/N_{++}$ denote the relative size of the $i$th group, and $W$ a diagonal matrix whose $i$th diagonal entry is $w_i$. A useful summary measure of genetic differentiation is
$$r_0 = \sum_{i=1}^{K} w_i r_{ii} = \mathrm{tr}\{WR\}, \qquad (17)$$
where $\mathrm{tr}\{\cdot\}$ denotes the trace (sum of the diagonal elements) of a matrix. The expectation of $r_0$ is denoted by $\rho = E\{r_0\}$. Note that $r_0$ is defined in terms of variation about the mean of the current generation, $p_\cdot$. My $r_0$ is equivalent to one of the senses of Wright’s (1951) $F_{ST}$. See Rogers and Harpending (1986) for further discussion of the relationship between these notations.
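The deterministic part of Equation 15 can be sketched numerically. This is a minimal illustration that reads the equation as p = (1 - s) M'p~ + s*pi*1 + drift, with M' the transpose of the migration matrix, and that sets the drift term to zero; the migration matrix, allele frequencies, and parameter values are hypothetical.

```python
def expected_next_frequencies(M, p_parent, s, pi):
    """Expected adult allele frequencies after one generation, ignoring drift.

    M[i][j] is the fraction of group j derived from group i, so the frequency
    in group j after local migration is sum_i M[i][j] * p_parent[i]; a fraction
    s of each group is then replaced from a continent with frequency pi.
    """
    K = len(p_parent)
    return [(1 - s) * sum(M[i][j] * p_parent[i] for i in range(K)) + s * pi
            for j in range(K)]

# Hypothetical two-group example.
M = [[0.8, 0.2],
     [0.2, 0.8]]
p_parent = [0.2, 0.6]
p_next = expected_next_frequencies(M, p_parent, s=0.01, pi=0.5)
print(p_next)

# If every group already sits at the continental frequency, nothing changes.
p_fixed = expected_next_frequencies(M, [0.5, 0.5], s=0.01, pi=0.5)
print(p_fixed)
```

In the full model a random drift deviation is added to each entry; its variance is what the three non-randomness parameters control.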
These quantities are defined in terms of the allele frequencies of adults. We can also define, for newborns,
$$R' = [r'_{ij}], \quad r'_0 = \mathrm{tr}(W'R'), \quad\text{and}\quad \rho' = E\{r'_0\},$$
where $W'$ is the diagonal matrix of relative group sizes prior to migration. The results presented below assume that migration does not change the sizes of local groups, so that $W' = W$. As Rogers and Harpending (1986) have emphasized, r-statistics for newborns and adults can differ greatly in magnitude.
It is often convenient to work with the quantity
$$G = \frac{\rho}{1-\rho}.$$
$G$ can be interpreted as the ratio of expected genic variance between groups to that within, and is a parameter of considerable biological interest. For example, it can be shown that the effectiveness of Wright’s (1945) mechanism of intergroup selection is proportional to $G$ (Rogers, 1988b).
Rogers and Harpending (1986) show that Equation 15 implies that, at equilibrium between migration and genetic drift,
[Equation 18 not recoverable from this copy]
Here $L^t$ is the $t$th power of the “reduced migration matrix”, and is defined by $L = M^T(I - w\mathbf{1}^T)$, where $I$ is the identity matrix, $w$ is a column vector whose $i$th entry is $w_i$, and $C = E\{\epsilon\epsilon^T\}/(p_\cdot(1-p_\cdot))$ is a matrix of normalized variances and covariances due to one generation of genetic drift. To find the expectation of $R$ under the current model, we use Equation 16 to find $C$ and then substitute into Equation 18. This calculation is done in Appendix B.
RESULTS
In the appendix are derived approximations for expectations at equilibrium of the R-matrices of both adults and children, and also for $\rho$ and $\rho'$. It is shown that, for small $s$,
$$G = \frac{\overbrace{(1-s)^2}^{\text{random sampling}} + \overbrace{\underbrace{(1-2m_e)\alpha_I}_{\text{reproduction and early mortality}} + \underbrace{2m_e g\,\alpha_M}_{\text{migration}} + \underbrace{(1-g)(1-s)^2\alpha_L}_{\text{late mortality}}}^{\text{non-random sampling}}}{4n_+m_eK/(K-1)}, \qquad (19)$$
where $m_e$ is the effective migration rate, defined by Rogers and Harpending (1986) as one half the harmonic mean of $\{1 - (1-s)^2\lambda_i^2 : i = 2, 3, \ldots, K\}$. The first term in the numerator refers to the effect of random sampling, the others to nonrandom sampling. When sampling is random, these latter terms vanish. The effect of nonrandom sampling is subdivided into separate effects for reproduction and early mortality, migration, and late mortality. The result of Rogers (1987) is a special case of Equation 19 obtained by setting $g = 1$, and $\alpha_I = \alpha_L = 0$. Rogers and Harpending’s (1986) results are obtained by adding the additional restriction that $\alpha_M = 0$.
Much has been made of the observation that, in most models of migration and genetic drift, $G$ depends on migration rate and effective group size (say $N_e$) only through their product $N_e m_e$ (Lewontin, 1974, p. 213). Equation 19 shows that this result holds only if the second and third terms in its numerator are nil, i.e., if $\alpha_I = 0$ and either $g = 0$ or $\alpha_M = 0$, which requires that any sampling occurring prior to migration be random. In that case, Equation 19 could be reduced to the traditional form, $G = 1/(N_e m_e)$, by a suitable redefinition of $N_e$ and $m_e$. When nonrandom sampling occurs early, however, no such simplification is possible. We might define $N_e$ as the ratio of $n_+K/(K - 1)$ to the numerator of Equation 19. This, however, would make $N_e$ a function of the migration rate, so it would make little sense to claim that $G$ depended only on the product of group size and migration rate. Thus, it seems best to leave Equation 19 as it stands. Any attempt to collapse the numerator into a single parameter would obscure the complexity of the relationship between group size and migration rates in subdivided populations. The effects of nonrandom sampling depend both on when in the life cycle it occurs, and on the level of mobility among populations. These dependencies are summarized in Table 2, discussed in detail below.
DISCUSSION
In general, the ratio, $G$, of expected between- to expected within-group variance is influenced by three episodes of genetic drift involving: a) reproduction and early mortality, b) migration, and c) mortality after migration. All three of these episodes can be interpreted using Wright’s metaphor of sampling: newborns constitute a sample of the hypothetical infinite ensemble of newborns that their parents could have produced, migrants a sample of the individuals born in their local group, and reproductive adults a sample of the local group just after external migration.

TABLE 2. Parameters with dominant influence on G as determined by mobility and the timing of mortality
                     Mobility
Mortality            Low ($m_e = 0$)            High ($m_e = 0.5$)
Early ($g = 1$)      $\alpha_I$                 $\alpha_M$
Late ($g = 0$)       $\alpha_I$, $\alpha_L$     $\alpha_L$
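The pattern in Table 2 can be checked numerically from the numerator of Equation 19. The sketch below assumes the numerator reads (1 - s)^2 + (1 - 2 m_e) alpha_I + 2 m_e g alpha_M + (1 - g)(1 - s)^2 alpha_L, and that the effective migration rate is one half the harmonic mean of 1 - (1 - s)^2 lambda_i^2 over the non-unit eigenvalues; both are readings of a damaged copy, and the function names and example values are hypothetical.

```python
def numerator_terms(s, m_e, g, alpha_i, alpha_m, alpha_l):
    """Separate contributions to the numerator of Equation 19 (assumed form)."""
    return {
        "random": (1 - s) ** 2,
        "alpha_I": (1 - 2 * m_e) * alpha_i,
        "alpha_M": 2 * m_e * g * alpha_m,
        "alpha_L": (1 - g) * (1 - s) ** 2 * alpha_l,
    }

def dominant_parameters(s, m_e, g, tol=1e-12):
    """Non-randomness parameters whose weight in the numerator is non-zero."""
    weights = numerator_terms(s, m_e, g, 1.0, 1.0, 1.0)
    return sorted(k for k, w in weights.items() if k != "random" and abs(w) > tol)

def effective_migration_rate(nonunit_eigenvalues, s):
    """Half the harmonic mean of 1 - (1 - s)^2 * lambda^2 (assumed reading)."""
    values = [1 - (1 - s) ** 2 * lam ** 2 for lam in nonunit_eigenvalues]
    harmonic_mean = len(values) / sum(1 / v for v in values)
    return harmonic_mean / 2

# The four extreme cases of Table 2 (s is a small hypothetical value).
cases = {
    ("early", "low"): dominant_parameters(s=0.01, m_e=0.0, g=1.0),
    ("early", "high"): dominant_parameters(s=0.01, m_e=0.5, g=1.0),
    ("late", "low"): dominant_parameters(s=0.01, m_e=0.0, g=0.0),
    ("late", "high"): dominant_parameters(s=0.01, m_e=0.5, g=0.0),
}
print(cases)
print(effective_migration_rate([0.6], s=0.0))
```

Setting all three non-randomness parameters to zero leaves only the random-sampling term, which does not involve g.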
Random sampling
In the simplest case, when sampling is at random in all three stages, the order of events does not matter. This can be seen by setting $\alpha_I = \alpha_M = \alpha_L = 0$ in Equation 19, and noting that $G$ is then independent of $g$, which measures the relative importance of early (premigratory) mortality. This implies that sampling prior to migration is equivalent to sampling during or after migration. It was this equivalence that led Harpending and me to conclude (Rogers and Harpending, 1986) that the details of the life cycle had little effect on the variance among groups of adult allele frequencies. As is now clear, that conclusion was an artifact of our assumption of random sampling.
Nonrandom sampling
In real populations, the assumption of random sampling may fail, and the order of events becomes important. Furthermore, the effect of drift is confounded with that of mobility. Four extreme cases may be distinguished, as shown in Table 2.
Reproduction and early mortality. The parameter $\alpha_I$ measures the departure from random sampling during reproduction and early mortality, and is also influenced by departures from Hardy-Weinberg equilibrium in the parental generation. Variation among adults in reproductive success, and unequal numbers of male and female parents, for example, both affect $G$ through the parameter $\alpha_I$. The quantity $1 + \alpha_I$ is approximately equal to the ratio of the size of the adult population to its inbreeding effective size. Thus, $\alpha_I$ can be estimated using the well-known formulae for effective population size (Crow and Denniston, 1988). In human populations, estimates of effective population size are usually between 1/5 and 1/2 the actual size (Nei, 1969; Wood, 1987). These values correspond to values of $\alpha_I$ between unity and four. As Table 2 indicates, this effect is only important when mobility is low. In some populations, group differences may be nearly independent of the inbreeding effective group size.
Migration. Nonrandom sampling during
migration may inflate (or deflate) the variances of migrant sets about the allele frequency of the natal group, an effect that is
measured by the parameter $\alpha_M$. The form of
nonrandom sampling considered here is kin-structured migration (Fix, 1978), which occurs when relatives migrate together. As Table 2 indicates, this effect is most important
when mortality is early and mobility high.
Levin (1988) and Levin and Fix (in press)
suggest that kin-structured migration may
have an important influence on the genetic
structure of plant populations, since some
plant migrules are fruits containing many
seeds. In some cases, however, the fraction of
plant migrules surviving late mortality (g)
may be low. If so, the present model suggests
that kin-structured migration may have little direct effect.
I have shown previously (Rogers, 1987) how
behavioral data can be used to estimate
a quantity that is approximately equal to $\alpha_M$. Those
results indicate that $\alpha_M = 0.53$ for male
lions, and that $\alpha_M = 3$ if entire human sibships dispersed as units. These numbers
indicate that the potential effect of kin-structured migration is about as large as that of
reproduction and early mortality. The companion paper (Rogers, 1988a) develops statistical methods for estimating $\alpha_M$ from genetic
data. Application of this method to data from
the Aland Islands, Finland, shows that $\alpha_M$
is near zero there.
The values of $\alpha_M$ and $\alpha_I$ may often be related. For example, in Fix's (1978) model (discussed above) the variance among migrules
is inversely proportional to a parameter A,
and there is no mortality after migration. In
terms of the model introduced here, Fix's
assumptions imply that $g = 1$, that $1 + \alpha_I = A^{-1}$, and that

$$1 + \alpha_M = \frac{1 - 1/(2N)}{A - 1/(2N)} \approx 1 + \alpha_I,$$

where N is the size of local groups. Thus, the
form of kin structure envisioned by Fix has
equal effects on the first two of the three
episodes of sampling studied here.
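The near-equality of the two parameters under Fix's assumptions can be illustrated numerically. The sketch below simply evaluates the two expressions given above with exact rational arithmetic; the function name `fix_model_alphas` and the example values of A and N are hypothetical choices made for illustration.

```python
from fractions import Fraction


def fix_model_alphas(A, N):
    """Return (alpha_I, alpha_M) implied by the two relations quoted above:
    1 + alpha_I = 1/A  and  1 + alpha_M = (1 - 1/(2N)) / (A - 1/(2N)).
    A and N are illustrative inputs; A must exceed 1/(2N)."""
    A = Fraction(A)
    half_inv = Fraction(1, 2 * N)          # the term 1/(2N)
    if A <= half_inv:
        raise ValueError("A must exceed 1/(2N)")
    alpha_I = 1 / A - 1
    alpha_M = (1 - half_inv) / (A - half_inv) - 1
    return alpha_I, alpha_M


# The gap between the two values shrinks as the group size N grows.
for N in (10, 50, 500):
    a_I, a_M = fix_model_alphas(Fraction(1, 2), N)
    print(N, float(a_I), float(a_M))
```

For fixed A the difference between the two values is of order 1/N, which is why the two episodes of sampling are affected almost equally in groups of moderate size.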
Late mortality. Finally, nonrandom sampling during postmigratory mortality is measured by $\alpha_L$, and is important when mortality
is late, regardless of the level of mobility. To
my knowledge, this component of genetic
drift has never been measured, and its magnitude in natural populations is a mystery.
It is easy, however, to envision mechanisms
that could induce substantial values of $\alpha_L$.
For example, intense competition within
migrules might generate a negative correlation between the survivorship of migrule
members, leading to a negative value of $\alpha_L$.
On the other hand, suppose that postmigratory mortality were entirely due to some fatal infectious disease, and that the rate of
transmission within migrules were much
higher than that between. This would induce
a positive correlation between migrule members that would make $\alpha_L$ a function of $\alpha_M$. In
fact, if migrules survive or perish as units, it
can be shown that $\alpha_L = \alpha_M$. This shows that
$\alpha_L$ is potentially as large as $\alpha_M$; values in
the range from unity to three are reasonable
for all three $\alpha$s.
Male lions present an example of a case in
which the survivorship of migrule members
is likely to be positively correlated. Before
breeding, they must leave the pride in which
they were born and drive out the resident
male in some other pride. Lions that disperse
in groups seem to have a better chance of
doing this than those that disperse individually (Bygott et al., 1979). Another example
is found in Levin and Fix's suggestion that
kin-structured migration may be important
in plants if the migrule is a fruit containing
several seeds. Levin (personal communication) has observed that survivorship of plant
migrule members may also be positively correlated in some cases. The fate of all the
seeds in a fruit, for example, may depend
largely on whether the fruit lands on a favorable or an unfavorable patch of habitat. On
the other hand, these seeds may also compete
with each other, inducing a negative correlation between the survivorship of migrule
members. The sign and magnitude of these
correlations are empirical questions whose
answers are as yet unknown.
CONCLUSIONS
In subdivided populations, the effect of genetic drift is more complicated than most
previous models have indicated. It depends
both on the timing of events in the life cycle,
and also on the level of mobility among local
groups. At each stage in the life cycle, the
sampling variance due to drift can be partitioned into a random sampling component
and a nonrandom component. The ratio, G,
of between- to within-group genetic variance
is insensitive to the timing of the random
component, but sensitive to that of the nonrandom component. Mobility reduces the importance of nonrandom sampling occurring
prior to migration, enhances the importance
of that occurring during migration, and has
no effect on that occurring after.
The component of genetic drift produced by
nonrandom sampling during migration is potentially about as large as that due to reproduction and early mortality. The magnitude
of the component due to nonrandom sampling during postmigratory mortality has
never been evaluated, but there is no obvious
reason to assume it to be negligible. Thus,
all three components may be important.
Most previous studies of the influence of
social and demographic factors on genetic
drift have concentrated on changes in the
effective population size. Yet this cannot account for all three of the episodes of sampling
that occur in subdivided populations. A comprehensive study of drift in a subdivided population would compare the statistical properties of sampling before, during, and after
migration.
ACKNOWLEDGMENTS
I thank J. Boster, E.A. Cashdan, J.F. Crow,
D. O'Rourke, and J.W. Wood for comments
on this paper.
Supported in part by NIH grant MGN 1
R29 GM39593-01.
LITERATURE CITED
Bygott GD, Bertram BCR, and Hanby JP (1979) Male
lions in large coalitions gain reproductive advantages.
Nature 282:839-841.
Crow JF and Denniston C (1988) Inbreeding and variance effective population numbers. Evolution 42:482-495.
Ewens WJ (1979) Mathematical Population Genetics.
New York: Springer Verlag.
Fix A (1978) The role of kin-structured migration in
genetic microdifferentiation. Ann. Hum. Genet.
41:329-339.
Harpending HC, and Ward RH (1984) Chemical systematics and human populations. In MH Nitecki (ed.):
Biochemical Aspects of Evolutionary Biology. Chicago:
The University of Chicago Press, pp. 213-256.
Levin DA (1988) Stochastic elements in plant migration.
Am. Nat. (in press).
Levin DA, and Fix AG (1988) A model of kin migration
in plants. Theoretical and Applied Genetics (in press).
Lewontin RC (1974) The Genetic Basis of Evolutionary
Change. New York: Columbia University Press.
Malecot G (1969) The Mathematics of Heredity. San
Francisco: Freeman.
Malecot G (1973) Isolation by distance. In NE Morton
(ed.) Genetic Structure of Populations. Honolulu: University of Hawaii Press, pp 72-75.
Nagylaki T (1979) The island model with stochastic migration. Genetics 92:163-176.
Nei M (1969) Effective size of human populations. Am.
J. Hum. Genet. 22:694-696.
Rogers AR (1986) Correlations between relatives in small
populations. Am. J. Phys. Anthropol. 71:377-389.
Rogers AR (1987) A model of kin-structured migration.
Evolution 41:417-426.
Rogers AR (1988a) Statistical analysis of the migration
component of genetic drift. Am. J. Phys. Anthropol.
77:451-457.
Rogers AR (1988b) Group selection by selective emigration: The effects of migration and kin structure. Am.
Nat. (submitted).
Rogers AR, and Harpending HC (1986) Migration and
genetic drift in human populations. Evolution 40:1312-1327.
Searle SR, and Fawcett RF (1970) Expected mean squares
in variance components models having finite populations. Biometrics 26:243-254.
Strang G (1976) Linear Algebra and Its Applications.
New York: Academic Press.
Wood JW (1987) The genetic demography of the Gainj of
Papua New Guinea. 2. Determinants of effective population size. Am. Nat. 129:165-187.
Wright S (1931) Evolution in Mendelian populations.
Genetics 16:97-159.
Wright S (1945) Tempo and mode in evolution, a critical
review. Ecology 26:415-419.
Wright S (1951) The genetical structure of populations.
Ann. Eugen. 15:323-354.
Wright S (1969) Evolution and the Genetics of Populations. Volume 2. The Theory of Gene Frequencies. Chicago: University of Chicago Press.
APPENDIX A: NOTATION
This appendix summarizes all notation appearing above, but not that in the appendices
below. Primes (as in r') are used to distinguish quantities referring to newborns from
those referring to adults, and tildes (as in $\tilde{p}_i$)
to distinguish quantities referring to the previous rather than the present generation.
$\mathbf{1}$ = a column vector each element of which is unity;
$\alpha_E = N_{i+}/N_{i+}^{(v)} - 1 = \alpha_I/g$, a measure of nonrandom sampling during reproduction and early mortality;
$\alpha_I = (1 + f_i)\,n_i/n_i^{(I)} - 1$, an alternate measure of nonrandom sampling during reproduction;
$\alpha_L$ = a measure of nonrandom sampling during late mortality;
$\alpha_M$ = a measure of nonrandom sampling during migration;
$\epsilon$ = a vector of deviations due to genetic drift;
$\gamma$ = the size of each migrule;
$\kappa$ = the genetic correlation of individuals within migrules;
$\lambda_i$ = the ith eigenvalue of the migration matrix, where $1 = \lambda_1 > \lambda_2 \ge \cdots \ge \lambda_K$;
$\eta$ = vector of increments to allele frequencies produced by late mortality;
$\pi$ = the allele frequency of the external continent, which is assumed not to change;
$\rho = E\{r_0\}$;
$\tau_{irs}$ = deviation of the sth individual in the rth
migrule from his migrule allele frequency;
$c_i$ = correlation between two alleles, drawn at random from distinct individuals of the parental generation of group i;
$C = E\{\epsilon\epsilon^T\}/(p_\cdot(1 - p_\cdot))$, a matrix of normalized variances and covariances due to one generation of genetic drift;
$f_i$ = correlation of two genes from the same parent relative to the parental generation;
$g = n_i/N_{i+}$, the ratio of number of parents to number of newborns in group i; because of the assumption that $N_{i+} = N_{+i}$, g is also the fraction of group i that survives late population regulation;
$G = \rho/(1 - \rho)$, the ratio of expected variance between groups to that within;
$I$ = the identity matrix;
$K$ = the number of groups;
$L^t = M^t(I - w\mathbf{1}^T)$, the tth power of the "reduced migration matrix";
$m_{ij} = N_{ij}/N_{+j}$, the fraction of group j derived from group i each generation;
$M = [m_{ij}]$, the migration matrix;
$n_j$ = the size of group j after late mortality;
$n_i^{(I)}$ = the inbreeding effective size of group i;
$N_{ij}$ = the size of the ijth migrant set;
$N_{i+} = \sum_j N_{ij}$, the size of group i before local migration;
$N_{+j} = \sum_i N_{ij}$, the size of group j after local migration;
$N_{i+}^{(v)}$ = the variance effective size of group i;
$p_j$ = the frequency of allele $A_1$ at the reproductive stage in group j;
$q_{\cdot j} = N_{+j}^{-1}\sum_i N_{ij}q_{ij}$, the allele frequency in group j after local migration;
$q_{i\cdot} = N_{i+}^{-1}\sum_j N_{ij}q_{ij}$, the allele frequency in group i before local migration;
$q_{ij}$ = the frequency of allele $A_1$ within the ijth migrant set;
$r_{ij} = (p_i - p_\cdot)(p_j - p_\cdot)/(p_\cdot(1 - p_\cdot))$, the genetic correlation between groups i and j;
$R = [r_{ij}]$, the genetic correlation matrix;
$r_0 = \sum_i r_{ii}w_i$, a measure of group differentiation that is equivalent to one of the senses of Wright's $F_{ST}$;
$s$ = the strength of systematic pressure (mutation, migration, or weak selection) tending to move each local population toward some intermediate allele frequency (for concreteness, s is here taken to be the fraction of each local group replaced per generation by emigrants from an external continent);
$v_i^{(I)}$ = mean square of individual effects within group i;
$v_i^{(M)}$ = mean square of migrule effects within group i;
$w_i = N_{i+}/N_{++}$, the relative size of the ith local group;
$w$ = a column vector of relative population sizes;
$W$ = a diagonal matrix of relative population sizes;
$x_{ij}$ = the genic value (0 or 1) of the jth gene among newborns in group i;
$x_{irs}$ = allele frequency (0, 1/2, or 1) of the sth individual in the rth migrule in the ith local group; and
$y_{ir}$ = deviation of the rth migrule in the ith local group from the allele frequency of that local group.
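Several of the quantities defined above follow mechanically from the table of migrant-set sizes. The sketch below, which is an illustration rather than part of the original analysis, computes the relative group sizes and the migration matrix from a hypothetical symmetric table of counts; the function name and the numbers are assumptions chosen only to make the definitions concrete.

```python
from fractions import Fraction


def relative_sizes_and_migration(N):
    """N[i][j] is the (hypothetical) size of the migrant set moving from
    group i to group j. Returns (w, M) with w_i = N_{i+}/N_{++} and
    m_ij = N_ij/N_{+j}, following the definitions in the notation list."""
    K = len(N)
    total = sum(sum(row) for row in N)                       # N_{++}
    before = [sum(N[i]) for i in range(K)]                   # N_{i+}
    after = [sum(N[i][j] for i in range(K)) for j in range(K)]  # N_{+j}
    w = [Fraction(b, total) for b in before]
    M = [[Fraction(N[i][j], after[j]) for j in range(K)] for i in range(K)]
    return w, M


# Illustrative symmetric counts, so group sizes are unchanged by migration.
N = [[80, 15, 5],
     [15, 160, 25],
     [5, 25, 170]]
w, M = relative_sizes_and_migration(N)
print("relative sizes:", w)
print("column sums of M:", [sum(M[i][j] for i in range(3)) for j in range(3)])
```

Each column of M sums to unity because $m_{ij}$ is expressed as a fraction of the receiving group.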
APPENDIX B: DERIVATION OF CENTRAL
RESULTS
This appendix derives formulae describing
the amount and pattern of genetic variance
expected among a set of K local groups at
equilibrium between the effects of migration
and genetic drift. Notation and assumptions
are given above. To find the equilibrium values of $E\{R\}$ and $\rho$, we must find C under the
present model and substitute into Equation
18. By definition, $C = E\{\epsilon\epsilon^T\}/(p_\cdot(1 - p_\cdot))$. In
view of Equation 16, this is

$$C = \frac{(1 - s)^2}{p_\cdot(1 - p_\cdot)}\,(Z + Y), \qquad (20)$$

where $Y = [E\{\eta_j\eta_k\}]$ is a matrix of variances and covariances due to the late mortality component of genetic drift, and $Z = [E\{\mathrm{Cov}\{q_{\cdot j}, q_{\cdot k} \mid \tilde{p}\}\}]$ is the corresponding matrix for the early mortality and migration components.

Expected squares and products of allele frequencies of individuals and migrant sets

Calculation of Z will require expressions for the squares and products of allele frequencies of individuals and migrant sets. These are derived from Equations 9, 10, and 11. The latter two equations can be rewritten, using Equation 12, as

[Equation 21]

[Equation 22]

We will also need the expectations of products of the form $q_{ij}q_{ik}$. Each of these is itself a sum of products:

[Equation 23]

where $y_{iju}$ is the allele frequency (0, 1/2, or 1) of the uth individual in the ijth migrant set. If $j \ne k$, all the terms in this sum are for individuals of different migrules, and their expectation is given by Equation 22. On the other hand, if $j = k$, there are $N_{ij}$ squared terms, $N_{ij}(\gamma - 1)$ terms for pairs of migrule members, and $N_{ij}(N_{ij} - \gamma)$ terms for members of different migrules. Conditioning on $q_{i\cdot}$ and substituting Equations 9, 21, and 22 into 23 produces

[Equation 24]

The matrix Z

Z contains the expected squares and cross-products of the deviations produced by early mortality and migration. The jkth entry of Z is thus

$$z_{jk} = \sum_{i=1}^{K} m_{ij}m_{ik}\,\mathrm{Cov}\{q_{ij}, q_{ik} \mid \tilde{p}_i\}, \qquad (25)$$

where $\mathrm{Cov}\{q_{ij}, q_{ik} \mid \tilde{p}_i\}$ is the conditional covariance of migrant set allele frequencies given the parental allele frequency. Since the effects of drift in different groups are uncorrelated, terms such as $\mathrm{Cov}\{q_{ij}, q_{i'k}\}$ (where $i \ne i'$) are zero, and do not appear in Equation 25.

The conditional covariances above can be decomposed into components due to early mortality and migration:

[Equation 26]

The term on the far right is the early mortality component of genetic drift, and is given by Equation 6. The other term is the migration component of drift, and is obtained from Equation 24 by taking the conditional expectation of $v_i^{(I)}$, given $\tilde{p}_i$. If mating is random within groups, then $v_i^{(I)} = q_{i\cdot}(1 - q_{i\cdot})/2$, and its conditional expectation is

[equation]
Substituting into Equation 26 yields

[equation]

Here I introduce the approximation $\tilde{p}_i(1 - \tilde{p}_i) \approx p_\cdot(1 - p_\cdot)(1 - r_0)$, $i = 1, \ldots, K$. This approximation has often been used in similar models (Malecot, 1973; Harpending and Ward, 1984) and its accuracy has been verified by Rogers and Harpending (1986). With this approximation, Equation 25 becomes

[equation]

where $O(N^{-2})$ contains high-order terms, and will be ignored. For $j = k$, this is equal to

[equation]

where $w_j = N_{+j}/N_{++}$ is the relative size of the jth local group. For $j \ne k$,

[equation]

These last two formulae can be written in matrix notation as

[Equation 28]

where

[equation]

The matrix Y

Y contains the variances and covariances produced by late mortality, i.e., $Y_{jk} = E\{\eta_j\eta_k\}$. Since mortality affects allele frequencies in different groups independently, the off-diagonal entries of Y are zero. The diagonal entries are given by Equation 14, which can be written as

[equation]

For simplicity, I will a) take $p_\cdot(1 - p_\cdot)(1 - r_0)$ as a crude approximation for $q_{\cdot j}(1 - q_{\cdot j})$, and b) assume that late mortality does not change relative group sizes so that $n_j/n_+ = N_{+j}/N_{++} = w_j$. These simplifications provide

$$Y = p_\cdot(1 - p_\cdot)\,c\,W^{-1}, \qquad (30)$$

where

[equation]

The matrix C

Substituting Equations 28 and 30 into Equation 20, and then Equation 20 into 18 produces

[Equation 31]

where

[equation]

where $\tilde{\Lambda}$ is a diagonal matrix containing the eigenvalues of L, which are denoted by $\tilde{\lambda}_i$, $i = 1, 2, \ldots, K$. Theorem 4 (Appendix C) shows that these eigenvalues are identical with those of M except that $\tilde{\lambda}_1 = 0$, whereas $\lambda_1 = 1$. The second line above is justified by Theorem 5 (Appendix C). Using the formula for the sum of a geometric series, this becomes

$$X = VDV^T, \qquad (32)$$

where D is a diagonal matrix whose ith diagonal element is zero if $\tilde{\lambda}_i = 0$, and $(1 - (1 - s)^2\tilde{\lambda}_i^2)^{-1}$ otherwise.

The expression that results from substituting Equation 32 into Equation 31 can be simplified using the fact that $V^TL = \tilde{\Lambda}V^T$, which holds because $V^T$ contains the left eigenvectors of L. This yields

$$E\{R\} = VBV^T, \qquad (33)$$

where B is a diagonal matrix whose ith diagonal entry is zero if $i = 1$ (i.e., if $\tilde{\lambda}_i = 0$), and

[Equation 34]

The approximation here is crude unless $s \ll \lambda_2$. Rogers and Harpending also show that $\rho$ is equal to the sum of the diagonal entries of B. Summing Equation 34 over i and dividing by $1 - \rho$ leads to Equation 19.

Newborns

The results just derived refer to allele frequencies of "adults," i.e., of individuals after migration and late mortality. This section extends those results to cover allele frequencies of "newborns," i.e., of individuals after early mortality but prior to migration. Let R' denote the matrix of normalized genetic covariances among the group allele frequencies of newborns. Rogers and Harpending (1986) show that

[first line of the equation, written in terms of $\mathrm{Diag}\{\cdot\}$]

$$E\{R'\} \approx E\{R\} + \frac{(g + \alpha_I)(1 - \rho)}{2n_+}\,H^TW^{-1}H = E\{R\} + \frac{(g + \alpha_I)(1 - \rho)}{2n_+}\,W^{-1}H,$$

where $\mathrm{Diag}\{\cdot\}$ is a diagonal matrix whose jth diagonal entry is the quantity within braces. This expression uses the same approximation as Equation 25 and also ignores the distinction between $p'_\cdot$ and $p_\cdot$. The last line is justified by Theorem 3 in Appendix C. Since $r_0' = \mathrm{tr}\{WR'\}$ (see Eq. 17), its expectation is

$$\rho' = \rho + \frac{(g + \alpha_I)(1 - \rho)}{2n_+}\,\mathrm{tr}\{WW^{-1}H\} = \rho + \frac{(g + \alpha_I)(1 - \rho)}{2\tilde{n}}, \qquad (35)$$

where $\tilde{n} = n_+/(K - 1)$. The second line is justified by part e of Theorem 3 (see Appendix C). Rearranging this expression leads to

[equation]

APPENDIX C: MATRIX THEOREMS

This appendix derives several results concerning the matrices discussed above. Most of them were first derived (to my knowledge) by Harpending and Ward (1984) or Rogers and Harpending (1986).

Let $A = [N_{ij}/N_{++}]$ denote a matrix whose ijth entry is the fraction of the total population that moves from group i to group j each generation. The migration matrix is related to A by $M = AW^{-1}$, where W is a diagonal matrix of relative population sizes.

Theorem 1

If the migration pattern is symmetric, i.e., if $A = A^T$, then

a. The migration matrix can be written in diagonal form as $M = U\Lambda V^T$, where the columns of U and V contain, respectively, the right and left eigenvectors of M, and where $\Lambda$ is a diagonal matrix containing the eigenvalues of M;
b. $U = W^{1/2}S$, where S is an orthogonal matrix, i.e., where $SS^T = I$.
c. $V = W^{-1/2}S$.

Proof

Since A is symmetric, so is the matrix $X = W^{-1/2}MW^{1/2} = W^{-1/2}AW^{-1/2}$, and the spectral theorem (Strang, 1976) ensures that it has a diagonal form $X = S\Lambda S^T$, where S is orthogonal and $\Lambda$ diagonal. Now $M = W^{1/2}XW^{-1/2}$, which is equal to $U\Lambda V^T$, as claimed. Since U is the inverse of $V^T$, and $\Lambda$ is diagonal, this must be the diagonal form of M.
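The key step in the proof of Theorem 1, that rescaling M by the square roots of the relative group sizes gives a symmetric matrix, is easy to verify numerically. The sketch below uses a hypothetical symmetric matrix A of population fractions; the function name and the numbers are illustrative assumptions, and only the symmetry of $W^{-1/2}MW^{1/2}$ is checked, not the full eigendecomposition.

```python
import math


def symmetrized_migration(A):
    """Given a symmetric matrix A of population fractions, return (w, M, X)
    where w holds the row sums of A, M = A W^{-1}, and
    X = W^{-1/2} M W^{1/2}. Inputs are illustrative, not data."""
    K = len(A)
    w = [sum(row) for row in A]                      # relative group sizes
    M = [[A[i][j] / w[j] for j in range(K)] for i in range(K)]
    # Entry (i, j) of W^{-1/2} M W^{1/2} is m_ij * sqrt(w_j / w_i).
    X = [[M[i][j] * math.sqrt(w[j] / w[i]) for j in range(K)]
         for i in range(K)]
    return w, M, X


A = [[0.16, 0.03, 0.01],
     [0.03, 0.32, 0.05],
     [0.01, 0.05, 0.34]]
w, M, X = symmetrized_migration(A)
asymmetry = max(abs(X[i][j] - X[j][i]) for i in range(3) for j in range(3))
print("M[0][1], M[1][0]:", M[0][1], M[1][0])   # M itself need not be symmetric
print("largest asymmetry in X:", asymmetry)
```

Because X is symmetric, its eigenvalues are real and its eigenvectors orthogonal, which is what allows M to be diagonalized as in the theorem.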
Theorem 2

Let $\mathbf{1}$ denote a column K-vector with each entry equal to unity. If the migration pattern is symmetric so that $A = A^T$, then

a. $A\mathbf{1} = A^T\mathbf{1} = w$; i.e., relative population sizes may be obtained by summing either the rows or the columns of A.
b. $Mw = w$; i.e., w is a right eigenvector of M associated with an eigenvalue of unity.
c. $\mathbf{1}^TM = \mathbf{1}^T$; i.e., $\mathbf{1}^T$ is the corresponding left eigenvector of M.

Proof

a. The row sums of A are the relative population sizes before migration, while the row sums of $A^T$ give the relative sizes after migration. The symmetry assumption ensures that these sums are the same.
b. Rewriting M as $AW^{-1}$ transforms proposition b into $AW^{-1}w = A\mathbf{1} = w$, because of proposition a.
c. Making the same substitution in proposition c gives $\mathbf{1}^TAW^{-1} = w^TW^{-1} = \mathbf{1}^T$, which proves proposition c.

Theorem 3

Let I denote the identity matrix, w a K-vector of relative population sizes, and $\mathbf{1}$ a K-vector each element of which is equal to unity. Then the matrix $H = I - w\mathbf{1}^T$ has the following properties:

a. $H^2 = H$; i.e., H is idempotent.
b. $WH^TW^{-1} = H$.
c. $H^TW^{-1} = (H^TW^{-1})^T$; i.e., $H^TW^{-1}$ is symmetric.
d. $H^TW^{-1}H = W^{-1}H$.
e. $\mathrm{tr}(H) = K - 1$, where tr( ) denotes the trace, or summed diagonal elements of its argument.

Proof

a. $H^2 = I - 2w\mathbf{1}^T + w\mathbf{1}^Tw\mathbf{1}^T = I - w\mathbf{1}^T = H$, since $\mathbf{1}^Tw = 1$.
b. The result is obtained by substituting $WH^T = W - ww^T$ into the left side of proposition b.
c. Premultiply proposition b by $W^{-1}$ and then transpose.
d. Proposition c implies that $H^TW^{-1}H = W^{-1}H^2$, from which d follows because of a.
e. The trace of H is equal to $\mathrm{tr}(I) - \mathrm{tr}(w\mathbf{1}^T)$, where the first of these traces equals K, and the second unity.

Theorem 4

Let $L^i$ denote the matrix $M^iH$, for $i = 0, 1, \ldots$, and $\lambda_j$ the jth eigenvalue of M, indexed so that $\lambda_1 > \lambda_2 \ge \cdots \ge \lambda_K$. Denote by $v_j^T$ and $u_j$, respectively, the jth row and column eigenvectors of M. Since M is a stochastic matrix, its leading eigenvalue is unity. If this eigenvalue is unique (i.e., has multiplicity one), then

a. $L^iw = 0$; i.e., w is a right eigenvector of $L^i$ with eigenvalue zero;
b. $\mathbf{1}^TL^i = 0$; i.e., $\mathbf{1}^T$ is a left eigenvector of $L^i$ with eigenvalue zero;
c. For $j > 1$, $L^iu_j = M^iu_j = \lambda_j^iu_j$. In other words, the second and higher eigenvalues of $M^i$ are also eigenvalues of $L^i$, and the associated right eigenvectors are the same for both matrices.
d. For $j > 1$, $v_j^TL^i = v_j^TM^i = \lambda_j^iv_j^T$. In other words, proposition c also holds for the left eigenvectors.
e. $L^i$ has the same eigenvalues and eigenvectors as $M^i$ except that the eigenvalue associated with the dyad $w\mathbf{1}^T$ is unity in $M^i$, but zero in $L^i$.

Proof

a. The definition of $L^i$ implies that $L^iw = M^iw - w\mathbf{1}^Tw$. But $\mathbf{1}^Tw = 1$, since the entries of w sum to unity, and $M^iw = w$, since w is a right eigenvector of M with eigenvalue unity. Thus, $L^iw = w - w = 0$, as claimed.
b. The proof of proposition b parallels that of proposition a, and follows from the fact that $\mathbf{1}^T$ is a left eigenvector of M with eigenvalue unity.
c. Since $u_j$ is a right eigenvector of M, it follows that $M^iu_j = \lambda_j^iu_j$. If, in addition, we suppose that $u_j$ is not proportional to w, then the orthogonality of mismatched left and right eigenvectors ensures that $\mathbf{1}^Tu_j = 0$. Therefore, $L^iu_j = M^iu_j - w\mathbf{1}^Tu_j = \lambda_j^iu_j$, as claimed.
d. Proposition d is proved in the same manner, using the fact that, if $v_j^T$ is not proportional to $\mathbf{1}^T$, then $v_j^Tw = 0$.
e. This is merely a restatement of propositions (a) through (d).

Theorem 5

$(L^i)^TW^{-1}L^i = V\tilde{\Lambda}^{2i}V^T$, where $\tilde{\Lambda}$ is a diagonal matrix containing the eigenvalues of L.

Proof

Theorem 4 allows $L^i$ to be written as $W^{1/2}S\tilde{\Lambda}^iV^T$. Thus,

$$(L^i)^TW^{-1}L^i = (V\tilde{\Lambda}^iS^TW^{1/2})\,W^{-1}\,(W^{1/2}S\tilde{\Lambda}^iV^T).$$

The theorem follows from the observation that $S^TW^{1/2}W^{-1}W^{1/2}S = I$, since S is an orthogonal matrix.
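The algebraic identities in Theorems 2 through 4 can be confirmed on a small example with exact rational arithmetic. The following is a sketch under stated assumptions: the matrix A of population fractions is hypothetical, the helper names are illustrative, and only the case $i = 1$ of Theorem 4 (that is, $L = MH$) is checked.

```python
from fractions import Fraction


def matmul(X, Y):
    """Plain matrix product of two lists of lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def build_matrices(A):
    """From a symmetric matrix A of population fractions, build
    w (row sums), M = A W^{-1}, H = I - w 1^T, W^{-1}, and L = M H."""
    K = len(A)
    w = [sum(row) for row in A]
    M = [[A[i][j] / w[j] for j in range(K)] for i in range(K)]
    # (w 1^T)_{ij} = w_i, so row i of H is the identity row minus w_i.
    H = [[(1 if i == j else 0) - w[i] for j in range(K)] for i in range(K)]
    W_inv = [[(1 / w[i] if i == j else 0) for j in range(K)] for i in range(K)]
    L = matmul(M, H)
    return w, M, H, W_inv, L


# Hypothetical symmetric migration pattern (fractions of the total population).
A = [[Fraction(x, 100) for x in row]
     for row in ([16, 3, 1], [3, 32, 5], [1, 5, 34])]
w, M, H, W_inv, L = build_matrices(A)
K = len(A)

print("Mw = w:", [sum(M[i][j] * w[j] for j in range(K)) for i in range(K)] == w)
print("H idempotent:", matmul(H, H) == H)
print("tr(H) = K - 1:", sum(H[i][i] for i in range(K)) == K - 1)
print("H^T W^-1 H = W^-1 H:",
      matmul(matmul(transpose(H), W_inv), H) == matmul(W_inv, H))
print("Lw = 0:", all(sum(L[i][j] * w[j] for j in range(K)) == 0 for i in range(K)))
```

Exact fractions are used so that each identity can be tested with equality rather than with a numerical tolerance.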