Heterozygosity between populations - a possible alternative to measures of genetic distance

It is shown that the heterozygosity between two populations, defined as the expected proportion of heterozygotes in their cross, can be used as a complement to measures of genetic distance. This new measure has favourable mathematical properties (it fulfils the triangle inequality) and has a clear biological interpretation. Its main importance will be in comparing (potential) crosses with each other.


Introduction
Several measures have been proposed for calculating the genetic distance between two populations (GREGORIUS, 1974, 1984; NEI, 1972; PREVOSTI et al., 1975; ROGERS, 1972). These measures were derived from different approaches. NEI's standard genetic distance (NEI, 1972) is based on a hypothesis about the evolutionary process: he assumes a linear relationship between the genetic distance and the time since the two populations separated. GREGORIUS (1974, 1984) stated a set of conditions with which a distance measure must comply and derived his distance measure on this basis. The same distance measure was used by PREVOSTI et al. (1975). From the animal breeder's point of view, a measure between two populations that is related to the expected proportion of heterozygous individuals in the resulting cross is appealing. In the following it will be shown that the heterozygosity between populations is such a measure.

Definition of the Heterozygosity Between Populations
Let $p_{Xij}$ and $p_{Yij}$ be the frequencies of the $j$th allele at the $i$th locus in populations X and Y, respectively. Then, when populations X and Y are crossed by random mating, the expected frequency of homozygotes at locus $i$, $g_{XYi}$, is

$$ g_{XYi} = \sum_{j=1}^{n_i} p_{Xij}\, p_{Yij} , $$

where $n_i$ is the number of alleles at locus $i$. The expected frequency of heterozygotes at locus $i$ in the cross, $h_{XYi}$, is then

$$ h_{XYi} = 1 - g_{XYi} = 1 - \sum_{j=1}^{n_i} p_{Xij}\, p_{Yij} . \qquad [1] $$

For X = Y this quantity is formally identical to the heterozygosity within a population. We therefore call it the "heterozygosity between populations at locus $i$". It has properties similar to those of a distance measure: it has a clear biological interpretation and favourable mathematical properties. The heterozygosity between populations at locus $i$ can be estimated by replacing the allele frequencies in [1] by their estimates. The variance of this estimate is given by NEI and ROYCHOUDHURY (1974); it depends on $m_X$ and $m_Y$, the numbers of individuals sampled from populations X and Y, respectively.

For a given set of $r$ loci, an average heterozygosity between populations, $H_{XY}$, can be defined. Its estimate is simply the arithmetic mean of the estimates at the individual loci,

$$ \hat{H}_{XY} = \frac{1}{r} \sum_{i=1}^{r} \hat{h}_{XYi} , $$

and its variance can be obtained from the variances of the single-locus estimates.
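As a minimal sketch, equation [1] and the multi-locus average can be computed directly from allele frequencies, with the frequencies estimated by allele counting in sampled diploid genotypes. The function names, the representation of genotypes as pairs of allele labels, and the requirement that both frequency vectors list the alleles in the same order are assumptions made for illustration. The sampling variance of NEI and ROYCHOUDHURY (1974) is not reproduced here.

```python
from collections import Counter
from typing import Sequence


def allele_frequencies(genotypes, alleles):
    """Estimate allele frequencies at one locus from diploid genotypes such as ("A", "B")."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * len(genotypes)
    return [counts[a] / total for a in alleles]


def het_between_locus(p_x: Sequence[float], p_y: Sequence[float]) -> float:
    """Equation [1]: h_XYi = 1 - g_XYi, with g_XYi = sum_j p_Xij * p_Yij."""
    if len(p_x) != len(p_y):
        raise ValueError("both frequency vectors must list the same alleles in the same order")
    g = sum(px * py for px, py in zip(p_x, p_y))  # expected homozygote frequency in the cross
    return 1.0 - g


def het_between_average(loci_x, loci_y) -> float:
    """Arithmetic mean of the single-locus values over the r loci."""
    values = [het_between_locus(px, py) for px, py in zip(loci_x, loci_y)]
    return sum(values) / len(values)


# Usage: one locus estimated from small hypothetical samples (alleles A, B, C).
sample_x = [("A", "A"), ("A", "B")]
sample_y = [("B", "B"), ("B", "C")]
p_x = allele_frequencies(sample_x, "ABC")  # [0.75, 0.25, 0.0]
p_y = allele_frequencies(sample_y, "ABC")  # [0.0, 0.75, 0.25]
print(het_between_locus(p_x, p_y))  # 0.8125
print(het_between_average([[0.5, 0.5], p_x], [[0.5, 0.5], p_y]))  # 0.65625
```

Only the allele shared by both samples (B) contributes to the expected homozygote frequency, so $g_{XYi} = 0.25 \cdot 0.75$ and $h_{XYi} = 0.8125$.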
Properties of the Heterozygosity Between Populations

First consider the special case of two alleles at locus $i$. Putting for simplicity $p = p_{Xi1}$ and $q = p_{Yi1}$, the equation for the heterozygosity between populations X and Y reduces to

$$ h_{XYi} = 1 - pq - (1-p)(1-q) = p + q - 2pq . $$

For constant $p$, $h_{XYi}$ is a linear function of $q$ only. For $p = 0.5$, $h_{XYi}$ is independent of $q$ and takes the value 0.5. $h_{XYi}$ takes its minimal value of zero for $p = q = 0$ and for $p = q = 1$. The maximal value of 1 is reached for the combination $p = 0$ and $q = 1$ or vice versa ($p = 1$ and $q = 0$).

In the general case of $n_i$ alleles at the $i$th locus, $n_i$ being any positive integer, the lower bound of $h_{XYi}$ is zero and its upper bound is unity. The lower bound is reached if and only if the same allele occurs with frequency one in both populations (i.e. all other alleles have frequency zero in both populations). The upper bound is reached if and only if the two populations have no allele in common. The measure $h_{XYi}$ fulfils the triangle inequality

$$ h_{XZi} \le h_{XYi} + h_{YZi} , $$

where X, Y and Z are three populations and the $h$'s are the heterozygosities between the corresponding pairs of populations. The proof is given in the Appendix.

If all allele frequencies at locus $i$ in population X are equal, i.e. $p_{Xij} = 1/n_i$ for all $j$, then equation [1] simplifies to

$$ h_{XYi} = 1 - \frac{1}{n_i} \sum_{j=1}^{n_i} p_{Yij} = 1 - \frac{1}{n_i} , $$

as the sum is unity. In that case $h_{XYi}$ is independent of the allele frequencies in population Y. Another interesting case is when only one allele, say $j^*$, is present in population X, i.e.
$p_{Xij^*} = 1$. Then equation [1] reduces to

$$ h_{XYi} = 1 - p_{Yij^*} . $$

That is, $h_{XYi}$ depends only on the frequency of the same allele in population Y and is independent of the remaining allele frequencies in Y. For completeness it should be added that $h_{XYi} = h_{YXi}$, so X and Y can be exchanged without changing the result.

Discussion

GREGORIUS (1974) stated four conditions for a measure to be a distance measure: (i) it takes only nonnegative values, (ii) it is symmetric (the distance between A and B is the same as the distance between B and A), (iii) it takes the value zero if and only if both populations are identical, (iv) it fulfils the triangle inequality. The heterozygosity between populations meets conditions (i), (ii) and (iv), but not (iii). This measure therefore resembles a distance measure but is not a distance measure in the sense of the above definition. The reason for the divergence with respect to condition (iii) is given below.

To illustrate the differences between several measures of genetic distance and the heterozygosity between populations, Table 1 gives numeric values for some basic situations. Only one locus is considered. All animals in the hypothetical populations X and Y are assumed to be identical, or both populations are assumed to consist of one animal each. Capital letters A, B, C and D designate different alleles. The numeric values of the new measure given by equation [1] are most similar to the values of GREGORIUS' distance (GREGORIUS, 1974, 1984). As with NEI's standard genetic distance (NEI, 1972), the situations AA-AB and AB-BC are distinguished by the new measure. As stated above (divergence from condition (iii) of a distance measure), the heterozygosity between populations may differ from zero even though the genotype and allele frequencies in both populations are equal (situation AB-AB in Table 1). At first glance, it seems illogical that a value greater than zero is
calculated although the two populations X and Y are genetically identical. But when X and Y are compared on the basis of alleles rather than genotypes, half of the comparisons yield equal alleles ($A_X$-$A_Y$, $B_X$-$B_Y$; read $A_X$ as "allele A from population X", etc.) and half yield unequal alleles ($A_X$-$B_Y$, $A_Y$-$B_X$). This is how the new measure is defined, and it is the basic difference from the distance measures. The heterozygosity between populations is defined with respect to what will happen when the populations are crossed with each other and therefore implicitly contains a dynamic aspect.

NEI's minimum genetic distance (NEI, 1972) is defined, for locus $i$, as

$$ d_{mi} = \frac{1}{2} \sum_{j=1}^{n_i} \left( p_{Xij} - p_{Yij} \right)^2 . $$

When Hardy-Weinberg equilibrium is assumed in populations X and Y, this definition can be rewritten as

$$ d_{mi} = h_{XYi} - \frac{1}{2} \left( h_{Xi} + h_{Yi} \right) , $$

where $h_{Xi} = 1 - \sum_j p_{Xij}^2$ and $h_{Yi} = 1 - \sum_j p_{Yij}^2$ are the heterozygosities within populations X and Y, respectively, calculated under this assumption. NEI's minimum genetic distance can therefore be interpreted as the increase in heterozygosity when populations X and Y, both in Hardy-Weinberg equilibrium, are crossed. This measure may be useful when comparing crosses with purebred populations, but for comparing crosses with each other the uncorrected heterozygosity between populations as defined in equation [1] should be preferred. ROGERS' distance (ROGERS, 1972) is, at each locus, the square root of NEI's minimum genetic distance and is therefore related to the measure [1] in a similar way.
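The special cases above and the single-genotype situations of Table 1 can be checked numerically. The following is a minimal Python sketch that computes only the new measure (not the other measures of Table 1); it assumes each hypothetical population consists of identical animals with the given one-locus genotype over alleles A to D, and the helper names are illustrative.

```python
def het_between_locus(p_x, p_y):
    """Equation [1]: 1 minus the expected frequency of homozygotes in the cross."""
    return 1.0 - sum(px * py for px, py in zip(p_x, p_y))


def genotype_frequencies(genotype, alleles="ABCD"):
    """Allele frequencies of a population made of identical animals, e.g. "AB"."""
    return [genotype.count(a) / len(genotype) for a in alleles]


# Two-allele case: h = p + q - 2pq, and h = 0.5 whenever p = 0.5.
for p, q in [(0.3, 0.6), (0.5, 0.1), (0.5, 0.9)]:
    h = het_between_locus([p, 1 - p], [q, 1 - q])
    assert abs(h - (p + q - 2 * p * q)) < 1e-12

# Uniform frequencies in X: h = 1 - 1/n regardless of the frequencies in Y.
n = 4
assert abs(het_between_locus([1 / n] * n, [0.7, 0.1, 0.1, 0.1]) - (1 - 1 / n)) < 1e-12

# Only allele j* present in X: h = 1 - p_Yj*, whatever the other frequencies in Y.
assert abs(het_between_locus([0, 1, 0], [0.2, 0.3, 0.5]) - 0.7) < 1e-12

# Symmetry: X and Y can be exchanged.
assert het_between_locus([0.2, 0.8], [0.6, 0.4]) == het_between_locus([0.6, 0.4], [0.2, 0.8])

# Single-genotype situations; AB-AB gives 0.5 although X and Y are identical.
for gx, gy in [("AA", "AA"), ("AA", "AB"), ("AB", "BC"), ("AB", "AB"), ("AB", "CD")]:
    h = het_between_locus(genotype_frequencies(gx), genotype_frequencies(gy))
    print(f"{gx}-{gy}: {h:.2f}")
```

The loop prints 0.00, 0.50, 0.75, 0.50 and 1.00 for AA-AA, AA-AB, AB-BC, AB-AB and AB-CD, which shows both that AA-AB and AB-BC are distinguished and that AB-AB does not give zero.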
Since in studies with microsatellites all alleles at the loci under consideration can generally be identified, the heterozygosities within the populations can be estimated directly by counting the heterozygous animals and dividing by the total number of animals. NEI's minimum genetic distance could therefore be modified by replacing $h_{Xi}$ and $h_{Yi}$ with these directly counted heterozygosities instead of values derived from allele frequencies under the assumption of Hardy-Weinberg equilibrium. This might yield more precise estimates of the increase in the proportion of heterozygotes in potential crosses but, on the other hand, could yield negative estimates of genetic distance when the genotype distribution in population X and/or Y is far from Hardy-Weinberg equilibrium.
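The modification suggested above can be sketched as follows. This is a minimal illustration under stated assumptions: one locus, diploid genotypes given as pairs of allele labels, and hypothetical function names (`nei_minimum_counted` in particular is only an illustration of the proposed variant, not an established routine). The example populations are an extreme case in which every animal is heterozygous AB, i.e. a strong excess of heterozygotes relative to Hardy-Weinberg equilibrium.

```python
from collections import Counter


def allele_frequencies(genotypes, alleles):
    """Allele frequencies at one locus from diploid genotypes such as ("A", "B")."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    return [counts[a] / (2 * len(genotypes)) for a in alleles]


def het_between_locus(p_x, p_y):
    """Equation [1]."""
    return 1.0 - sum(px * py for px, py in zip(p_x, p_y))


def het_expected(p):
    """Within-population heterozygosity derived from allele frequencies (assumes HWE)."""
    return 1.0 - sum(x * x for x in p)


def het_observed(genotypes):
    """Within-population heterozygosity counted directly: share of heterozygous animals."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def nei_minimum(gx, gy, alleles):
    """NEI's minimum distance written as h_XY - (h_X + h_Y) / 2 with HWE-based h_X, h_Y."""
    px, py = allele_frequencies(gx, alleles), allele_frequencies(gy, alleles)
    return het_between_locus(px, py) - 0.5 * (het_expected(px) + het_expected(py))


def nei_minimum_counted(gx, gy, alleles):
    """Hypothetical modification: h_X and h_Y replaced by counted heterozygosities."""
    px, py = allele_frequencies(gx, alleles), allele_frequencies(gy, alleles)
    return het_between_locus(px, py) - 0.5 * (het_observed(gx) + het_observed(gy))


# Every animal is heterozygous AB: a strong excess of heterozygotes relative to HWE.
pop_x = [("A", "B")] * 4
pop_y = [("A", "B")] * 4
print(nei_minimum(pop_x, pop_y, "AB"))          # 0.0
print(nei_minimum_counted(pop_x, pop_y, "AB"))  # -0.5
```

Here the cross is expected to contain 50% heterozygotes, while both parental populations already consist entirely of heterozygotes, so the counted version reports a decrease, i.e. a negative "distance".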
The heterozygosity between populations should mainly be used when the degree of heterozygosity of potential crosses is of interest. For other purposes, such as clustering of genotypes without regard to their future use in crossbreeding programs, the absolute genetic distance of GREGORIUS (1984) may be more suitable, and for investigations related to the evolutionary process NEI's standard genetic distance (NEI, 1972) will be the method of choice. NEI's minimum genetic distance (NEI, 1972) and ROGERS' distance (ROGERS, 1972) have the unfavourable property that in certain situations they do not reach their maximal values even though the two populations share no alleles (combination AB-CD in Table 1). They should therefore be used with care.
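The AB-CD case can be illustrated with a short sketch. It assumes the usual single-locus forms, NEI's minimum distance as half the sum of squared frequency differences and ROGERS' distance as its square root, and checks the identity $d_{mi} = h_{XYi} - \frac{1}{2}(h_{Xi} + h_{Yi})$ along the way; the function names are illustrative.

```python
import math


def het_between_locus(p_x, p_y):
    """Equation [1]."""
    return 1.0 - sum(px * py for px, py in zip(p_x, p_y))


def het_within(p):
    """Heterozygosity within a population under Hardy-Weinberg equilibrium."""
    return 1.0 - sum(x * x for x in p)


def nei_minimum_locus(p_x, p_y):
    """Single-locus NEI minimum distance: half the sum of squared frequency differences."""
    return 0.5 * sum((px - py) ** 2 for px, py in zip(p_x, p_y))


def rogers_locus(p_x, p_y):
    """Single-locus ROGERS distance: square root of the NEI minimum distance."""
    return math.sqrt(nei_minimum_locus(p_x, p_y))


def genotype_frequencies(genotype, alleles="ABCD"):
    """Allele frequencies of a population made of identical animals, e.g. "AB"."""
    return [genotype.count(a) / len(genotype) for a in alleles]


for gx, gy in [("AB", "CD"), ("AA", "BB")]:
    px, py = genotype_frequencies(gx), genotype_frequencies(gy)
    h = het_between_locus(px, py)
    d_m = nei_minimum_locus(px, py)
    # Identity used in the text: d_m = h_XY - (h_X + h_Y) / 2.
    assert abs(d_m - (h - 0.5 * (het_within(px) + het_within(py)))) < 1e-12
    print(f"{gx}-{gy}: h={h:.3f}  Nei min={d_m:.3f}  Rogers={rogers_locus(px, py):.3f}")
```

Both pairs share no alleles, and $h_{XYi}$ is 1.000 in both cases, whereas NEI's minimum distance and ROGERS' distance reach 1.000 only for AA-BB and give 0.500 and 0.707 for AB-CD.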
Assume now that $p_j < q_j$ and $r_j > q_j$. Then

$$ \Delta h_j = q_j(1-p_j) + r_j(p_j-q_j) = q_j(1-r_j) + p_j(r_j-q_j) \ge 0 . $$

The last case to consider is $p_j < q_j$ and $r_j < q_j$. Then

$$ \Delta h_j = q_j(1-p_j) + r_j(1-q_j) - r_j(1-p_j) = (q_j-r_j)(1-p_j) + r_j(1-q_j) \ge 0 . $$

This shows that the triangle inequality is valid for any allele frequencies. Because the average heterozygosity between populations is additive over loci, the triangle inequality holds not only for the heterozygosity at a given locus but for the average heterozygosity as well.
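The case analysis can be complemented by a quick numerical plausibility check (not a substitute for the proof). The sketch assumes that $p_j$, $q_j$ and $r_j$ denote the frequencies of allele $j$ in X, Y and Z; with that reading the per-allele terms $\Delta h_j = q_j(1-p_j) + r_j(p_j-q_j)$ sum to $h_{XYi} + h_{YZi} - h_{XZi}$, because $\sum_j q_j = 1$. Random frequency vectors are drawn by normalising uniform weights, which is a convenience choice, not a uniform draw from the simplex; helper names are illustrative.

```python
import random


def random_frequencies(n, rng):
    """Random allele frequencies for n alleles (normalised uniform weights)."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def het_between_locus(a, b):
    """Equation [1]."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def delta_h(p, q, r):
    """Per-allele terms q_j(1 - p_j) + r_j(p_j - q_j) from the Appendix."""
    return [qj * (1 - pj) + rj * (pj - qj) for pj, qj, rj in zip(p, q, r)]


rng = random.Random(1)
tol = 1e-12
for _ in range(10_000):
    n = rng.randint(2, 8)
    p, q, r = (random_frequencies(n, rng) for _ in range(3))  # populations X, Y, Z
    terms = delta_h(p, q, r)
    lhs = het_between_locus(p, r)                              # h_XZ
    rhs = het_between_locus(p, q) + het_between_locus(q, r)    # h_XY + h_YZ
    assert all(t >= -tol for t in terms)          # every Delta h_j is non-negative
    assert abs(sum(terms) - (rhs - lhs)) < 1e-9   # the terms add up to the slack
    assert lhs <= rhs + tol                       # triangle inequality
print("triangle inequality held in all random trials")
```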