Effect of unfavourable population structure on estimates of heritability, systematic effects and breeding values

Empirical estimates of heritability and systematic effects and predictions of sires' breeding values (BVs) were obtained under various population structures for simulated populations consisting of n=400 animals in 5 herds for a trait of medium heritability ($h^2=0.30$). An infinitesimal additive genetic animal model was assumed when simulating the data. Population structure was varied to allow for good and poor connectedness across herds and for random or non-random association between the genetic and the environmental effects. The impact of the various population structures on parameter estimation was assessed using Mean Squared Error (MSE) and Pearson's correlations. Allowing sires to have progeny in more than one herd (good herd connectedness) together with random use of sires across herds generally resulted in good parameter estimates. Poor connectedness significantly affected the estimation of herd effects and the prediction of BVs, but not the estimation of heritability, as long as random use of sires across environments was guaranteed. Selective use of the best sires in the best herds combined with poor connectedness resulted in the poorest estimates of all parameters examined. In this case, heritability was seriously underestimated ($h^2=0.06$), and the highest error and lowest accuracy for the BVs as well as a marked underestimation of the genetic gain were observed. Use of reference sires on a natural mating basis to create genetic links between herds served as a good solution for both heritability estimation and BV prediction under unfavourable structure. Mating 0.25 of the herd ewes with reference sires resulted in a heritability estimate close to the simulated one. Significantly better estimates of systematic effects and BVs were, however, obtained when 0.5 of the herd ewes were mated to reference sires.

Keywords: heritability, estimation, fixed effects, breeding values, connectedness, population structure

Introduction
Sheep and goat as well as beef cattle herds are traditionally small, closed populations with a limited number of animals. In such populations, artificial insemination (AI) is practically non-existent, which increases their regional isolation and the potential for genetic drift. Moreover, animals' performance is recorded across highly diverse environments, and data quality is poor because of recording errors, misclassification or manipulation of contemporary groups (e.g. herds) and non-random use, selection and mating of breeding animals (SCHAEFFER et al., 1998). Such data are less favourable from an estimation point of view because of unbalancedness, low connectedness and dependence between the genetic and the environmental structure. Although the effect of data structure has been studied in the context of genetic parameter estimation (CLEMENT et al., 2001) and genetic evaluation (HANOCQ et al., 1996; TONG et al., 1980), the superiority of REML and BLUP in the analysis of data with unfavourable structure in small populations still needs to be verified. The aim of the present study was thus to estimate heritability, systematic effects and breeding values (BVs) under unfavourable population structures by means of stochastic simulation. Population structure was varied to allow for good and poor connectedness and/or random or non-random association between the random and the systematic effects. The effect of creating genetic links between herds through the use of reference sires was also investigated and quantified.

Data simulation
Simulated data were generated using a commonly used stochastic procedure (e.g. HARDER et al., 2005). The genetic model assumed a large number of unlinked loci contributing to the genetic variance of a single hypothetical metric trait. The base population consisted of 20 males and 400 females, which were assumed to be unrelated, unselected and randomly sampled from a conceptually infinite population. The base animals were mated at random (20 females per male) to produce 400 progeny. Each progeny record was simulated from the animal's breeding value, a population mean, a herd effect and a residual term. The model for simulating the data was thus:
$$y_{ij} = \mu + h_i + a_{ij} + e_{ij}$$
where $y_{ij}$ = the phenotypic observation of animal $j$ in herd $i$, $\mu$ = the population mean ($\mu=100$), $h_i$ = the effect of herd $i$ ($i=1,\dots,5$), $a_{ij}$ = the additive breeding value of animal $j$ ($j=1,\dots,400$) in herd $i$, and $e_{ij}$ = the random residual term.
Values for $e_{ij}$ were independently drawn from a normal distribution with mean zero and variance $\sigma_e^2$. Breeding values of the progeny were generated as $a_{ij} = 0.5(a_{sj} + a_{dj}) + m_{ij}$, where $a_{sj}$ and $a_{dj}$ are the breeding values of the sire (s) and dam (d) of individual $j$, and $m_{ij}$ is the Mendelian sampling term of individual $j$. The Mendelian sampling term was drawn from $N(0, 0.5\sigma_a^2)$ and was assumed to be independent of the breeding values of the sire and dam. Herd effects were drawn from a uniform distribution ranging from −30 to +30, assuming five levels: +30, +20, 0, −20 and −30, respectively. The variances $\sigma_a^2$ and $\sigma_e^2$ were set to 30 and 70, respectively, to reflect a trait of medium heritability ($h^2=0.30$) such as many production traits (e.g. milk yield) in dairy sheep and goats. One hundred replicates were simulated to capture the randomness in the system. Each replicate included 400 animals with phenotypic records plus 420 base population animals without records. Herd size and structure were chosen to reflect a structure commonly observed in small sheep and goat (and in some beef cattle) populations where natural service predominates.

Population structure(s)
Initially, three population structures were simulated. In all of them, the number of daughters per sire was 20 and the number of animals per herd was 80. Across the three structures, the daughters of each sire were assigned to herd levels either randomly or non-randomly. In the first structure, sires were assigned randomly across the five herd levels, each sire having an equal number of daughters (n=4) in each of the five herd levels. Random use of sires and herd connectedness were thus guaranteed in this case (Figure). This structure will be referred to as the balanced design (BD). In the second structure, each sire had all n=20 daughters in only one herd, and sires were randomly assigned across herd levels. This structure did not allow for connectedness between herds, but random distribution of the sires' BVs across herds was guaranteed (Figure 1). This structure will be referred to as the unbalanced design (UD). In the third structure, sires had daughters in only one herd, and the sires with the highest/lowest breeding values were used in the herds with the highest/lowest levels, respectively. This structure will be referred to as the unbalanced design with genetic-environmental dependence (UDC) (Figure). Three more population scenarios under unfavourable structure were simulated by assuming the use of reference sires (RS) to create genetic links between herds. RS were assumed to be used by natural service, since AI is costly and difficult in sheep. The three scenarios were based on the number of RS used per herd: 1, 5 and 10. Reference sires were assumed to be selected at random from the top animals based on true BVs.
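The simulation scheme and the three initial designs described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' program: the function names (`assign_herds`, `simulate_replicate`), the normal distribution assumed for base-animal breeding values, and the mating of dams to sires in index order (equivalent to random mating because dam breeding values are drawn independently) are all assumptions made here. Reference sire schemes are not covered.

```python
import random

MU = 100.0
VAR_A, VAR_E = 30.0, 70.0                   # h2 = 30 / (30 + 70) = 0.30
HERD_EFFECTS = [30.0, 20.0, 0.0, -20.0, -30.0]
N_SIRES, N_DAMS, N_HERDS = 20, 400, 5
DAUGHTERS_PER_SIRE = N_DAMS // N_SIRES      # 20
SIRES_PER_HERD = N_SIRES // N_HERDS         # 4


def assign_herds(design, sire_bv, rng):
    """Return herd[s][k]: herd index of the k-th daughter of sire s."""
    sires = list(range(N_SIRES))
    if design == "BD":
        # Balanced: 4 daughters of every sire in each of the 5 herds.
        return [[k % N_HERDS for k in range(DAUGHTERS_PER_SIRE)] for _ in sires]
    if design == "UD":
        order = sires[:]
        rng.shuffle(order)                  # sires allotted to herds at random
    elif design == "UDC":
        # Best sires first, so they end up in the herds with the highest effects.
        order = sorted(sires, key=lambda s: sire_bv[s], reverse=True)
    else:
        raise ValueError(f"unknown design: {design}")
    herd_of_sire = [0] * N_SIRES
    for rank, s in enumerate(order):
        herd_of_sire[s] = rank // SIRES_PER_HERD   # herd 0 has effect +30
    return [[herd_of_sire[s]] * DAUGHTERS_PER_SIRE for s in sires]


def simulate_replicate(design, seed=0):
    """Simulate one replicate: 420 base animals and 400 progeny records."""
    rng = random.Random(seed)
    sd_a, sd_e = VAR_A ** 0.5, VAR_E ** 0.5
    sd_m = (0.5 * VAR_A) ** 0.5             # Mendelian sampling SD
    # Assumption: base-animal breeding values are N(0, VAR_A).
    sire_bv = [rng.gauss(0.0, sd_a) for _ in range(N_SIRES)]
    dam_bv = [rng.gauss(0.0, sd_a) for _ in range(N_DAMS)]
    herd = assign_herds(design, sire_bv, rng)
    records = []
    for d in range(N_DAMS):
        s, k = divmod(d, DAUGHTERS_PER_SIRE)        # dam d is mated to sire s
        a = 0.5 * (sire_bv[s] + dam_bv[d]) + rng.gauss(0.0, sd_m)
        h = herd[s][k]
        y = MU + HERD_EFFECTS[h] + a + rng.gauss(0.0, sd_e)
        records.append({"sire": s, "dam": d, "herd": h, "bv": a, "y": y})
    return sire_bv, records
```

Calling `simulate_replicate("UDC", seed=1)` returns the true sire breeding values and 400 progeny records in which the four best sires all have their daughters in the herd with effect +30.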

Measures of connectedness and genetic-environmental association
Connectedness is variously defined in the literature. Sometimes it is defined as the existence of genetic ties between levels of fixed effects (TOSH and WILTON, 1994), and at other times as the estimability of contrasts between levels of genetic effects (KENNEDY and TRUS, 1993; FOULLEY et al., 1992). All measures belonging to the second category, along with others that have been proposed (LALOE, 1993; LEWIS et al., 1999; MATHUR et al., 2002), are difficult to calculate and follow. For this reason, connectedness was described here using the measures proposed by TOSH and WILTON (1994): effective number of progeny per sire ($n_{e_j}$), number of herds in which a sire has progeny (M) and direct connections (DC = number of sires with progeny in the same herd). Let $n_{ij}$ be the number of progeny of the $j$th sire in the $i$th herd; the effective number of progeny ($n_{e_j}$) of the $j$th sire is then computed from these counts as defined by TOSH and WILTON (1994).

Dependence between sires' BVs (genetic) and herd levels was quantified using three non-parametric statistical measures: Kendall's tau-b (τ), Hoeffding's measure of dependence (D) and the Spearman correlation (ρ). Kendall's τ is a measure of association based on the number of concordances and discordances in paired observations. Hoeffding's D is a measure of association that detects more general departures from independence. All of the above association statistics were estimated with procedure CORR in SAS (2002).
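Two of the measures above are simple enough to illustrate directly. The sketch below computes M from a herd-by-sire table of progeny counts and Kendall's tau-b from paired observations (for example, sire BV against the herd level in which the sire is used). It is a plain-Python illustration with hypothetical function names, not the SAS CORR procedure used in the study, and it does not cover the effective number of progeny, DC or Hoeffding's D.

```python
from math import sqrt


def herds_per_sire(n):
    """M for each sire, where n[i][j] = progeny of sire j in herd i."""
    n_sires = len(n[0])
    return [sum(1 for row in n if row[j] > 0) for j in range(n_sires)]


def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) / sqrt((C + D + Tx) * (C + D + Ty)).

    C and D are the numbers of concordant and discordant pairs; Tx and Ty
    are the numbers of pairs tied only in x and only in y, respectively.
    """
    conc = disc = ties_x = ties_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue                  # tied in both variables: ignored
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                conc += 1
            else:
                disc += 1
    denom = sqrt((conc + disc + ties_x) * (conc + disc + ties_y))
    return (conc - disc) / denom if denom else float("nan")
```

For the balanced design, where every sire has 4 progeny in each of the 5 herds, `herds_per_sire` returns 5 for every sire; for the two unbalanced designs it returns 1. Because four sires share each herd level, the herd variable is heavily tied, which is why the tie-corrected tau-b variant is the relevant one.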

Analyses
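The operational model was defined to be the same as the true model used for simulating the data sets in all population structures. The univariate linear mixed model (animal model) used to analyse the simulated data was:
$$\mathbf{y} = \mathbf{Xb} + \mathbf{Za} + \mathbf{e}, \qquad \mathrm{Var}(\mathbf{a}) = \mathbf{A}\sigma_a^2, \qquad \mathrm{Var}(\mathbf{e}) = \mathbf{I}\sigma_e^2$$
where $\mathbf{y}$ = the vector of observations, $\mathbf{b}$ = the vector of fixed effects (population mean and herd effects), $\mathbf{X}$ and $\mathbf{Z}$ = the corresponding incidence matrices, $\mathbf{a}$ = the vector of additive breeding values, $\mathbf{e}$ = the vector of random residual effects, and $\mathbf{A}$ = the numerator relationship matrix, which included the base population animals. REML estimates of variance components, BLUE of fixed effects and BLUP of BVs were obtained per replicate for all analyses using an average information algorithm and the ASREML computer program (GILMOUR et al., 1999). Because the emphasis was on the use of sires across herd levels, prediction of BVs focused on sires only (n=20).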
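For orientation, BLUE of the fixed effects and BLUP of the breeding values under an animal model of this kind are conventionally obtained as solutions to Henderson's mixed model equations. The block below gives the standard textbook form, not something stated in the paper. The symbols are assumptions made here: $\mathbf{y}$ for the observations, $\mathbf{b}$ for the fixed effects, $\mathbf{X}$ and $\mathbf{Z}$ for their incidence matrices, and $\lambda$ for the ratio of residual to additive genetic variance.

```latex
\[
\begin{bmatrix}
\mathbf{X}'\mathbf{X} & \mathbf{X}'\mathbf{Z} \\
\mathbf{Z}'\mathbf{X} & \mathbf{Z}'\mathbf{Z} + \lambda\mathbf{A}^{-1}
\end{bmatrix}
\begin{bmatrix} \hat{\mathbf{b}} \\ \hat{\mathbf{a}} \end{bmatrix}
=
\begin{bmatrix} \mathbf{X}'\mathbf{y} \\ \mathbf{Z}'\mathbf{y} \end{bmatrix},
\qquad
\lambda = \frac{\sigma_e^2}{\sigma_a^2} = \frac{1 - h^2}{h^2}
\]
```

With the simulated variances, $\lambda = 70/30 \approx 2.33$. In the analyses the ratio is based on the REML estimates, so an underestimated $\sigma_a^2$ inflates $\lambda$ and shrinks the predicted BVs more strongly towards zero.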

Comparison criteria
Estimated heritability coefficients and fixed effects as well as predicted BVs were compared with their simulated (true) values using Mean Squared Error (MSE) and Pearson's correlations. MSE is the average squared deviation of the predicted or estimated values from their corresponding true values. The lower the MSE, the closer the estimated values are to the simulated (true) ones:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{p}_i - p_i\right)^2$$
where n = the number of sires or the number of herd levels, and ^ refers to the predicted or estimated value of the parameter p. In the case of BVs, Pearson's correlation coefficient between true and predicted values is by definition a measure of the accuracy of prediction. Means of estimated or predicted parameters were compared between the three initial structures (BD, UD and UDC) as well as between the UDC case and the three sire reference schemes. Comparisons were carried out with the Ryan-Einot-Gabriel-Welsch multiple range test in SAS (2002).
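Both comparison criteria are easy to compute for one replicate. The following is a minimal plain-Python sketch; the function names `mse` and `pearson` are chosen here for illustration and do not come from the paper.

```python
from math import sqrt


def mse(estimates, truths):
    """Mean squared deviation of estimated values from their true values."""
    return sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths)


def pearson(x, y):
    """Pearson correlation; for true vs. predicted BVs this is the accuracy."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

For the fixed effects, `truths` would hold the five simulated herd levels and `estimates` their BLUE solutions; for the BVs, both lists would hold the 20 sires' true and predicted values.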

Connectedness and genetic-environmental association
Measures of herd connectedness and genetic-environmental association under the various population structures are given in Table 1. The effective number of progeny per sire ($n_{e_j}$) gradually improved as the number of RSs per herd increased. The same trend was also observed for the other two connectedness measures, i.e. the number of herds in which a sire has progeny (M) and direct connections (DC). Genetic-environmental dependence in the UDC design was confirmed by high values of Kendall's τ, Hoeffding's D and the Spearman correlation. However, increasing the number of RSs per herd resulted in diminishing levels of association (Table 1). Note that the non-integer values of the three connectedness measures when using one reference sire per herd imply an unbalanced design: apart from the one sire used in all herds, two more sires are necessarily used in two herds each, having 16.4 and 12.8 progeny, respectively.

Heritability estimation
Simulated and estimated heritabilities under the various population structures are given in Table 2. The average simulated heritability was 0.297, very close to the desired value ($h^2=0.30$). Mean estimated heritability was comparable across the BD and UD cases, but it was markedly lower in the UDC design ($h^2 \approx 0.06$). This result was due to a significant underestimation of the additive genetic variance (results not shown). Using one RS under the most unfavourable population structure improved heritability estimation ($h^2=0.15$). Using 5 or 10 RSs resulted in a heritability estimate close to the simulated value. In total, 2, 37, 18 and 2 (out of 100) heritability estimates were equal to zero (i.e. fixed at the boundary by REML) in the UD, UDC, 1RS and 5RS cases, respectively (Table 2).
Means with different letters or numbers are significantly different (p<0.05). Letters pertain to comparisons among the S, BD, UD and UDC cases, while numbers pertain to comparisons among UDC and the sire reference schemes. CI: confidence intervals

Estimation of fixed effects and prediction of BVs
Table 3 shows the MSE for the fixed effects and BVs, Pearson's correlations for the BVs, and the means of the estimated BVs of the best 25 % of sires as a measure of genetic gain (GG) across the various scenarios. No correlation coefficients are presented for fixed effects because of the limited number of observations (classes). As expected, MSE for fixed effects and BVs was lowest in the balanced design (BD), significantly higher in the unbalanced case (UD) and highest in UDC. Prediction accuracy of BVs was highest in the balanced design (r=0.76), close to the theoretically expected value given by the classical formula (FALCONER, 1989):
$$r = \sqrt{\frac{n h^2}{4 + (n-1)h^2}}$$
For n=20 and $h^2=0.285$ this formula gives r=0.778. Prediction accuracy of BVs decreased in the unbalanced design (r=0.68) and was remarkably low (r=0.12) in the extremely unfavourable structure (UDC). MSE for fixed effects and BVs did not differ between the 1 and 5 RS schemes and was significantly lower when using 10 RSs. Using more than one RS gradually improved the accuracy of prediction of the BVs. Genetic gain was similar in the BD and UD cases and was significantly lower in the UDC case. Using 5 or 10 RSs per herd resulted in genetic gains similar to those estimated in the fully balanced design (Table 3).
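The expected accuracy quoted above can be checked numerically. The snippet below is a small sketch of the classical progeny-test accuracy for a sire with n half-sib progeny, $r = \sqrt{n h^2 / (4 + (n-1)h^2)}$; the function name is chosen here for illustration.

```python
from math import sqrt


def progeny_test_accuracy(n, h2):
    """Expected accuracy of a sire's BV from n half-sib progeny records."""
    return sqrt(n * h2 / (4.0 + (n - 1) * h2))


# n = 20 daughters per sire and h2 = 0.285, the values used in the text
print(round(progeny_test_accuracy(20, 0.285), 3))  # prints 0.778
```

Accuracy rises with both the number of progeny and the heritability, so an underestimated heritability also lowers the accuracy one would expect from the same number of daughters.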
Table 3 Mean Squared Error (MSE) for fixed effects (f) and breeding values (BV), accuracy of BVs estimation (r) as well as genetic gain (GG) under the various population structures: balanced design (BD), unbalanced design (UD), unbalanced design with genetic-environmental dependence (UDC) and sire reference schemes

Discussion
This study has confirmed the superiority of the REML method even when herds are disconnected. Herd connectedness did not affect the accuracy of heritability estimation (compare cases BD and UD, Table 2). However, this is not the case for BLUP, where disconnectedness significantly affected the estimation of BVs and particularly of the systematic effects. For the systematic effects, MSE in the unbalanced design was twice that of the balanced design, while error and accuracy for BVs were 0.20 higher and 0.10 lower, respectively. It was, however, the selective use of sires in combination with disconnectedness that seriously affected the estimation of all parameters examined. Under this extreme scenario, a remarkably low heritability, the highest estimation error for fixed effects and BVs, the lowest accuracy for the BVs and the lowest genetic gain were observed. Apparently, in the absence of connectedness there is no way to separate genetic and environmental effects when these effects are confounded. CLEMENT et al. (2001) have also reported underestimated heritabilities for additive and maternal genetic effects under disconnectedness when herds differ in genetic merit. Only the scenario of assortative assignment of sires across environments was simulated in the present study, which may reflect the results of selection and/or culling bias as well. Selective use of sires should thus be avoided, and random distribution of sires across environments should be pursued in any breeding program. Where this is not possible, the use of reference sires serves as a good solution. Apart from creating genetic ties between herds, the use of reference sires weakens the association between the genetic and the environmental effects, ultimately allowing for better parameter estimates. Mating 0.25 of the herd ewes with reference sires resulted in a fairly good approximation of the true heritability value. However, when the estimation of fixed effects and of BVs is of interest, the proportion of herd ewes mated to RSs should increase to 0.5. Even in the latter case, the accuracy of BVs remained significantly lower than that of the fully balanced design.

By simulating other scenarios and population structures, KUHN (2005) also reported reduced bias of comparisons between homebred sires when more progeny were allocated to reference sires. The reduction in bias was markedly lower with 20 % linking sire progeny, but differences in bias reduction between 50 and 33 % of progeny from linking sires were minor. HANOCQ et al. (1996) also reported a non-linear increase in gains in individual accuracies of BV estimation under a small level of connection, in contrast to higher levels of connection. Genetic gain in the unbalanced design was lower than in the balanced design, although the difference was not statistically significant. Interestingly, mating 0.25 of the herd ewes with reference sires under unfavourable population structure resulted in a genetic gain similar to that of the balanced design. Other simulation studies assuming AI sire referencing have shown genetic gain improved by 30 to 35 % compared to within-unit selection, as well as improved accuracy and lower inbreeding coefficients (HANOCQ et al., 1996; LEWIS and SIMM, 2000; RODEN, 1996). Although referencing is usually through AI, applications of such schemes on a natural mating basis have shown genetic gains comparable to those of AI sire referencing (AI-SR) while being superior in terms of inbreeding (KUHN, 2005). AI may create extensive connections, but it is currently not widely used in sheep, goat and beef cattle populations because it is either costly or difficult to apply.

Methodological approach
A problem often encountered when analysing data with unfavourable structure is how to treat contemporary groups (CG). HENDERSON (1973) showed that treating CGs as fixed effects renders genetic evaluations invariant to CG effects. Other studies (UGARTE et al., 1992; VISSCHER and GODDARD, 1993) suggested treating CGs as random effects under non-random distribution of sires across CGs. Accuracy of BV estimation was improved in the random model in all cases examined by UGARTE et al. (1992). VISSCHER and GODDARD (1993) also found higher accuracies for the BVs with a random model when the best sires were used in the best CGs. However, when the best sires were represented only in the worst CGs, the correlation between true and predicted breeding values for the random model became negative. The authors finally concluded that when "a non-random association exists between sires and CGs, the groups should be treated as fixed effects for practical genetic evaluations". In dairy cattle, Interbull (2001) recommends considering an effect as fixed if the association between the effect and the main random effect (animal or sire) is non-random. In practice, an effect with a limited number of levels cannot be fitted as random, and this is the main reason why the herd effect was fitted here as a fixed effect. In the present study, the effect of a single use of reference sires within one generation was evaluated. Connections over time are reasonably expected to result in more accurate genetic evaluations and higher genetic responses when herds have different genetic levels. Connections allow for gene flow from herds with a high genetic level to herds with a low genetic level. As a result, differences in the average genetic level gradually decrease across herds (HANOCQ et al., 1996). Further studies are needed to examine the effect of continual or discontinual connectedness over time on parameter estimation and genetic evaluation.

Table 1
Measures of herd connectedness and of genetic-environmental association across the various population structures: balanced design (BD), unbalanced design (UD), unbalanced design with genetic-environmental dependence (UDC) and sire reference schemes
Means with different letters or numbers are significantly different (p<0.05). Letters pertain to comparisons among the BD, UD and UDC cases, while numbers pertain to comparisons among UDC and the sire reference schemes. CI: confidence intervals