Simplifications of marker-assisted genetic evaluation and accounting for non-additive interaction effects

Abstract A computing simplification was applied to marker-assisted genetic evaluation of quantitative traits, including additive and non-additive effects of QTL as well as residual polygenic effects. Several situations were evaluated: QTL and residual polygenic effects estimated as a sum or separately, with or without non-additive effects in the model. The computing simplification was used in combination with different models and parameterizations. An example data set was used to illustrate the simplified computing strategy, which was compared with the computing method of direct inversion. Identical results were obtained from both computing strategies. The main advantage of the simplification is that it does not require inversion of non-additive relationship matrices or of relationship matrices of QTL, and the number of random effects in the mixed model equations is the same as in any animal model with only additive effects.


Introduction
In linear-model genetic evaluations, the independent variables are usually known, whereas marker information usually results in a probability distribution of QTL genotypes rather than a specific genotype. Estimation of QTL effects in a linear model therefore requires the use of uncertain QTL genotypes as independent variables. Three types of linear models are often used for the estimation: (1) Mixture model: This type of model is based on a mixture of QTL genotypic distributions. It was proposed by LANDER and BOTSTEIN (1989) for QTL interval mapping and is a direct solution for this kind of evaluation. (2) Regression model: This model was proposed by HALEY and KNOTT (1992). It takes the expectation of the QTL genotype as a covariate of the linear model and is therefore an approximation that avoids the complexity of mixture model analysis. (3) Gametic model: This model takes QTL covariates as random effects and uses marker information to quantify the similarities between the random allelic QTL effects of different individuals in the population. It was proposed by FERNANDO and GROSSMAN (1989) and has been used very widely. The uncertainty of QTL covariates is eliminated by assuming that the number of QTL alleles in the population is infinite or that each animal carries distinct and unique QTL alleles. For animal data, the gametic model has the advantage that it easily integrates pedigree records and marker information in the data analysis and makes good use of the relatedness among relatives that share QTL alleles identical by descent. The gametic model has been applied both to marker-assisted genetic evaluation for commercial breeding programs (DEKKERS, 2004) and to QTL mapping in livestock populations (e.g. GRIGNOLA et al., 1996; ZHANG et al., 1998; FREYER et al., 2002, 2003).

Since the gametic model was proposed, a series of studies has focused on reducing the number of mixed model equations in marker-assisted genetic evaluations. When only one QTL is considered, the mixed model equations for estimating QTL effects and residual polygenic effects need three equations for each animal in addition to those for fixed effects. In the case of q QTL, 2q + 1 equations have to be evaluated for each animal. CANTET and SMITH (1991) proposed a reduced animal model that reduces the number of equations by absorbing the QTL effects of non-parents into the QTL effects of parents. HOSCHELE (1993) showed that QTL equations are not needed for animals that are not marker genotyped and do not provide relationship ties among genotyped descendants. Therefore, the number of equations for the reduced animal model can be further reduced by eliminating the QTL effect equations of these animals. VAN ARENDONK et al. (1994) developed a method to estimate directly the sum of the QTL allelic effects and residual polygenic effects for the purpose of genetic evaluation, which reduced the number of mixed model equations to one for each animal. SAITO and IWAISAKI (1996a, 1997) showed how the approach of VAN ARENDONK et al. (1994) can further simplify the computation if it is combined with the reduced animal model of CANTET and SMITH (1991) and the idea of HOSCHELE (1993). The method of VAN ARENDONK et al.
(1994) is a very efficient way of simplifying the computations, provided that estimating the total additive genetic value is the purpose of the analysis. However, the method does not give estimates of QTL effects even when they are required (SAITO and IWAISAKI, 1996b).

In marker-assisted estimation of QTL effects, diverse statistical models can be needed for different purposes of the data analysis. Modeling QTL effects separately from polygenic effects is often necessary. For example, estimates of QTL effects are useful for making livestock mating plans, in which the characteristics of each animal may be of reference value. A series of experiments in poultry (MATHUR and HORST, 1992; MATHUR and HORST, 1994; HORST et al., 1996) has demonstrated substantial non-additive effects of major genes. Similar non-additive effects can be expected from QTL and candidate genes as well. For some other species such as swine and beef cattle, QTL position estimation is generally based on crossbred populations originating from breeds that are substantially different. Including non-additive effects in the model will allow more accurate estimation of QTL effects and positions. The genetic evaluation of livestock is usually based on the animal model of additive effects, in which non-additive effects may contribute partially to the estimated breeding values. Fitting non-additive effects of QTL and residual polygenes in the model removes effects that are not stably inherited from the estimates of breeding values and improves the accuracy of the estimates. Estimation of non-additive effects can be especially important for meat animals when the main objective of selection in purebreds is to improve the performance of commercial crossbred populations.

In this study, the strategy of computing simplification (SCHAEFFER, 2003) was applied to marker-based QTL analysis with different models, with and without non-additive effects of QTL and polygenes, and with QTL effects and the residual polygenic effect estimated as a sum or separately. Example data were used to illustrate marker-assisted estimation of QTL effects and evaluation of polygenic effects with the different models.

Theories

Notations for genetic effects
Considering a single QTL, the maternal and paternal allelic effects of an animal are denoted as $v^m$ and $v^p$. The additive and dominance effects at the QTL are denoted as $a_q$ and $d_q$. The relationship matrices for QTL allelic effects and for the additive and dominance effects at the QTL are denoted as $G$, $A_q$ and $D_q$. The additive and dominance effects of residual polygenes (excluding the QTL effects considered) are expressed as $a_u$ and $d_u$, and their relationship matrices as $A_u$ and $D_u$, respectively. The total additive and dominance effects of an animal, each the sum of the QTL effect and the polygenic effect, are denoted as $a_t$ and $d_t$, with relationship matrices $A_t$ and $D_t$. The additive and dominance effects of animals are expressed as $a$ and $d$, as defined by FALCONER and MACKAY (1996) and commonly used in conventional animal models without marker information. Their relationship matrices are $A$ and $D$, the same as those for $a_u$ and $d_u$.

Gametic model of QTL effects
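The single-QTL gametic model can be described as

$$y_i = x_i'\beta + v_i^p + v_i^m + a_{u,i} + e_i,$$

where $y_i$ is the phenotypic observation of individual $i$; $v_i^p$ and $v_i^m$ are the gametic effects at the QTL; $a_{u,i}$ is the residual polygenic effect of individual $i$; $\beta$ is the fixed effect vector; and $e_i$ is the model residual. In matrix notation, the model becomes

$$y = X\beta + Z a_u + Z (I_n \otimes [1\ \ 1])\, v + e.$$

Here, $I_n$ is an identity matrix of dimension equal to the number of individuals to be evaluated ($n$) and $\otimes$ stands for the Kronecker product. The variance-covariance structure is assumed to be

$$\mathrm{Var}\begin{bmatrix} a_u \\ v \\ e \end{bmatrix} = \begin{bmatrix} A_u\sigma^2_{a_u} & 0 & 0 \\ 0 & G\sigma^2_{v} & 0 \\ 0 & 0 & I\sigma^2_{e} \end{bmatrix},$$

where $A_u$ is the numerator relationship matrix. $G$ is the gametic relationship matrix at the QTL and can be calculated conditional on linked marker and pedigree information (WANG et al., 1995; LIU et al., 2002). $G$ for one QTL has dimension $2n \times 2n$. For the gametic model, the mixed model equations (MME) can be very large in comparison with the conventional animal model: for each QTL, the number of equations for gametic effects is twice the number of animals to be evaluated.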

Additive model of QTL effects
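To simplify the computation, the gametic effects at a QTL can be merged into the QTL additive effect, that is, $a_q = (I_n \otimes [1\ \ 1])\, v$. In this way, the gametic model above becomes an additive model of QTL effects (Additive Model 1),

$$y = X\beta + Z a_u + Z a_q + e, \qquad \mathrm{Var}(a_q) = A_q\sigma^2_{a_q}.$$

The $A_q$ here can be converted from the gametic relationship matrix $G$ using the formula

$$a_{ij} = \tfrac{1}{2}\left(g_{2i-1,2j-1} + g_{2i-1,2j} + g_{2i,2j-1} + g_{2i,2j}\right) \qquad (1)$$

according to LIU et al. (2002), where $a_{ij}$ is the element of $A_q$ at row $i$ and column $j$ and, for example, $g_{2i-1,2j}$ is the element of $G$ at row $2i-1$ and column $2j$, the identical-by-descent probability of individual $i$'s first allele with individual $j$'s second allele.

For the purpose of genetic evaluation, Additive Model 1 can be further simplified by merging residual polygenic effects and QTL additive effects into the total additive genetic effects, i.e. $a_t = a_u + a_q$. The model becomes (Additive Model 2)

$$y = X\beta + Z a_t + e$$

with the variance-covariance structure $\mathrm{Var}(a_t) = A_t\sigma^2_{a_t}$ and $\mathrm{Var}(e) = I\sigma^2_e$. This is the model proposed by VAN ARENDONK et al. (1994) for marker-assisted genetic evaluation. Here,

$$\sigma^2_{a_t} = \sigma^2_{a_u} + \sigma^2_{a_q}, \qquad A_t = \frac{A_u\sigma^2_{a_u} + A_q\sigma^2_{a_q}}{\sigma^2_{a_u} + \sigma^2_{a_q}}. \qquad (2)$$

In this way, the number of mixed model equations for marker-assisted genetic evaluation is reduced to the number of animals to be evaluated, plus those for fixed effects. Compared with the mixed model equations for the conventional animal model, the only difference is the matrix $A_t$, which is an average of $A_u$ and $A_q$ weighted by the sizes of the variance components $\sigma^2_{a_u}$ and $\sigma^2_{a_q}$.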
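The conversion of a gametic relationship matrix into an additive relationship matrix, and the weighting of the polygenic and QTL matrices into one matrix for the total additive effect, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' program: the four-animal family (two unrelated parents and two full sibs), the transmission probability of 0.9 and the helper names `full_sib_gametic`, `additive_from_gametic` and `weighted_relationship` are hypothetical. The factor 1/2 reflects the statement in the numerical example that the gametic variance is half of the QTL additive variance.

```python
def full_sib_gametic(p):
    """Hypothetical 8x8 gametic relationship matrix for animals 1-4.

    Allele order: (paternal, maternal) for sire 1, dam 2, full sibs 3 and 4.
    p is the assumed probability that a sib received the sire's first allele;
    the dam's transmission is treated as uninformative (0.5).
    """
    q = 1.0 - p
    s = p * p + q * q  # probability that the two sibs' paternal alleles are IBD
    return [
        [1, 0, 0, 0, p, 0, p, 0],
        [0, 1, 0, 0, q, 0, q, 0],
        [0, 0, 1, 0, 0, .5, 0, .5],
        [0, 0, 0, 1, 0, .5, 0, .5],
        [p, q, 0, 0, 1, 0, s, 0],
        [0, 0, .5, .5, 0, 1, 0, .5],
        [p, q, 0, 0, s, 0, 1, 0],
        [0, 0, .5, .5, 0, .5, 0, 1],
    ]


def additive_from_gametic(G):
    """Half the sum of each 2x2 allele block of G gives one element of A."""
    n = len(G) // 2
    return [[0.5 * sum(G[2 * i + r][2 * j + c] for r in (0, 1) for c in (0, 1))
             for j in range(n)] for i in range(n)]


def weighted_relationship(A_u, A_q, var_u, var_q):
    """Average of A_u and A_q weighted by their variance components."""
    total = var_u + var_q
    return [[(var_u * u + var_q * q) / total for u, q in zip(row_u, row_q)]
            for row_u, row_q in zip(A_u, A_q)]


A_q = additive_from_gametic(full_sib_gametic(0.9))  # informative marker
A_u = additive_from_gametic(full_sib_gametic(0.5))  # pedigree only
A_t = weighted_relationship(A_u, A_q, var_u=0.25, var_q=0.10)
print("sib relationship at QTL:", A_q[2][3])
print("sib relationship, polygenes:", A_u[2][3])
print("sib relationship, total:", A_t[2][3])
```

In this toy family the parent-offspring elements stay at 0.5 in both matrices, while the full-sib element moves away from 0.5 once the marker is informative, which is where the marker information enters the evaluation.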

Dominance Model of QTL effects
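The model with dominance QTL effects (Dominance Model 1) can be derived from HENDERSON (1985) as

$$y = X\beta + Z a_u + Z d_u + Z a_q + Z d_q + e,$$

where $a_u$ and $d_u$ stand for the additive and dominance effects of residual polygenes, while $a_q$ and $d_q$ stand for the additive and dominance effects at QTL $q$. The variance-covariance structure is assumed to be

$$\mathrm{Var}(a_u) = A_u\sigma^2_{a_u},\quad \mathrm{Var}(d_u) = D_u\sigma^2_{d_u},\quad \mathrm{Var}(a_q) = A_q\sigma^2_{a_q},\quad \mathrm{Var}(d_q) = D_q\sigma^2_{d_q},\quad \mathrm{Var}(e) = I\sigma^2_e,$$

with zero covariances between different effects. Here, $D_u$ is the dominance relationship matrix for residual polygenic effects, calculated from pedigree information only. Matrix $D_u$ can be converted from the average gametic relationship matrix (SMITH and MAKI-TANILA, 1990). Similarly, the dominance relationship matrix at the QTL, $D_q$, can be calculated from the gametic relationship matrix $G$ of the same QTL, which is calculated conditional on marker and pedigree information (WANG et al., 1995; LIU et al., 2002). The element of $D_q$ at row $i$ and column $j$ can be calculated from $G$ using formula (3).

For genetic evaluation, residual polygenic effects and QTL effects can be merged into total genetic effects, i.e. the total additive effect $a_t = a_u + a_q$ and the total dominance effect $d_t = d_u + d_q$ (Dominance Model 2):

$$y = X\beta + Z a_t + Z d_t + e$$

with variance-covariance structure $\mathrm{Var}(a_t) = A_t\sigma^2_{a_t}$, $\mathrm{Var}(d_t) = D_t\sigma^2_{d_t}$ and $\mathrm{Var}(e) = I\sigma^2_e$. Therefore, the relationship matrices are $A_t$ as given in formula (2) and

$$D_t = \frac{D_u\sigma^2_{d_u} + D_q\sigma^2_{d_q}}{\sigma^2_{d_u} + \sigma^2_{d_q}} \qquad (4)$$

for the total additive genetic effect and the total dominance genetic effect, respectively.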
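As a rough illustration of how dominance relationships can be obtained from a gametic relationship matrix, the sketch below uses the product form that is commonly applied to non-inbred animals: the probability that both allele pairs of two animals are identical by descent. Whether this matches the paper's formula (3) term by term is an assumption. The toy family, the probability 0.9 and the helper names `full_sib_gametic`, `dominance_from_gametic` and `weighted_relationship` are hypothetical; the variance components 0.075 and 0.025 are those of the numerical example.

```python
def full_sib_gametic(p):
    """Hypothetical 8x8 gametic matrix: sire 1, dam 2, full sibs 3 and 4.

    p is the assumed probability that a sib received the sire's first allele;
    the dam's transmission is treated as uninformative (0.5).
    """
    q = 1.0 - p
    s = p * p + q * q  # probability that the sibs' paternal alleles are IBD
    return [
        [1, 0, 0, 0, p, 0, p, 0],
        [0, 1, 0, 0, q, 0, q, 0],
        [0, 0, 1, 0, 0, .5, 0, .5],
        [0, 0, 0, 1, 0, .5, 0, .5],
        [p, q, 0, 0, 1, 0, s, 0],
        [0, 0, .5, .5, 0, 1, 0, .5],
        [p, q, 0, 0, s, 0, 1, 0],
        [0, 0, .5, .5, 0, .5, 0, 1],
    ]


def dominance_from_gametic(G):
    """Assumed product form for non-inbred animals (diagonal set to 1)."""
    n = len(G) // 2
    D = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                ip, im, jp, jm = 2 * i, 2 * i + 1, 2 * j, 2 * j + 1
                D[i][j] = G[ip][jp] * G[im][jm] + G[ip][jm] * G[im][jp]
    return D


def weighted_relationship(D_u, D_q, var_u, var_q):
    """Average of D_u and D_q weighted by their variance components."""
    total = var_u + var_q
    return [[(var_u * u + var_q * q) / total for u, q in zip(row_u, row_q)]
            for row_u, row_q in zip(D_u, D_q)]


D_q = dominance_from_gametic(full_sib_gametic(0.9))  # informative marker
D_u = dominance_from_gametic(full_sib_gametic(0.5))  # pedigree only
D_t = weighted_relationship(D_u, D_q, var_u=0.075, var_q=0.025)
print("full-sib dominance relationship (QTL, polygenes, total):",
      D_q[2][3], D_u[2][3], D_t[2][3])
```

With pedigree information only, the full-sib element takes the familiar value 0.25, and parent-offspring elements are zero in both matrices.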

Computing simplifications
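Incorporating QTL effects in the linear model considerably increases the size of the mixed model equations, which requires substantial computer resources and computing time. Therefore, some method of computing simplification becomes necessary. The transformation of gametic effects at a QTL into additive effects makes it possible to use the computing simplification developed by SCHAEFFER (2003) for solving mixed model equations when there is more than one random effect in the data analysis. The method can be extended to the case of QTL effect estimation. For Additive Model 1, the corresponding mixed model equations are

$$\begin{bmatrix} X'X & X'Z & X'Z \\ Z'X & Z'Z + A_u^{-1}\lambda_{a_u} & Z'Z \\ Z'X & Z'Z & Z'Z + A_q^{-1}\lambda_{a_q} \end{bmatrix} \begin{bmatrix} \hat\beta \\ \hat a_u \\ \hat a_q \end{bmatrix} = \begin{bmatrix} X'y \\ Z'y \\ Z'y \end{bmatrix}, \qquad (5)$$

where $\lambda_{a_u} = \sigma^2_e/\sigma^2_{a_u}$ and $\lambda_{a_q} = \sigma^2_e/\sigma^2_{a_q}$. In equations (5), subtraction of the third equation from the second gives

$$A_u^{-1}\lambda_{a_u}\hat a_u - A_q^{-1}\lambda_{a_q}\hat a_q = 0.$$

The QTL additive effects can therefore be expressed as

$$\hat a_q = \frac{\lambda_{a_u}}{\lambda_{a_q}}\, A_q A_u^{-1}\hat a_u.$$

An iterative procedure can thus be applied instead of solving equations (5) directly. The iteration begins by setting the starting values for $\hat\beta$, $\hat a_u$ and $\hat a_q$, for example to null vectors, and then proceeds as follows:

Step 1: Calculate the corrected observations based on the current estimates of QTL additive effects, $y^* = y - Z\hat a_q$.

Step 2: Solve the following reduced mixed model equations for estimates of $\beta$ and $a_u$ from the corrected observation vector:

$$\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + A_u^{-1}\lambda_{a_u} \end{bmatrix} \begin{bmatrix} \hat\beta \\ \hat a_u \end{bmatrix} = \begin{bmatrix} X'y^* \\ Z'y^* \end{bmatrix}.$$

Step 3: Update the QTL additive effects as $\hat a_q = (\lambda_{a_u}/\lambda_{a_q})\, A_q A_u^{-1}\hat a_u$.

Step 4: Start back at step 1 if the estimates have not converged.

The converged values are the estimates of $\beta$, $a_u$ and $a_q$. This computing strategy avoids inverting $A_q$. The set of equations to be solved is small and, apart from the fixed effects, contains only residual polygenic effects. It is especially advantageous when multiple QTL and non-additive effects are considered simultaneously.

For Dominance Model 1, solving the full mixed model equations, which contain one block of equations for each of the four random effects $a_u$, $d_u$, $a_q$ and $d_q$, can be replaced with the iteration of solving $\hat\beta$ and $\hat a_u$ from the equations

$$\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + A_u^{-1}\lambda_{a_u} \end{bmatrix} \begin{bmatrix} \hat\beta \\ \hat a_u \end{bmatrix} = \begin{bmatrix} X'y^* \\ Z'y^* \end{bmatrix} \qquad (7)$$

and calculating the remaining effects based on the solution from equations (7),

$$\hat d_u = \frac{\lambda_{a_u}}{\lambda_{d_u}} D_u A_u^{-1}\hat a_u, \qquad \hat a_q = \frac{\lambda_{a_u}}{\lambda_{a_q}} A_q A_u^{-1}\hat a_u, \qquad \hat d_q = \frac{\lambda_{a_u}}{\lambda_{d_q}} D_q A_u^{-1}\hat a_u,$$

where the corrected observation is $y^* = y - Z(\hat d_u + \hat a_q + \hat d_q)$ and the $\lambda$ are defined as in equations (5), i.e. as ratios of the residual variance to the variance of the respective effect.

Accordingly, the iterative procedure for Dominance Model 2 includes solving $\hat\beta$ and $\hat a_t$ from the equations

$$\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + A_t^{-1}\lambda_{a_t} \end{bmatrix} \begin{bmatrix} \hat\beta \\ \hat a_t \end{bmatrix} = \begin{bmatrix} X'y^* \\ Z'y^* \end{bmatrix}, \qquad (9)$$

where $y^* = y - Z\hat d_t$, and estimating $\hat d_t$ from the formula

$$\hat d_t = \frac{\lambda_{a_t}}{\lambda_{d_t}}\, D_t A_t^{-1}\hat a_t.$$

Here, $\lambda_{a_t}$ in equations (9) is $\sigma^2_e/\sigma^2_{a_t}$ and depends on heritability.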
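The iterative procedure for Additive Model 1 can be sketched as follows and checked against a direct solution of the full mixed model equations. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the four animals (two unrelated parents, two full sibs), the phenotypes, the relationship matrices, the single fixed effect (an overall mean, so X is a column of ones and Z is an identity matrix) and the residual variance of 0.65 are all hypothetical, as are the function names.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def inverse(a):
    n = len(a)
    cols = [solve(a, [float(i == j) for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def iterative_solution(y, A_u, A_q, lam_u, lam_q, tol=1e-12, max_iter=200):
    """Reduced equations hold the mean and polygenic effects only."""
    n = len(y)
    Au_inv = inverse(A_u)  # A_q itself is never inverted here
    C = [[float(n)] + [1.0] * n]
    for i in range(n):
        C.append([1.0] + [(i == j) + lam_u * Au_inv[i][j] for j in range(n)])
    a_q = [0.0] * n
    for rounds in range(1, max_iter + 1):
        y_star = [yi - qi for yi, qi in zip(y, a_q)]      # corrected records
        sol = solve(C, [sum(y_star)] + y_star)            # reduced MME
        beta, a_u = sol[0], sol[1:]
        new_a_q = [lam_u / lam_q * x
                   for x in matvec(A_q, matvec(Au_inv, a_u))]  # update a_q
        change = max(abs(p - q) for p, q in zip(new_a_q, a_q))
        a_q = new_a_q
        if change < tol:
            break
    return beta, a_u, a_q, rounds


def direct_solution(y, A_u, A_q, lam_u, lam_q):
    """Full MME with both random effects, solved in one step."""
    n = len(y)
    Au_inv, Aq_inv = inverse(A_u), inverse(A_q)
    size = 2 * n + 1
    C = [[0.0] * size for _ in range(size)]
    C[0][0] = float(n)
    for i in range(n):
        for start in (1, 1 + n):
            C[0][start + i] = C[start + i][0] = 1.0
        C[1 + i][1 + n + i] = C[1 + n + i][1 + i] = 1.0  # Z'Z between effects
        for j in range(n):
            C[1 + i][1 + j] = (i == j) + lam_u * Au_inv[i][j]
            C[1 + n + i][1 + n + j] = (i == j) + lam_q * Aq_inv[i][j]
    sol = solve(C, [sum(y)] + y + y)
    return sol[0], sol[1:1 + n], sol[1 + n:]


A_u = [[1, 0, .5, .5], [0, 1, .5, .5], [.5, .5, 1, .5], [.5, .5, .5, 1]]
A_q = [[1, 0, .5, .5], [0, 1, .5, .5], [.5, .5, 1, .66], [.5, .5, .66, 1]]
y = [10.0, 12.0, 11.0, 14.0]               # hypothetical phenotypes
lam_u, lam_q = 0.65 / 0.25, 0.65 / 0.10    # assumed residual variance 0.65

it_beta, it_u, it_q, rounds = iterative_solution(y, A_u, A_q, lam_u, lam_q)
dr_beta, dr_u, dr_q = direct_solution(y, A_u, A_q, lam_u, lam_q)
print("rounds:", rounds)
print("largest difference in a_q:", max(abs(p - q) for p, q in zip(it_q, dr_q)))
```

The direct solution is included only for comparison; it inverts the QTL relationship matrix, which the iterative version avoids.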

Numerical example
The example data, including pedigree, marker genotypes and phenotypic observations as given in Table 1, were used to illustrate the methods described in the section Theories. A QTL was assumed to be known and linked to a molecular marker with a recombination rate of 0.1. The heritability of the trait was assumed to be 0.35. The variance components, expressed as proportions of the phenotypic variance, were assumed to be 0.25 for the residual polygenic additive effect ($\sigma^2_{a_u}$), 0.1 for the additive effect at the QTL ($\sigma^2_{a_q}$), 0.075 for the dominance effect of residual polygenes ($\sigma^2_{d_u}$) and 0.025 for the dominance effect at the QTL ($\sigma^2_{d_q}$). The ratio of the variance of gametic effects to the phenotypic variance was 0.05 ($\sigma^2_v$), equal to half of $\sigma^2_{a_q}$. The parents of individuals 1 and 2 are unknown. These two individuals were assumed to be unrelated and non-inbred.

The gametic relationship matrix for the QTL was calculated with the method of WANG et al. (1995) and is listed above the diagonal in Table 2. The average gametic relationship matrix for polygenes was based on SMITH and MAKI-TANILA (1990) and is shown below the diagonal in Table 2. The additive relationship matrices were calculated from the gametic relationship matrices using formula (1) and are listed in Table 3, with the additive relationship matrix at the QTL above the diagonal and the additive relationship matrix of polygenic effects below the diagonal. The dominance relationship matrices were calculated from the gametic relationship matrices using formula (3). The dominance relationship matrix at the QTL and the dominance relationship matrix of polygenic effects are shown above and below the diagonal of Table 4, respectively. The diagonal elements are listed separately in the tables.

Two models without marker information, Animal Model 1 with additive effects only and Animal Model 2 with additive and dominance effects, were used for comparison, as conventionally defined in the literature of animal genetics. The five models with marker information integrated for inferring QTL effects are as described in the section Theories: Gametic Model, Additive Model 1, Additive Model 2, Dominance Model 1 and Dominance Model 2. The symbols for genetic effects are as defined in the section Notations for genetic effects. The model residual was assumed to be independently, identically and normally distributed for all models considered here. All models were analyzed by direct inversion in solving the mixed model equations. The computing procedure based on Schaeffer's simplification was applied to Additive Model 1, Dominance Model 1 and 2, and Animal Model 2, since these models include two or more random factors.

The results from the models without dominance effects are listed in Table 5, and those from the models with dominance effects are shown in Table 6. The results show that the computing simplification provides the same solutions as direct inversion. The number of iterations to convergence was 6, 6, 9 and 6 for Additive Model 1, Animal Model 2, and Dominance Model 1 and 2, respectively. The models with marker information (Gametic Model, Additive Model 1 and 2) gave smaller mean square errors (MSE) than Animal Model 1 (Table 5). Dominance Model 1 and 2 also gave smaller mean square errors than Animal Model 2 (Table 6). The breeding values estimated from Additive Model 1 and 2 are identical to those from the Gametic Model; they are equivalent models. However, the Gametic Model required more computing time, and Additive Model 2 needed the shortest computing time. Therefore, Additive Model 2 is a good choice when QTL effects do not need to be estimated, while Additive Model 1 is the choice when both QTL effects and total genetic merit are of interest. Similarly, Dominance Model 2 gave the same results as Dominance Model 1 but demanded less computing time. Table 6 shows that fitting dominance effects in addition to additive effects gave the models a better fit to the data. Including dominance effects in the conventional animal model reduced MSE from 100.33 to 82.53, while MSE decreased from 90.97 to 74.88 when dominance effects were fitted in the models with QTL effects. Dominance Model 1 could be used for the
analysis of QTL effect estimations, and Dominance Model 2 may be applied to marker-assisted genetic evaluation and allows a cleaner estimation of breeding values.
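The arithmetic behind the parameters and the reported MSE values of the example can be retraced with a short script. The variance proportions and MSE figures are taken from the text; treating the remainder of the phenotypic variance as the residual variance of a model that fits all four genetic effects is an assumption made here for illustration, and the variable names are hypothetical.

```python
# Variance components as proportions of the phenotypic variance (from the text).
components = {
    "a_u": 0.25,   # residual polygenic additive effect
    "a_q": 0.10,   # additive effect at the QTL
    "d_u": 0.075,  # dominance effect of residual polygenes
    "d_q": 0.025,  # dominance effect at the QTL
}

heritability = components["a_u"] + components["a_q"]
gametic_variance = components["a_q"] / 2.0  # half of the QTL additive variance

# Assumption: with all four effects fitted, the residual is what remains.
residual = 1.0 - sum(components.values())

# Variance ratios (lambda) of residual variance to each component.
ratios = {name: residual / value for name, value in components.items()}
lambda_total_additive = residual / heritability


def relative_reduction(before, after):
    """Fraction by which the mean square error decreased."""
    return (before - after) / before


mse_drop_animal = relative_reduction(100.33, 82.53)  # conventional models
mse_drop_marker = relative_reduction(90.97, 74.88)   # models with QTL effects

print("heritability:", heritability)
print("gametic variance:", gametic_variance)
print("residual (assumed):", residual)
print("lambda values:", ratios)
print("relative MSE reductions:", mse_drop_animal, mse_drop_marker)
```

Both relative reductions in MSE come out at a little under one fifth, so in this example fitting dominance effects improved the fit by a similar proportion with and without marker information.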

Discussion
After FERNANDO and GROSSMAN (1989) proposed the gametic model BLUP method for marker-assisted genetic evaluation, several methods were developed for simplifying the computations and reducing the number of mixed model equations for the evaluation. The simplifications of CANTET and SMITH (1991) and HOSCHELE (1993) are useful when establishing the mixed model equations, either by expressing the effects of non-parents through their parents or by eliminating the QTL effect equations of animals that are not genotyped and do not provide relationship ties. The simplification of VAN ARENDONK et al. (1994) is applied at the stage of parameterization to reduce the number of effects to be estimated and therefore the number of mixed model equations. In this study, the simplified approach of estimating the sum of QTL and polygenic effects (VAN ARENDONK et al., 1994) was extended to different effects. The analysis shows that the modeling for marker-based analysis can be very flexible and can take different forms according to the purpose of the data analysis. We also adopted SCHAEFFER's computing simplification (2003) in marker-assisted QTL analysis at the stage of solving mixed model equations, in addition to those of CANTET and SMITH (1991) and HOSCHELE (1993). All four simplifications can be applied in different combinations. However, SCHAEFFER's simplification (2003) is especially useful when multiple random factors exist in the analysis and when various QTL effects, additive and non-additive, need to be estimated aside from those of residual polygenes.

To keep the presentation simple, only additive and dominance effects of a single QTL were included in the section Theories and in the analysis of the example data. However, the principle described in this study can be extended straightforwardly to include epistatic effects of multiple QTL. Table 7 gives a more general summary of the variances and relationship matrices of different genetic effects, including additive, dominance and epistatic effects of multiple QTL, expressed as sums of QTL effects and the corresponding residual polygenic effects. Different QTL effects, additive or non-additive, can also be estimated separately from residual polygenic effects based on the results in Table 7, depending on the purpose of the data analysis. Traditionally, the epistatic effect is classified into additive by additive, additive by dominance and dominance by dominance effects (COCKERHAM, 1954), since it is not possible to distinguish the additive by dominance effect from the dominance by additive effect for polygenes without observation of individual loci. However, these are distinguishable for epistatic interactions between QTL. The additive by dominance effect is generally different from the dominance by additive effect for QTL, though they can be put together to simplify the computation, as is the case in Table 7. Consider a pair of QTL, $q$ and $p$, as an example. The interaction between the additive effect at $q$ and the dominance effect at $p$ is different from that between the dominance effect at $q$ and the additive effect at $p$. It is important to realize this difference even when the sum of different epistatic effects is estimated as shown in Table 7, because the variance of the additive by dominance effect is generally not equal to that of the dominance by additive effect, and their relationship matrices are not equal either.

Table 7 Variance matrices and relationship matrices for the total genetic effects as a sum of QTL effects and residual polygenic effects. Columns: Total Effect, Variance, Relationship matrix.
Many studies have shown that the use of marker information increases the accuracy of genetic evaluation, resulting in more genetic progress (ZHANG and SMITH, 1992; GIMELFARB and LANDE, 1994; RUANE and COLLEAU, 1995). The analysis of the example data in this study also indicates that integrating marker information helps in fitting an appropriate model to the data. The advantage of using marker information in genetic evaluation can be explained by the nature of relationship matrices. The conventional numerator relationship matrix and dominance relationship matrix are calculated from pedigree information only. The gene transmission probability is generally taken as 0.5, which corresponds to maximum uncertainty about gene transmission under a known pedigree. In reality, the transmission of an allele from a parent to an offspring follows an all-or-none pattern. The relationship matrices at the QTL, $A_q$ (additive) and $D_q$ (dominance), are calculated from both pedigree and marker information. With information from molecular markers, it becomes possible to track QTL allelic transmission more accurately than with pedigree information alone. As the informativeness of the molecular markers increases, the transmission probability of QTL alleles given marker information moves from 0.5 towards one or zero. Therefore, $A_q$ and $D_q$ are more accurate estimates of the correlations between random genetic effects. LIU et al. (2002) showed that $A_q$ is equal to the pedigree-based matrix $A_u$ when markers are assumed to be completely noninformative. Therefore, it can be seen from formulae (2) and (4) that, when $A_q$ is equal to $A_u$ and $D_q$ is equal to the pedigree-based matrix $D_u$, the matrices for the total effects, $A_t$ and $D_t$, are equal to $A_u$ and $D_u$, that is, to the relationship matrices of the conventional animal model. So, the bottom line for marker-assisted genetic evaluation using the discussed models is that, on average, it is at least as good as the conventional animal model evaluation without marker information, provided there are no errors in the parameters used (e.g. QTL location estimates) and no mistakes in marker genotyping. In actual marker-assisted genetic evaluation, the situation can vary due to factors such as inaccurate QTL position estimates and sampling errors. However, these are not problems of marker-assisted genetic evaluation itself and therefore should not be a reason to underestimate the value of marker-assisted selection.
Since the marker information is used through relationship matrices in marker-assisted genetic evaluation, the model for marker-assisted genetic evaluation has the same form as a conventional animal model, except that a weighted average of relationship matrices is used (see Additive Model 2). The results of the study show that estimating the sum of QTL effects and residual polygenic effects with a weighted average of relationship matrices gives the same results as the gametic model, but considerably simplifies the computing procedure and reduces computing time. Therefore, marker-assisted genetic evaluation can be carried out with a conventional animal model BLUP procedure if the weighted average relationship matrix ($A_t$) is used in place of the numerator relationship matrix ($A$). The variance ratio used with $A_t$, $\lambda_{a_t} = \sigma^2_e/\sigma^2_{a_t}$, depends only on heritability. That the conventional animal model BLUP procedure can be used for marker-assisted genetic evaluation in this way will greatly facilitate the transition from conventional genetic evaluation systems to systems of marker-assisted genetic evaluation in the livestock industry. The current genetic evaluation systems can be used for marker-assisted genetic evaluation merely by replacing $A$ with $A_t$.

The various model formulations in this study may also provide opportunities for improving QTL mapping, in which procedures of variance component estimation based on gametic models have often been used so far (e.g. GRIGNOLA et al., 1996; ZHANG et al., 1998). This study indicates that the gametic model for QTL mapping can be simplified. Replacing QTL gametic effects with QTL additive effects not only simplifies the computation by reducing the number of random QTL effects, but also provides a potential way to practise multiple interval mapping of QTL. The advantage of the multiple interval mapping (MIM) procedure is that it can effectively prevent the ghost QTL phenomenon and increase the accuracy of QTL detection (WEBER et al., 1999; KAO et al., 1999). So far, MIM has been applied only in fixed model procedures for inbred populations, taking the QTL that have already tested positive as covariate controls when testing a putative QTL of interest (WEBER et al., 1999; KAO et al., 1999). This fixed model method is difficult to use in livestock because of specific characteristics of livestock populations, such as uncertain linkage phases and incomplete informativeness of markers. Using the results of Table 7, the sum of residual polygenic effects and the effects of the QTL that have already tested positive can be included in the model for QTL mapping as a random effect control when a putative QTL of interest is being tested.

Notes on the equations: ⊗ stands for the Kronecker product in the matrix form of the gametic model. Merging the two gametic effects of each animal turns the gametic model into an additive model of QTL effects. Each round of the iterative procedure calculates the corrected observations from the current estimates of QTL additive effects and then solves the reduced mixed model equations for estimates of β and the residual polygenic effects from the corrected observation vector.