An algorithm for genetic variance estimation of reproductive traits under a threshold model

Some binary traits are determined by a large number of loci. The standard approach, in this case, is to model an unobservable variable (liability) with a fixed threshold. We present a method for the estimation of the genetic additive variance component under a threshold animal model with fixed effects included. The method can be applied to data with repeated measurement per animal. The unknown parameters of the model have been estimated by Gibbs sampling. The numerical properties of the method are investigated on simulated data for a large real pedigree of breeding stock of laying hens. The algorithm shows good mixing properties, producing consistent estimates from many distinct runs. The application of the method is exemplified on fertility data recorded for the same pedigree.

Introduction Some reproductive traits (e.g.fertility) are determined by many loci but are binary scored and, in consequence, usually troublesome in genetic analysis.The standard approach is to assume an unobservable continuous scale with a fixed threshold.The threshold model has become a preferred method in genetic evaluations of reproductive binary traits.The methodology (for sire model) has been described by GIANOLA and FOULLEY (1983) and developed by others authors (e.g.MISZTAL et al., 1989;MATOS et al., 1997).SORENSEN et al. (1995) implemented Gibbs sampler for binary data where all conditional posterior distributions have a known form, so the sampler is relatively easy to apply.In poultry, reproductive eggs are binary scored (eg.hatched or not) and may be analysed in this form.Alternatively, the set of binary observations can be summarised as a proportion variable recorded for each dam (SEWALEM et al., 1998).Both approaches have been implemented and may lead to various estimates of genetic parameters (SZWACZKOWSKI et al., 2002).In general, the inclusion of binary data instead of summarised 'percentage' data increases the number of observations.In this paper we present Gibbs sampling algorithm for genetic evaluation of a binary trait under the threshold model.Simulation study on the efficiency of sampling scheme is reported.The method is exemplified with genetic evaluation of fertility in laying hens.

Method
Model.Let us assume that for each of the observed individuals we have a number of repeated measurements of a binomial random variable describing this individual.The binominal variable is an expression of an underlying, unobservable, continuous random variable.As in all threshold models we assume that there exists an unknown threshold for the continuous variable which partitions the real line into two intervals corresponding to categories of response (MATOS et al., 1997;SORENSEN et al., 1995).In our case without loss of generality the threshold can be taken as 0. The unobserved variable can be modelled in the following way where is a p-vector of fixed effects, a is a q-vector of random additive genetic effects, are vectors choosing the appropriate fixed and random effects and is a random error with 0 expectation and variance equal to 1.The random additive genetic effects follow a normal distribution , where A is the additive relationship matrix.The expression of the unobservable variable is the observation which takes value 1 or 0 (e.g.fertilized or unfertilized).δ is an indicator taking value 1 if the condition is satisfied and 0 otherwise; where the summation goes over all connected with the k-th level of β , being the number of observations; The conditional posterior density of additive genetic effect is normal with expected value E and variance V of the following form * i a -when i * is a dam: with being a total number of progeny.The parameters Mixing properties.Due to a correlation among variables a sampling process may show slow mixing, i.e. move slowly in the parameter space.This may limit the practical application of the method.There is no way in which convergence can be proved beyond any doubt.However, by monitoring many distinct variables we get much power to detect a lack of convergence, if any.The standard procedure is to compare estimates from different chains starting from distinct initial values or/and seeds for a random number generator.To evaluate the mixing properties we monitored the estimates of a -the vector of random additive effects and compared the two estimates from two independent runs.The random values for genetic additive variance were monitored to visualize the speed with which the process converges to the true values.The simulation study was performed on data generated for a real pedigree structure.The pedigree was nine-generation breeding stock of laying hens, consisting of 396 founders and 5980 non-founders.The pedigree included large full-sibs nested within the sire group.The average inbreeding coefficient was 0.09.Three data sets were generated under a pure genetic additive model assuming a heritability h 2 =0, 1/3 and 5/6, respectively (Table ).The estimates were taken as means from 10000 random values after discarding the first 1000 samples.The samples were collected every 20-th iteration.

Results
The algorithm shows good mixing properties.The random values of genetic additive variance converged rapidly to true values from distinct extreme starting values ( =0.001 and =5).The process was stable through out all iterations (Fig. 1).For each of the three data sets the two independent estimates of vector a were very consistent (Fig. 2).This strongly suggests a very good mixing performance of the method, at least for the pedigree structure used here.The heritability estimates were quite close to the true values used for data simulation (Table ).This was also the case for null genetic contribution in the first data set.Numerical example We used the method to analyse fertility data of laying hens.The pedigree was the same as used for simulation study.The fertility data were collected from 1989 through 1997.Eggs were examined by candling on 8 th day of incubation.The trait was measured on 30 eggs per dam, hence 30 binary observations were available for each of the 5936 hens.The mean percentage of fertilized eggs per dam was 79±30%.The following model was used for liability variable y ijkl = µ + g i + h j + a ijk + e ijkl , where g i is the fixed effect of i-th year (generation), h j is the fixed effect of j-th hatch period, a ijk is the random genetic additive effect of ijk-th dam, and e ijkl is random residual effect connected with ijkl-th egg.The estimated density for genetic variance is given in Figure 3.The mean and mode of distribution were 0.776±0.021and 0.785, respectively.This gives a relatively high heritability estimate of 0.437±0.021.The genetic trend for fertility was obtained by averaging the individuals' genetic effects within each generation (Fig. 4).

Practical implications
The threshold models were developed a few decades ago (see e.g.DEMPSTER and LERNER, 1950), but their application was limited, mainly because of the difficulty to integrate over random effects analytically.Over the past decade several approaches to analysing threshold characters have been developed.In this study we describe methods of estimating variance component as well as fixed and random genetic effects based of simple implementation of the Monte Carlo approach.The method is presented in application to poultry data, but it can be applied to any binary threshold data of other livestock and poultry species.Although the Monte Carlo procedures may provide excellent solutions to complex problems, their efficiency totally depends on particular applications.For genetic problems there are a number of samplers to study additive genetic effects.In general, simple samplers suffer mixing difficulties due to strong dependencies on pedigree structure.More advanced samplers use a blocking strategy and sample the entire vector of additive effects.This, however, makes the sampler complex and more expensive.Our sampler used limited blocking and proved to be efficient for the pedigree studied in this paper.Although we have encountered no problem for other pedigree structures, the efficiency of the algorithm needs to be verified for each particular pedigree under study.We estimated a relatively high heritability of fertility.Many authors (e.g MATOS et al., 1997) reported higher heritability for the threshold than the linear model.This corresponds with our earlier studies carried out by SZWACZKOWSKI et al. (2002).The difference can be partly explained by non-normality of the trait when analysed as percentage data.Although several transformation functions have been proposed, they rarely satisfy model assumptions.This suggests that the data should be rather analysed in their original binary scale, when possible.Higher heritability from the threshold models suggests that a selection index based on the threshold model may give higher accuracy of selection.The algorithm presented in this study allows for unitrait genetic analysis.In practice, however, multitrait evaluation is required.The algorithm to sampling of additive genetic additive values for an arbitrary number of continuous traits has been presented by SZWACZKOWSKI et al. (2001).The method presented here can be relatively easily extended to mixed threshold-linear problems.
Parameter estimation.The estimation of model parameters is realized by the Gibbs sampling procedure.The consecutive parameters are sampled from the following distributions: is, or is not a founder and as the individual is or is not observed.