Introduction

AAB

Archives Animal Breeding

Arch. Anim. Breed.

2363-9822

Copernicus GmbH

Göttingen, Germany

10.5194/aab-58-277-2015

Comparison of inference methods of genetic parameters with an application to body weight in broilers

Maniatis

Demiris

Kranis

Banos

Kominakis

1 Faculty of Animal Science and Aquaculture, Agricultural University of Athens, Iera Odos 75, 118 55 Athens, Greece

2 Department of Statistics, Athens University of Economics and Business, 76 Patission Str., 10434 Athens, Greece

3 The Roslin Institute and Royal School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, Scotland, UK

G. Maniatis (gerasiman@gmail.com)

27 July 2015

58, 2, 277–286; 6 January 2014; 12 May 2015

This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/

This article is available from https://aab.copernicus.org/articles/58/277/2015/aab-58-277-2015.html

The full text article is available as a PDF file from https://aab.copernicus.org/articles/58/277/2015/aab-58-277-2015.pdf

REML (restricted maximum likelihood) has become the standard method of variance component estimation in animal breeding. Inference in Bayesian animal models is typically based upon Markov chain Monte Carlo (MCMC) methods, which are generally flexible but time-consuming. Recently, a new Bayesian computational method, integrated nested Laplace approximation (INLA), has been introduced for making fast non-sampling-based Bayesian inference for hierarchical latent Gaussian models. This paper is concerned with the comparison of estimates provided by three representative programs (ASReml, WinBUGS and the R package AnimalINLA) of the corresponding methods (REML, MCMC and INLA), with a view to their applicability for the typical animal breeder. Gaussian and binary as well as simulated data were used to assess the relative efficiency of the methods. Analysis of 2319 records of body weight at 35 days of age from a broiler line suggested a purely additive animal model, in which the heritability estimates ranged from 0.31 to 0.34 for the Gaussian trait and from 0.19 to 0.36 for the binary trait, depending on the estimation method. Although in need of further development, AnimalINLA seems a fast program for Bayesian modeling, particularly suitable for the inference of Gaussian traits, while WinBUGS appeared to successfully accommodate a complicated structure between the random effects. However, ASReml remains the best practical choice for the serious animal breeder.

Introduction

The restricted maximum likelihood (REML) method (Patterson and Thompson, 1971) for unbalanced mixed models has been extensively used in animal breeding and has become the standard method for the estimation of variance components. The Bayesian Markov chain Monte Carlo (MCMC) methods were introduced in quantitative genetics in the early 1990s (Wang et al., 1993; Sorensen et al., 1994), facilitated by the development of the Gibbs sampling procedure (Geman and Geman, 1984; Gelfand and Smith, 1990). The Gibbs sampler successively samples from conditional distributions of all parameters in a model in order to generate a random sample of the marginal posterior distribution, which is the target for Bayesian inference. MCMC methods represent the standard inference procedure for Bayesian animal models (Sorensen and Gianola, 2002), and through the years they have become an attractive alternative to REML. Recently, a non-sampling-based alternative to MCMC, the integrated nested Laplace approximations (INLAs), has been introduced (Rue et al., 2009). Using INLA, marginal posteriors for all parameters and random effects can be calculated. Because INLA is based on direct numerical integration instead of simulations, it is much faster than MCMC (Rue et al., 2009). Furthermore, Holand et al. (2013) have developed an R package (AnimalINLA) making Bayesian animal models more accessible to animal breeders.

Several programs are available for MCMC methods, but very few provide a flexible environment. WinBUGS (Lunn et al., 2000) is the most well-developed and general-purpose Bayesian software available to date. It has an interactive environment that enables the user to specify models that need to be compiled before starting the Gibbs sampling. Convergence diagnostics, model comparisons, e.g., via DIC (deviance information criterion), and other useful plots and diagnostics are available. Several distributions can be used for modeling the observations as well as priors, while full conditional distributions are automatically constructed and the appropriate MCMC algorithm for sampling is chosen (Lunn et al., 2000). In WinBUGS and in the context of animal breeding, an important issue is the importation of the animals' genetic relationship matrix. Methods proposed so far (Damgaard, 2007; Waldmann, 2009) either require prior transformation of the data using complex code or do not provide a generic procedure independent of the data structure. A good solution here is the use of the inverse of the numerator relationship matrix, $A^{-1}$, directly through the diagonal values of the matrix $W^{-1}$, where $A^{-1} = (T^{-1})' W^{-1} T^{-1}$ (Henderson, 1976; Quaas, 1989), as suggested by Gorjanc (2010). Recently, Hallander et al. (2010) have developed a Bayesian method in WinBUGS based on the decomposition of the multivariate normal prior distribution into products of conditional univariate distributions, thus permitting the genetic evaluation of complex pedigree structures. In addition, more complicated covariance structures have been incorporated via Bayesian methods, allowing for the simultaneous estimation of both additive and dominance genetic effects (Waldmann et al., 2008; Mathew et al., 2012).
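The decomposition $A^{-1} = (T^{-1})' W^{-1} T^{-1}$ can be illustrated with a minimal sketch. It assumes a non-inbred pedigree sorted so that parents precede their offspring, in which case the diagonal of $W^{-1}$ takes the values 1, 4/3 or 2 for animals with zero, one or two known parents. The function name and the pedigree encoding are hypothetical and are not taken from WinBUGS or from Gorjanc (2010).

```python
def inverse_relationship_matrix(pedigree):
    """Build A^{-1} = (T^{-1})' W^{-1} T^{-1} for a non-inbred pedigree.

    pedigree: list of (sire, dam) tuples, one per animal, ordered so that
    parents precede offspring; parents are 0-based indices or None.
    (Hypothetical encoding, chosen for this illustration only.)
    """
    n = len(pedigree)
    a_inv = [[0.0] * n for _ in range(n)]
    for animal, (sire, dam) in enumerate(pedigree):
        parents = [p for p in (sire, dam) if p is not None]
        # Diagonal element of W^{-1} without inbreeding:
        # 1 (no known parent), 4/3 (one parent), 2 (both parents).
        w_inv = 4.0 / (4.0 - len(parents))
        # Row of T^{-1}: 1 for the animal itself, -0.5 for each known parent.
        row = {animal: 1.0}
        for p in parents:
            row[p] = row.get(p, 0.0) - 0.5
        # Accumulate w_inv * row' * row into A^{-1}.
        for i, t_i in row.items():
            for j, t_j in row.items():
                a_inv[i][j] += w_inv * t_i * t_j
    return a_inv


# Two unrelated founders (animals 0 and 1) and their offspring (animal 2).
example = inverse_relationship_matrix([(None, None), (None, None), (0, 1)])
print(example)
```

Only the non-zero elements of each row of $T^{-1}$ are visited, so the matrix $A$ itself never has to be formed or inverted.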

The primary goals of the present study were to apply and investigate the relative merits of three methods (REML, Gibbs sampling and INLA) in the context of animal breeding, using representative programs such as ASReml 3.0 (Gilmour et al., 2009), WinBUGS and AnimalINLA. For this purpose, both a Gaussian and a binary trait were explored and variance components and the genetic parameters along with breeding values across the three methods were estimated and compared.

Materials and methods

Data description

Data on body weight (BW) at 35 days of age from a broiler line were made available by Aviagen Ltd. Because the Windows version of AnimalINLA 1.1 limits the size of the data set, a small data set of 2319 records was randomly selected. This comprised 1171 males and 1148 females in 40 hatch weeks, while the pedigree included a total of 2456 animals. All sires (n = 32) and dams (n = 105) were assumed to be non-inbred and non-related. To make results directly comparable, all phenotypic values were standardized to the standard normal distribution via $y = (y_0 - \bar{y})/\sigma_{y_0}$, where $y \sim N(0, 1)$ is the standardized BW, $y_0$ the original phenotypic values of BW, $\bar{y}$ the mean BW in the population and $\sigma_{y_0}$ the standard deviation of BW. A preliminary analysis of variance showed that the statistically significant (P<0.05) fixed effects included hatch week and sex. Hence, these fixed effects were included in all models. In this data set, each dam was mated with two sires, producing 2–57 offspring with records (average full-sib family size: 16), while sires were mated with two to seven dams and produced 2–97 offspring (average half-sib family size: 56). Such a structure enabled the inclusion of maternal environmental effects ($c^2$) through proper modeling. The latter are modifications of the offspring phenotype caused by the environment provided by the mother and consider any influence of a dam on its progeny, excluding the effects of directly transmitted genes.

A binary response trait was also constructed from the original BW values, using a threshold that separated the highest 20 % of phenotypic values. Thus, the new variable $y_B$ followed the Bernoulli distribution, with values 0 and 1 denoting low and high weight, respectively. In this data set, only the sex of the animals was statistically significant (P<0.05) and was thus included in the analyses as the only fixed effect.

Statistical analysis

Gaussian trait

Three animal models were considered for BW. Model M1 was a purely additive animal model, model M2 additionally included maternal environmental effects, and model M3 was the same as model M2 but with a covariance $\sigma_{uc}$ between additive genetic and maternal environmental effects. In matrix notation, the models were as follows:

$y = Xb + Zu + e$ (M1)

$y = Xb + Zu + Z_c c + e$ (M2)

$y = Xb + Zu + Z_c c + e$, with $\mathrm{cov}(u, c) = \sigma_{uc} I$ (M3)

where $y$ ($n \times 1$) is the vector of observations (n: number of records, 2319), $b$ ($p \times 1$) is the vector of fixed effects (p: number of fixed effects classes, 42), $u$ ($q \times 1$) is the vector of direct additive genetic effects (q: number of additive effects, 2456), $c$ ($k \times 1$) is the vector of maternal environmental effects (k: number of dams with offspring, 105) and $e$ ($n \times 1$) is the vector of residuals; $X$, $Z$ and $Z_c$ denote the incidence matrices relating the observations to the corresponding fixed and random effects. The vector of direct genetic effects was assumed to follow the normal distribution $u \sim N(0_q, \sigma_u^2 A)$, where $0_q$ denotes a $q \times 1$ vector of zeros, $\sigma_u^2$ denotes the direct genetic variance and $A$ denotes the additive genetic relationship matrix. The maternal environmental effects were assumed to follow a normal distribution given by $c \sim N(0_k, \sigma_c^2 I_k)$, where $I_k$ is an identity matrix of order k and $\sigma_c^2$ is the maternal environmental variance. Finally, residuals for the two traits were assumed to be normal as follows: $e \sim N(0_n, \sigma_e^2 I_n)$, where $\sigma_e^2$ is the residual variance.

From a Bayesian perspective, the data $y$ are assumed to be distributed as $y \mid b, u, \sigma_e^2 \sim N(Xb + Zu, \sigma_e^2 I_n)$ and $y \mid b, u, c, \sigma_e^2 \sim N(Xb + Zu + Z_c c, \sigma_e^2 I_n)$ for models M1 and M2, respectively. For model M3, the data were assumed to be distributed as $y \mid b, u, c, r, \sigma_e^2 \sim N(Xb + Zu + Z_c c, \sigma_e^2 I_n)$, where the correlation was $r = \mathrm{cov}(u, c)/(\sigma_u \sigma_c)$. The vector $b$ ($p \times 1$) for all three models was partitioned into two sub-vectors, denoting hatch week ($h$) and sex ($s$). Both sub-vectors were assumed to follow normal distributions, according to $h \mid \sigma_h^2 \sim N(0, \sigma_h^2 I)$ and $s \mid \sigma_s^2 \sim N(0, \sigma_s^2 I)$.

Gelman (2006) investigated the statistical properties of different priors on variance components and found that a uniform prior on the standard deviation is a reasonable choice in a number of situations. Therefore, vague uniform priors were utilized for the standard deviation of the additive genetic effects, $\sigma_u \sim U(0, 100)$, as well as for that of the $c^2$ effects, $\sigma_c \sim U(0, 100)$. To assess the effect of the priors on the estimates, either an inverse gamma (0.001, 0.001) prior for the residual variance $\sigma_e^2$ or a uniform prior $\sigma_e \sim U(0, 100)$ for the residual standard deviation was used. Both approaches gave practically identical results. The same priors were used in AnimalINLA and in WinBUGS to attain comparability. Inferences were made by REML and by estimating the marginal posterior distribution using either Gibbs sampling or INLA. Estimates of heritability ($h^2$) as well as $c^2$ were calculated as ratios of the estimates of direct additive genetic ($\sigma_u^2$) and maternal environmental ($\sigma_c^2$) variances, respectively, to the phenotypic variance ($\sigma_p^2$). The phenotypic variance is the sum of all variance components in the model.
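As a small numerical illustration of these ratios, the sketch below recomputes heritability and the maternal environmental ratio from the ASReml variance component estimates reported for model M2 in Table 1. The function name is hypothetical; only the arithmetic follows the definitions given above.

```python
def variance_ratios(var_u, var_e, var_c=0.0):
    """Return (h2, c2): additive and maternal variances over phenotypic variance."""
    var_p = var_u + var_c + var_e  # phenotypic variance: sum of all components
    return var_u / var_p, var_c / var_p


# ASReml estimates for model M2 as reported in Table 1.
h2, c2 = variance_ratios(var_u=0.065, var_e=0.335, var_c=0.029)
print(round(h2, 2), round(c2, 2))  # 0.15 0.07
```

The rounded values agree with the tabulated estimates for that model.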

For measuring the mixing and efficiency of the MCMC samples, the effective sample size (ESS) was used. The ESS of the posterior samples of each parameter corresponds to the number of independent samples having the same estimation accuracy as the dependent MCMC samples and is given by Waagepetersen et al. (2008): $\mathrm{ESS} = K / \left(1 + 2\sum_{k=1}^{\infty} \rho_k\right)$, where $K$ is the total number of correlated MCMC samples and $\rho_k$ is the lag-k autocorrelation of the Markov chain.
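The ESS formula can be sketched in a few lines. The infinite sum has to be truncated in practice; the rule used here (stop at the first non-positive sample autocorrelation) is an assumption made for this illustration and is not necessarily the rule applied in the study. Function names are hypothetical.

```python
def autocorrelation(chain, lag):
    """Sample lag-k autocorrelation of a chain (requires a non-constant chain)."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain)
    cov = sum((chain[t] - mean) * (chain[t + lag] - mean) for t in range(n - lag))
    return cov / var


def effective_sample_size(chain, max_lag=None):
    """ESS = K / (1 + 2 * sum of autocorrelations), with a truncated sum."""
    n = len(chain)
    max_lag = n - 1 if max_lag is None else max_lag
    rho_sum = 0.0
    for lag in range(1, max_lag + 1):
        rho = autocorrelation(chain, lag)
        if rho <= 0.0:  # assumed truncation rule: stop at first non-positive value
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


# A chain that changes level only once mixes poorly, so its ESS is far below K.
sticky_chain = [0.0] * 50 + [1.0] * 50
print(effective_sample_size(sticky_chain))
```

Under this truncation rule a chain with no positive autocorrelation keeps ESS equal to K, whereas positive autocorrelation shrinks it.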

Binary trait

Initially, a simple animal model was fitted via REML, considering $y_B$ as a normally distributed trait. Subsequently, a generalized linear model (McCullagh and Nelder, 1994) was used for the analysis of the binary variable. In this analysis, the observed binary variable $y_B$ is related to an underlying unobservable continuous variable $\lambda$, such that the observed binary response ($y_B$) is the result of the following relationship: $y_{Bi} = \begin{cases} 0 & \text{if } \lambda_i \le \tau \\ 1 & \text{if } \lambda_i > \tau \end{cases}$, where $\tau$ is fixed and $y_{Bi}$ corresponds to observation i. Several link functions (logit, probit, cloglog) can be applied to link the binary variable to the underlying scale (Gilmour et al., 2009). In our study, the logit function was used: $\lambda = \log\left(\mu/(1-\mu)\right)$, where $\mu$ is the probability of success and $\lambda$ the vector of linear predictors of the unobserved variable on the underlying scale. An animal model was assumed for $\lambda$ such that $\lambda = Xb + Zu + e$. A uniform prior was assumed here for the standard deviation of the additive genetic effects on the underlying scale, $\sigma_u \sim U(0, 100)$. On the logit scale $\sigma_e^2 = \pi^2/3 \approx 3.29$, and heritability is thus estimated as $h^2 = \sigma_u^2/(\sigma_u^2 + \pi^2/3)$ (Gilmour et al., 2009).
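A brief sketch of the logit link and of the underlying-scale heritability follows. The helper names are hypothetical; the additive variance used in the example is the ASReml underlying-scale estimate reported in Table 2.

```python
import math

# Residual variance implied by the logit link (about 3.29).
LOGIT_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def logit(mu):
    """Map a success probability to the underlying (linear predictor) scale."""
    return math.log(mu / (1.0 - mu))


def heritability_logit_scale(var_u):
    """h2 = var_u / (var_u + pi^2 / 3) on the underlying logit scale."""
    return var_u / (var_u + LOGIT_RESIDUAL_VARIANCE)


# ASReml additive genetic variance on the underlying scale (Table 2).
print(round(heritability_logit_scale(0.769), 2))  # 0.19
```

Because the residual variance is fixed by the link function, the heritability on this scale depends on the additive genetic variance alone.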

In order to investigate the relative merits of the three approaches, data for both the Gaussian and the binomial case were simulated and models were applied accordingly.

Simulation study

The initial analysis of data revealed a marginal importance of the $c^2$ effects and a possible covariance between $u$ and $c$. To further test the behavior of the three programs under a scenario of two correlated random effects with a marginal contribution by one of them, a simulation study was conducted, emulating the pedigree structure and the variance components of the real data. In total, 20 sires and 70 dams were used in the pedigree, and 2240 progeny with records were simulated. Each sire was assumed to mate to seven dams, while each dam produced offspring with two different sires. All sires and dams were assumed to be non-inbred and non-related. Each full-sib family consisted of 16 offspring. The direct genetic effect for founder i (1, ..., 90) was drawn as $u_i \sim N(0, \sigma_u^2)$, while the maternal environmental effect of dam j (1, ..., 70) was $c_j \sim N(0, \sigma_c^2)$, with $\sigma_u^2 = 7$ and $\sigma_c^2 = 3$. Two scenarios were explored regarding the correlation between the direct genetic and the $c^2$ effects ($r_{uc}$): (a) $r_{uc} = -0.2$ (low) and (b) $r_{uc} = -0.8$ (high). The direct genetic effects of offspring i (1, ..., 2240) were calculated by $u_i = \frac{1}{2}(u_j + u_k) + ms_i$, where $u_j$ and $u_k$ denote direct genetic effects of dam and sire, respectively, while $ms_i$ represented the Mendelian sampling deviation drawn conditional upon the $c^2$ effects: $ms_i \mid c_i \sim N\left(\frac{\sqrt{0.5\sigma_u^2}}{\sigma_c}\, r c_i,\ (1 - r^2)\, 0.5\sigma_u^2\right)$. The total phenotypic variance was estimated according to $\sigma_p^2 = \sigma_u^2 + \sigma_c^2 + \sigma_e^2$. The residuals were sampled as $e_i \sim N(0, \sigma_e^2)$, where $\sigma_e^2 = 32$, thus resulting in $\sigma_p^2 = 42$, $h^2 = 0.17$ and $c^2 = 0.07$.
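The simulation design can be sketched as follows. Several details are assumptions made only for this illustration: the cyclic allocation of sires to dams, a phenotype formed as the sum of the direct genetic, maternal environmental and residual effects with no fixed effects, and a conditional mean of the Mendelian sampling term equal to r times the ratio of the Mendelian sampling standard deviation to the maternal standard deviation times the dam's maternal effect. The function name and its arguments are hypothetical.

```python
import math
import random


def simulate_sample(r_uc, var_u=7.0, var_c=3.0, var_e=32.0,
                    n_sires=20, n_dams=70, family_size=16, seed=0):
    """Simulate one data set as a list of (sire, dam, phenotype) records."""
    rng = random.Random(seed)
    sd_u, sd_c, sd_e = math.sqrt(var_u), math.sqrt(var_c), math.sqrt(var_e)
    # Founder effects: direct genetic effects of sires and dams, maternal effects of dams.
    u_sire = [rng.gauss(0.0, sd_u) for _ in range(n_sires)]
    u_dam = [rng.gauss(0.0, sd_u) for _ in range(n_dams)]
    c_dam = [rng.gauss(0.0, sd_c) for _ in range(n_dams)]
    sd_ms = math.sqrt(0.5 * var_u)                # Mendelian sampling SD (non-inbred parents)
    cond_sd = math.sqrt(1.0 - r_uc ** 2) * sd_ms  # SD of ms given the maternal effect
    records = []
    for mating in range(2 * n_dams):
        # Assumed allocation: each dam gets two consecutive sires, sires are reused cyclically.
        dam, sire = mating // 2, mating % n_sires
        cond_mean = r_uc * sd_ms / sd_c * c_dam[dam]
        for _ in range(family_size):
            ms = rng.gauss(cond_mean, cond_sd)
            u = 0.5 * (u_sire[sire] + u_dam[dam]) + ms
            y = u + c_dam[dam] + rng.gauss(0.0, sd_e)
            records.append((sire, dam, y))
    return records


records = simulate_sample(r_uc=-0.8, seed=1)
print(len(records))  # 2240
```

With 20 sires, 70 dams and two matings per dam, this allocation gives 140 full-sib families of 16 offspring, seven dams per sire and two sires per dam, matching the described design.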

In total, 30 samples from each scenario were generated. These samples were then analyzed via models M1–M2 (ASReml and AnimalINLA) and M2–M3 (WinBUGS). The mean squared error (MSE) was employed to quantify the performance of the predictors throughout, along with the coverage of interval estimates. The MSE was computed as follows: $\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left((\hat{\theta}_i - \theta)^2 + \mathrm{var}(\hat{\theta}_i)\right)$, where $\theta$ stands for the true and $\hat{\theta}_i$ for the estimated parameter, $\hat{\theta}_i - \theta$ corresponds to bias, and $N = 30$ is the number of samples.
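The MSE definition translates directly into code. The sketch below is illustrative only; the function name is hypothetical and the numbers in the example are made up to show the arithmetic, not taken from the study.

```python
def mean_squared_error(estimates, variances, true_value):
    """Average of squared bias plus estimator variance over the samples."""
    n = len(estimates)
    total = sum((est - true_value) ** 2 + var
                for est, var in zip(estimates, variances))
    return total / n


# Three hypothetical replicate estimates of a parameter whose true value is 2.0.
mse = mean_squared_error([1.0, 2.0, 3.0], [0.5, 0.5, 0.5], true_value=2.0)
print(mse)
```

Both a biased estimator and an imprecise one are penalized, which is why the criterion can rank a slightly biased but stable method above an unbiased but noisy one.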

Model evaluation criteria

According to the method applied, the model comparison was based on four evaluation criteria: the Akaike information criterion (AIC; Akaike, 1973), the Bayesian information criterion (BIC; Schwarz, 1978), the conditional Akaike information criterion (cAIC; Vaida and Blanchard, 2005) and the DIC (Spiegelhalter et al., 2002). All criteria are based upon the computation of the deviance ($D$): $D = -2\log p(y \mid \hat{\theta}) = -2\log L$, where $\theta$ denotes the $p \times 1$ vector of the model parameters and $p(y \mid \hat{\theta})$ denotes the likelihood of the data $y$ evaluated at the maximum likelihood estimate $\hat{\theta}$. While likelihood ratio tests (LRTs) suggest the direct comparison of log-likelihoods between the various nested models, AIC, BIC and cAIC suggest penalizing the deviance by appropriate complexity terms. However, the determination of the number of the model parameters is nontrivial when random effects are of interest and are being estimated using methods such as BLUP. For such cases the AIC is shown to be asymptotically biased (Crainiceanu and Ruppert, 2004). An asymptotically unbiased criterion is the cAIC, defined by Vaida and Blanchard (2005) as $\mathrm{cAIC} = -2\log L_i + 2\rho$, where $\rho$ denotes the effective degrees of freedom (Hodges and Sargent, 2001), given by the trace of the hat matrix $H$ that maps the vector of observed values to the vector of the fitted values. In all criteria, models with the smallest values are to be preferred, denoting a better balance between complexity and fit.
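The penalized-deviance criteria can be sketched as below. The AIC and BIC penalties are written in their standard textbook forms (2k and k log n), which the text above does not spell out, so they are stated here as assumptions; function names are hypothetical. The cAIC example uses the log-likelihood and effective degrees of freedom reported for ASReml model M1 in Table 1.

```python
import math


def deviance(log_lik):
    """D = -2 log L."""
    return -2.0 * log_lik


def aic(log_lik, n_params):
    """Standard form (assumption): deviance plus 2 per parameter."""
    return deviance(log_lik) + 2.0 * n_params


def bic(log_lik, n_params, n_obs):
    """Standard form (assumption): deviance plus log(n) per parameter."""
    return deviance(log_lik) + n_params * math.log(n_obs)


def caic(log_lik, effective_df):
    """cAIC: deviance plus twice the effective degrees of freedom (rho)."""
    return deviance(log_lik) + 2.0 * effective_df


# ASReml, model M1 (Table 1): logL = -1653, rho = 369.
print(caic(-1653, 369))  # 4044.0
```

The printed value matches the cAIC tabulated for that model, which suggests that the tabulated criterion was obtained from the tabulated log-likelihood and effective degrees of freedom in this way.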

Results

Gaussian trait

Table 1 summarizes the estimated variance components and genetic parameters of BW, along with likelihoods, $\rho$ and the model evaluation criteria. With regard to the Bayesian methods, posterior means and posterior medians were very close for all parameters of interest. The closeness of mean, median and mode was also suggested by visual inspection of the posterior densities, which displayed unimodality. Therefore, only the posterior means are presented. For our data to achieve convergence via WinBUGS, a burn-in of 10 000 iterations, a total number of 1 000 000 samples and a thinning interval of 20 were necessary. These settings were decided upon graphical inspection of the trace and autocorrelation plots and yielded a sample of 50 000 iterations. Such runs took approximately 14 to 16 h, depending on modeling assumptions. Heritability for BW ranged from 0.15 to 0.34, while $c^2$ accounted for 0–0.08 of the total phenotypic variance, depending on the model and the method applied. All evaluation criteria, regardless of the method considered, concurred in the choice of a purely additive animal model without the inclusion of the $c^2$ effects. With M1, heritability estimates varied slightly among the methods, from 0.31 (ASReml) to 0.34 (AnimalINLA), while the 95 % confidence and credible intervals from ASReml and the Bayesian programs largely overlapped. The ESS of all parameters estimated via model M1 in WinBUGS exhibited the highest values (higher than 7000) among models, indicating the best MCMC mixing properties.

Under model M2, REML-based estimates were significantly different from those obtained from the two Bayesian approaches. In this case, REML heritability was seriously underestimated (0.15) when contrasted with MCMC and INLA methods (0.31 and 0.32, respectively). Furthermore, while $c^2$ was 0.07 (±0.03) in REML, no detectable variance due to $c^2$ was identified with the Bayesian methods. As a result, the sum of the additive and the $c^2$ effects given as a proportion of the phenotypic variance was significantly lower in REML (0.22) when compared to Bayesian methods (0.31–0.32). Such a paradox may arise from covariances between the various random effects. To test this hypothesis, we fitted model M3, which accounted for a covariance between the additive genetic and the maternal environmental effects.

This could be effectively modeled only via the WinBUGS software. Under model M3, $h^2$ and $c^2$ estimates were comparable (0.17 and 0.08, respectively) to ASReml estimates (for model M2), while the covariance in question was not statistically significant (Table 1). A negative correlation between additive genetic and maternal environmental effects was detected (-0.20), although with a large standard error (0.30) that did not allow for firm conclusions.

To further quantify the implications of model and method evaluation on selection decisions, Pearson as well as rank correlations of animals' EBVs and the percentage of common animals selected were calculated across the models and methods applied (results not shown). The correlations in question were extremely high (0.97–0.99) when the focus was on the whole population and/or a proportion of the best 20 % of animals. During this phase, an additional advantage of the WinBUGS software was its ability to estimate (via the rank tool) the uncertainty associated with the ranking of the individuals from the posterior distributions of the EBVs. Figure 1 presents 12 selected examples from the posterior distribution of the EBV ranks, with four animals each from the top, middle and low end of the spectrum. These ranks were based upon the whole posterior density and properly accounted for characteristics such as the variance and skewness of the posterior. Both a 95 % rank interval and the median rank are provided, thus presenting an easy and flexible way of animal selection. The large uncertainty associated with selecting among similar animals is also illustrated. Here, rank correlations were remarkably high, ranging from 0.96 to 0.99 among all methods and models considered. Furthermore, standard errors of the EBVs and solutions for the fixed effects were comparable among the methods, with no statistically significant differences. All models and methods suggested the same animals, resulting in correlations between the estimated breeding values that ranged from 0.96 to 0.99.

Table 1. Estimates of variance components and genetic parameters for body weight (BW) at 35 days of age.

| Software | Model | Statistic | $\sigma_u^2$ | $\sigma_c^2$ | $\sigma_{uc}$ | $\sigma_e^2$ | $\sigma_p^2$ | $h^2$ | $c^2$ | $\sigma_{uc}/\sigma_p^2$ | $r_{uc}$ | logL | AIC | BIC | cAIC/DIC | $\rho$/pD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ASReml | M1 | Mean {CI} | 0.133 {0.09, 0.18} | – | – | 0.302 {0.27, 0.34} | 0.434 {0.40, 0.47} | 0.31 {0.21, 0.41} | – | – | – | -1653 | 3307 | 3313 | 4044 | 369 |
| ASReml | M2 | Mean {CI} | 0.065 {0.01, 0.13} | 0.029 {0.01, 0.05} | – | 0.335 {0.30, 0.38} | 0.429 {0.40, 0.46} | 0.15 {0.01, 0.29} | 0.07 {0.01, 0.13} | – | – | -1651 | 3306 | 3317 | 4182 | 440 |
| WinBUGS | M1 | Mean {CI} | 0.139 {0.09, 0.20} | – | – | 0.298 {0.26, 0.33} | 0.437 {0.40, 0.48} | 0.32 {0.22, 0.43} | – | – | – | -1622 | – | – | 4302 | 529 |
| WinBUGS | M1 | ESS | 8332 | | | 9957 | 8532 | 7982 | | | | | | | | |
| WinBUGS | M2 | Mean {CI} | 0.134 {0.07, 0.20} | 0.001 {0, 0.03} | – | 0.300 {0.26, 0.34} | 0.435 {0.38, 0.47} | 0.31 {0.18, 0.43} | 0 {0, 0.06} | – | – | -1632 | – | – | 4304 | 520 |
| WinBUGS | M2 | ESS | 1808 | 1072 | | 2671 | 2097 | 1868 | 1041 | | | | | | | |
| WinBUGS | M3 | Mean {CI} | 0.069 {0.01, 0.15} | 0.032 {0, 0.12} | -0.014 {-0.08, 0.01} | 0.321 {0.20, 0.37} | 0.410 {0.31, 0.47} | 0.17 {0.04, 0.35} | 0.08 {0, 0.36} | 0.04 {0, 0.19} | -0.20 {-0.88, 0.44} | -1795 | – | – | 4305 | 358 |
| WinBUGS | M3 | ESS | 1768 | 1299 | 1314 | 1323 | 1391 | 1669 | 1197 | 1229 | 1329 | | | | | |
| AnimalINLA | M1 | Mean {CI} | 0.152 {0.11, 0.21} | – | – | 0.297 {0.26, 0.33} | 0.449 {0.37, 0.54} | 0.34 {0.23, 0.45} | – | – | – | – | – | – | 4289 | – |
| AnimalINLA | M2 | Mean {CI} | 0.143 {0.10, 0.21} | 0.004 {0, 0.03} | – | 0.302 {0.26, 0.34} | 0.449 {0.37, 0.57} | 0.32 {0.23, 0.44} | 0 {0, 0.02} | – | – | – | – | – | 4290 | – |

$\sigma_u^2$: additive genetic variance; $\sigma_c^2$: maternal environmental variance; $\sigma_{uc}$: additive genetic maternal environmental covariance; $\sigma_e^2$: residual variance; $\sigma_p^2$: phenotypic variance in g$^2$; $h^2$: heritability; $c^2$: ratio of the maternal environmental variance to the phenotypic variance; $r_{uc}$: additive genetic maternal environmental correlation; logL: natural log likelihood; AIC: Akaike information criterion; BIC: Bayesian information criterion; cAIC/DIC: conditional Akaike information criterion/deviance information criterion; $\rho$/pD: effective degrees of freedom/effective number of parameters; "Mean" in Bayesian analysis denotes posterior mean; ESS: effective sample size; CI: 95 % confidence or credible intervals.

Table 2. Estimates of variance components and genetic parameters for the binary transformed BW.

| Software | Statistic | $\sigma_u^2$ | $\sigma_p^2$ | $h^2$ |
|---|---|---|---|---|
| ASReml (obs) | Mean (SE) {95 % CI} | 0.011 (0.003) {0.006, 0.018} | 0.109 (0.003) {0.10, 0.12} | 0.10 (0.02) {0.04, 0.16} |
| ASReml | Mean (SE) {95 % CI} | 0.769 (0.226) {0.34, 1.21} | 4.059 (0.226) {3.63, 4.49} | 0.19 (0.05) {0.09, 0.29} |
| WinBUGS | Mean (SE) {95 % CI} | 1.972 (0.859) {0.87, 4.12} | 5.275 (0.795) {4.14, 7.27} | 0.36 (0.09) {0.21, 0.56} |
| WinBUGS | ESS | 1436 | 1421 | 1293 |
| AnimalINLA | Mean (SE) {95 % CI} | 0.866 (0.241) {0.48, 1.41} | 4.156 (0.353) {3.77, 4.70} | 0.21 (0.07) {0.13, 0.30} |

$\sigma_u^2$: additive genetic variance; $\sigma_p^2$: phenotypic variance; $h^2$: heritability; obs: observed scale; "Mean" in Bayesian analysis denotes posterior mean; ESS: effective sample size; CI: 95 % confidence or credible intervals.

Binary trait

The estimated variance components and genetic parameters of $y_B$ for a purely additive animal model across the three methods are presented in Table 2. A model incorporating $c^2$ effects was also fitted; however, convergence was not achieved under any method applied. In ASReml, heritability on the observed scale ($h_o^2$) was estimated to be as high as 0.10, while the respective estimate on the underlying scale was significantly higher ($h_U^2 = 0.19$). Using the classical formula (Dempster and Lerner, 1950), the ratio between the two estimates would be $h_o^2/h_U^2 = z_{x_p}^2/\left(p(1-p)\right)$, where $p$ is the level of incidence and $z_{x_p}$ is the ordinate of a standard normal curve cutting off an area equal to $p$. For $p = 0.2$ (as here) the ratio is $h_o^2/h_U^2 \approx 0.5$, in full agreement with our results. Estimates from AnimalINLA were comparable to those of ASReml ($h_U^2 = 0.21$). Interestingly, the WinBUGS heritability estimate was significantly higher (up to 0.36), exceeding the original $h^2$. Differences were also detected in the 95 % confidence or credible intervals of the point estimates of the additive variance as well as the heritability on the underlying scale. More specifically, the 95 % credible interval of $h_U^2$ given by WinBUGS was (0.21, 0.56), that of AnimalINLA was (0.13, 0.30) and that of ASReml was (0.09, 0.29). The ESS values of the parameters estimated via WinBUGS were 1293 and 1436 for $h^2$ and the additive genetic variance, respectively.
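The Dempster and Lerner ratio for an incidence of 0.2 can be checked numerically. The sketch below uses only the Python standard library; the function name is hypothetical.

```python
from statistics import NormalDist


def observed_to_underlying_ratio(p):
    """Ratio h2(observed) / h2(underlying) = z^2 / (p * (1 - p))."""
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - p)  # point cutting off an upper-tail area p
    z = std_normal.pdf(threshold)            # ordinate of the normal curve at that point
    return z ** 2 / (p * (1.0 - p))


print(round(observed_to_underlying_ratio(0.2), 2))  # 0.49
```

The ratio of the two ASReml estimates, 0.10/0.19, is about 0.53, which is close to this theoretical value.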

Figure 1. Distribution of ranking for 12 representative animals, based on the EBVs estimated by WinBUGS. Four animals each were taken from the top, middle and low end of the spectrum. u[i] refers to the EBV of animal i; ranks run from 1 to 2456.

As in the case of the Gaussian trait, rank correlations across the three methods remained high, ranging from 0.92 to 0.99 (results not shown). In addition, the proportion of common animals selected among the three methods exceeded 93 %, suggesting minor implications of method usage on selection decisions.

Simulation study

Descriptive statistics of the simulated data and the estimators across models and methods are given in Table 3. Average values of the simulated data were equal to the true ones ($h^2 = 0.17$ and $c^2 = 0.07$). Note that during simulations, $c^2$ was statistically significant. Using model M1 under either ASReml or AnimalINLA always resulted in inflated estimates of the true parameters. More specifically, the estimated heritability ranged from 0.35 to 0.51, with a tendency for inflated estimates particularly in AnimalINLA and under the strongly negative $r_{uc}$ scenario for both software packages (ASReml and AnimalINLA). Overestimation of the heritability was due to both higher estimates of the additive genetic variance and lower estimates of the residual variance.

Table 3. True values and descriptive statistics of the estimators under two levels of additive genetic maternal environmental correlation.

| Parameter | True values | M1, ASReml, low | M1, ASReml, high | M1, AnimalINLA, low | M1, AnimalINLA, high | M2, ASReml, low | M2, ASReml, high | M2, AnimalINLA, low | M2, AnimalINLA, high | M2, WinBUGS, high | M3, WinBUGS, high |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $\sigma_u^2$ | 7 (0.6) | 15 (3) [11, 23] | 19 (4) [13, 30] | 18 (17) [13, 55] | 32 (17) [13, 65] | 6 (2) [4, 11] | 10 (4) [4, 21] | 14 (10) [11, 47] | 26 (14) [14, 50] | 19 (5) [7, 30] | 10 (5) [5, 21] |
| $\sigma_c^2$ | 3 (0.5) | – | – | – | – | 3 (1) [2, 6] | 3 (1) [1, 7] | 0 | 0 | 0.9 (0.8) [0, 3] | 6 (3) [2, 11] |
| $\sigma_e^2$ | 32 (0.9) | 28 (2) [24, 30] | 24 (2) [19, 29] | 28 (2) [26, 31] | 25 (2) [19, 29] | 32 (2) [28, 35] | 29 (2) [23, 34] | 29 (2) [26, 31] | 25 (2) [19, 29] | 24 (3) [13, 30] | 28 (4) [18, 35] |
| $\sigma_p^2$ | 42 (1.4) | 43 (2) [40, 48] | 43 (2) [39, 48] | 47 (17) [42, 84] | 57 (17) [40, 94] | 42 (2) [39, 46] | 42 (2) [39, 46] | 44 (10) [39, 78] | 51 (13) [40, 78] | 44 (3) [39, 51] | 40 (3) [39, 49] |
| $h^2$ | 0.17 (0.02) | 0.35 (0.05) [0.27, 0.47] | 0.44 (0.07) [0.31, 0.61] | 0.44 (0.13) [0.30, 0.65] | 0.51 (0.14) [0.33, 0.69] | 0.15 (0.05) [0.08, 0.26] | 0.21 (0.09) [0.09, 0.47] | 0.34 (0.09) [0.26, 0.60] | 0.47 (0.13) [0.21, 0.64] | 0.43 (0.11) [0.17, 0.68] | 0.24 (0.09) [0.10, 0.44] |
| $c^2$ | 0.07 (0.01) | – | – | – | – | 0.07 (0.02) [0.04, 0.13] | 0.08 (0.03) [0.02, 0.16] | 0 | 0 | 0.02 (0.02) [0, 0.09] | 0.12 (0.03) [0.05, 0.23] |
| $\sigma_{uc}$ | -3.16 (0.47) | – | – | – | – | – | – | – | – | – | -4.54 (4.62) [-9.74, -1.61] |
| $\sigma_{uc}/\sigma_p^2$ | -0.08 (0.01) | – | – | – | – | – | – | – | – | – | -0.13 (0.09) [-0.28, -0.04] |
| $r_{uc}$ | -0.8 | – | – | – | – | – | – | – | – | – | -0.60 (0.2) [-0.94, -0.2] |

$\sigma_u^2$: additive genetic variance; $\sigma_c^2$: maternal environmental variance; $\sigma_e^2$: residual variance; $\sigma_p^2$: phenotypic variance; $h^2$: heritability; $c^2$: ratio of the maternal environmental variance to the phenotypic variance; $\sigma_{uc}$: additive genetic maternal environmental covariance; $r_{uc}$: additive genetic maternal environmental correlation; low and high: low- and high-$r_{uc}$ scenarios; in parentheses: standard deviations; in square brackets: range [min, max].

Estimates under model M2 were in close proximity to the true values only in the case of ASReml and the low-$r_{uc}$ scenario ($h^2 = 0.15$, $c^2 = 0.07$). Slightly higher estimates for $h^2$ and $c^2$ were observed in ASReml in the high-$r_{uc}$ scenario ($h^2 = 0.21$, $c^2 = 0.08$). Under AnimalINLA, the respective $h^2$ estimator was seriously inflated ($h^2 = 0.34$) due to overestimation of the additive genetic effects and failure to account for the $c^2$ effects. This trend was more evident in the high- than in the low-$r_{uc}$ scenario. The WinBUGS estimates for model M2 under the high-$r_{uc}$ scenario were slightly better than those obtained by AnimalINLA. Finally, model M3 was fitted via WinBUGS for the high ($r_{uc} = -0.8$) scenario. In this case, a statistically significant $r_{uc}$ was detected (as high as -0.60), but $h^2$ and $c^2$ were systematically overestimated. Only minor differences were observed in the mean estimates using WinBUGS and two prior distributions for the residual variance. In Table 3, results are derived from the uniform distribution case.

The MSEs across models and methods are presented in Table 4. Irrespective of the method and/or model, MSEs were lower in the low- than in the high-correlation scenario. Furthermore, better estimates (in terms of MSEs) were attained in ASReml using model M2 under the low-correlation scenario. The lowest MSEs were observed under model M2 in ASReml and the highest under model M1 in AnimalINLA. Interestingly, the lowest MSEs were attained even under the strongly negative $r_{uc}$ scenario using model M2 in ASReml. The WinBUGS software, although able to account for the specific correlation, exhibited the highest MSE of $\sigma_e^2$ when the prior distribution chosen was inverse gamma (0.001, 0.001), with an analogous effect on the estimators of $h^2$ and $c^2$. In contrast to the real data, WinBUGS estimates of the simulated data exhibited better performance when the prior utilized for $\sigma_e$ was the uniform distribution (MSE 44.75 vs. 215.21 for the inverse gamma prior for $\sigma_e^2$). All other parameters ($\sigma_u^2$ and $\sigma_c^2$) estimated via model M3 in WinBUGS had relatively low MSE.

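Table 4. Mean squared errors of the variance components and the genetic parameters under two levels of additive genetic maternal environmental correlation.

| Parameter | M1, ASReml, low | M1, ASReml, high | M1, AnimalINLA, low | M1, AnimalINLA, high | M2, ASReml, low | M2, ASReml, high | M2, AnimalINLA, low | M2, AnimalINLA, high | M2, WinBUGS, high | M3, WinBUGS, high |
|---|---|---|---|---|---|---|---|---|---|---|
| $\sigma_u^2$ | 85.00 | 184.43 | 171.80 | 343.28 | 12.83 | 40.47 | 85.68 | 323.46 | 168.36 | 41.76 |
| $\sigma_c^2$ | – | – | – | – | 2.60 | 4.67 | NE | NE | 6.05 | 18.12 |
| $\sigma_e^2$ | 22.43 | 65.70 | 17.23 | 65.53 | 5.67 | 19.70 | 15.33 | 65.33 | 72.79 | 44.75 |
| $\sigma_p^2$ | 6.99 | 12.36 | 177.28 | 199.08 | 5.78 | 7.30 | 45.24 | 182.53 | 11.73 | 10.28 |
| $h^2$ | 0.04 | 0.08 | 0.09 | 0.18 | 0.01 | 0.02 | 0.04 | 0.14 | 0.12 | 0.04 |
| $c^2$ | – | – | – | – | 0.01 | 0.01 | NE | NE | 0.03 | 0.06 |
| $r_{uc}$ | – | – | – | – | – | – | – | – | – | 1.48 |

$\sigma_u^2$: additive genetic variance; $\sigma_c^2$: maternal environmental variance; $\sigma_e^2$: residual variance; $\sigma_p^2$: phenotypic variance; $h^2$: heritability; $c^2$: ratio of the maternal environmental variance to the phenotypic variance; $r_{uc}$: additive genetic maternal environmental correlation; low and high: low- and high-$r_{uc}$ scenarios; NE: non-estimable.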

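Table 5. Actual coverage (%) of nominal 95 % intervals of estimated variance components and genetic parameters.

| Parameter | Low, ASReml, M1 | Low, ASReml, M2 | Low, AnimalINLA, M1 | Low, AnimalINLA, M2 | High, ASReml, M1 | High, ASReml, M2 | High, AnimalINLA, M1 | High, AnimalINLA, M2 | High, WinBUGS, M2 | High, WinBUGS, M3 |
|---|---|---|---|---|---|---|---|---|---|---|
| $\sigma_u^2$ | 36.67 | 83.33 | 33.33 | 76.67 | 16.67 | 50.00 | 20.00 | 40.00 | 40.00 | 76.67 |
| $\sigma_c^2$ | – | 86.67 | – | – | – | 56.67 | – | – | 63.33 | 93.33 |
| $\sigma_e^2$ | 73.33 | 93.33 | 53.33 | 80.00 | 26.67 | 76.67 | 46.67 | 67.67 | 66.67 | 76.67 |
| $\sigma_p^2$ | 80.00 | 96.67 | 73.33 | 90.00 | 70.00 | 86.67 | 66.67 | 80.00 | 86.67 | 86.67 |
| $h^2$ | 33.33 | 76.67 | 33.33 | 73.33 | 13.33 | 53.33 | 20.00 | 33.33 | 36.67 | 66.67 |
| $c^2$ | – | 90.00 | – | – | – | 56.67 | – | – | 60.00 | 90.00 |
| $r_{uc}$ | – | – | – | – | – | – | – | – | – | 90.00 |

$\sigma_u^2$: additive genetic variance; $\sigma_c^2$: maternal environmental variance; $\sigma_e^2$: residual variance; $\sigma_p^2$: phenotypic variance; $h^2$: direct heritability; $c^2$: ratio of the maternal environmental variance to the phenotypic variance; $r_{uc}$: additive genetic maternal environmental correlation; Low and High: low- and high-$r_{uc}$ scenarios.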

The coverage of the interval estimates for the three models and the respective methods of analysis is shown in Table 5. To construct Bayesian 95 % credible intervals, the quantiles of the relevant posterior distributions (as estimated by MCMC and INLA) were used. ASReml's intervals were constructed based on the asymptotic normality of the maximum likelihood estimator using $\hat{\theta}_i \pm 1.96 \cdot \mathrm{se}(\hat{\theta}_i)$, where se denotes the estimated standard error of the parameter. In the case of low ruc, the best coverage was given by ASReml and model M2, with narrower intervals than the Bayesian methods. In contrast, WinBUGS exhibited the best coverage in the case of high ruc, at the expense of wider intervals. AnimalINLA had difficulty attaining the nominal coverage when model M1 was assumed, as well as under the strongly negative-ruc scenario. In addition, DIC via WinBUGS favored the true model, which incorporated ruc, in 76.67 % of the samples.
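The interval constructions and the coverage calculation described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the function names and all numbers are hypothetical, and the credible interval is taken from the quantiles of posterior draws, as one would do with MCMC output.

```python
import numpy as np

def wald_interval(estimate, std_error, z=1.96):
    """Asymptotic normal interval: estimate +/- z * se."""
    return estimate - z * std_error, estimate + z * std_error

def credible_interval(posterior_draws, level=0.95):
    """Equal-tailed credible interval from the quantiles of posterior draws."""
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(posterior_draws, [tail, 1.0 - tail])
    return float(lower), float(upper)

def coverage_percent(intervals, true_value):
    """Percentage of intervals that contain the simulated (true) value."""
    hits = [lower <= true_value <= upper for lower, upper in intervals]
    return 100.0 * sum(hits) / len(hits)

# Hypothetical (estimate, standard error) pairs from three replicates.
true_value = 0.30
replicates = [(0.28, 0.05), (0.45, 0.06), (0.33, 0.04)]

intervals = [wald_interval(est, se) for est, se in replicates]
coverage = coverage_percent(intervals, true_value)
print(f"coverage = {coverage:.2f} %")
```

With few replicates the attainable coverage values are coarse, so small differences between methods should be read with caution.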

Discussion

The theoretical aspects and advantages of REML and MCMC methods for fitting hierarchical multilevel models, such as the animal model, have been extensively explored elsewhere, either with a statistical focus (Browne and Draper, 2006) or from an animal breeder's perspective (Van Tassell et al., 1995; Van Tassell and Van Vleck, 1996). To our knowledge, however, this is the first study to apply REML and MCMC methods along with another Bayesian approach, i.e., INLA, within the context of poultry breeding. Our main concern was the practical applicability of three typical, available software programs for the standard animal breeder. Given that both the size and the structure of data sets may affect the performance of the analytical approach (Blasco, 2001), no general inference can be made based on the present results.

In the present study, we attempted to compare the coverage of intervals derived from the Bayesian and REML approaches. Credible and confidence intervals differ, however, in two main respects. First, a credible interval incorporates information from the prior distribution, whereas a confidence interval is based solely on the data, treating the parameter as fixed and the interval itself as random. Second, a credible interval is a probability interval; i.e., it states that the true value lies within the interval with a given probability. A confidence interval makes no such statement. Rather, in conceptual repetitions of an experiment, different confidence intervals are obtained, and 95 % of them contain the true value. Thus, we treat a given interval as containing the true value, knowing that, in the long run, we will be wrong 5 % of the time. Although the two types of interval differ in philosophy, comparing them may be useful within the context of a study such as ours.
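The distinction can be written compactly. In the sketch below, the symbols are introduced for illustration only and are not taken from the original text: y denotes the observed data, Y the data viewed as random across conceptual repetitions, [a, b] the credible limits and [L(Y), U(Y)] the confidence limits.

```latex
% Credible interval: a probability statement about \theta given the observed data y
\[
  \Pr\bigl(a \le \theta \le b \mid y\bigr) = 0.95
\]
% Confidence interval: a statement about the random limits L(Y), U(Y) for fixed \theta
\[
  \Pr\bigl(L(Y) \le \theta \le U(Y) \mid \theta\bigr) = 0.95
\]
```

In the first statement the randomness is in θ (through the posterior); in the second it is in the interval limits, which vary from one conceptual repetition to the next.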

From a frequentist's point of view, the standard approach entails the use of REML and BLUP. In the present study, the ASReml software (Gilmour et al., 2009) was employed. The software is stable and fast and can handle many different models, data structures and thousands of data records. In addition, the necessary files are not especially complicated to construct, and a valuable manual with extensive information and numerous examples is available to the animal breeder. For binary trait modeling, a variety of link functions (logit, probit, cloglog) can be chosen.

An obvious obstacle when using commercial programs is their limited flexibility, i.e., the inability to model complex structures between (random) effects. A good example here was the negative correlation between the u and c effects, which could not be appropriately accommodated within the context of a typical REML package. This covariance is typically ignored (assumed to be 0), but this need not always be the case. Although a more precise biological explanation is needed, some hypotheses relate the negative correlation between the u and the c effects to maternally transmitted immunoglobulins, antioxidants (particularly carotenoids and vitamin E) and yolk androgens. While yolk androgens correlate positively with offspring growth (Schwabl, 1996; Groothuis et al., 2005; Müller et al., 2007), they suppress the immune system (Ketterson and Nolan, 1999; Groothuis et al., 2005) and may promote oxidative stress (von Schantz et al., 1999) in the offspring. On the other hand, maternally transmitted immunoglobulins (Buechler et al., 2002; Boulinier and Staszewski, 2008) and carotenoids (Surai and Speake, 1998) may enhance immune function, but at the expense of offspring growth.

Modeling the covariance in question was possible only via WinBUGS. This is a very valuable feature when testing assumptions of the standard animal model with regard to possible correlation structures between the various random effects. The program allows a large group of competing models to be fitted and compared with Bayesian model evaluation criteria (Sorensen and Gianola, 2002). A further important attribute of WinBUGS is the rank tool, which incorporates the uncertainty associated with ranking the individuals, thus assisting in animal selection. In theory, REML and INLA would probably struggle if the likelihood were very flat, whereas MCMC methods should be able to cope (Blasco, 2001). Such scenarios could be important for practical breeding purposes and might be properly handled by MCMC methods. Bayesian methods, such as MCMC implemented in WinBUGS, can be especially useful in complex situations, at the cost of being computationally expensive and time-consuming. For our data, approximately 14 to 16 h were needed to achieve convergence, depending on the modeling assumptions.
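The idea behind ranking with uncertainty can be illustrated outside WinBUGS. The sketch below is an assumption-laden illustration and not the WinBUGS implementation: it takes a hypothetical matrix of MCMC draws of breeding values (rows are draws, columns are animals), ranks the animals within each draw and summarizes each animal's posterior rank distribution.

```python
import numpy as np

def posterior_ranks(draws):
    """Rank animals within each MCMC draw (1 = highest breeding value)."""
    draws = np.asarray(draws, dtype=float)
    order = np.argsort(-draws, axis=1)            # animal indices, best first
    ranks = np.empty_like(order)
    rows = np.arange(draws.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, draws.shape[1] + 1)
    return ranks

def rank_summary(ranks, level=0.95):
    """Posterior median rank and equal-tailed interval for each animal."""
    tail = 100.0 * (1.0 - level) / 2.0
    lower, median, upper = np.percentile(ranks, [tail, 50.0, 100.0 - tail], axis=0)
    return median, lower, upper

# Hypothetical draws of breeding values for three animals (three iterations).
draws = [[1.0, 3.0, 2.0],
         [2.5, 3.0, 1.0],
         [0.5, 2.0, 1.5]]

ranks = posterior_ranks(draws)
median_rank, lower_rank, upper_rank = rank_summary(ranks)
print(ranks)
print(median_rank)
```

An animal whose rank interval is wide is a less certain selection candidate than its point estimate alone would suggest.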

AnimalINLA proved to be remarkably time-efficient. It took less than 10 s to produce the required posterior distributions, while providing estimates comparable with those of the other packages. However, the current version of this R package (AnimalINLA 1.1) could not accommodate more than 4000 records in the animal model, probably due to compatibility problems with Windows. AnimalINLA also displayed certain problems in terms of bias and accuracy, particularly for a binary trait. The latter has also been reported by Holand et al. (2013) and is supported by our more detailed investigation of simulated data. Finally, it is not as flexible in modeling as WinBUGS, and its documentation is still under development.

In conclusion, WinBUGS can be of great assistance to the animal breeder because of its flexibility in fitting complex models and in revealing existing data structures that the usual REML-based packages neglect. Within the animal breeding context, however, its applicability remains rather limited, since only small to moderate data sets or populations can be handled in a time-efficient manner. Furthermore, the priors should be chosen with caution, particularly when the posteriors are sensitive to them. The AnimalINLA software appears to be a promising option for the animal breeder dedicated to the Bayesian paradigm, since it is remarkably fast. It seems, however, to be a package still under development. Our own experience with large data sets has shown that ASReml can handle analyses of up to 200 000 records and the related pedigree structures quickly (< 1 h) and mostly independently of initial values (Maniatis et al., 2013). Furthermore, as the simulation results have shown, even when a large covariance between random effects is neglected, it may provide estimates of the parameters in question with relatively small bias and error. Given all the above, ASReml remains the best practical choice for the serious animal breeder among the software packages examined.

References

Akaike, H.: Information theory and an extension of the maximum likelihood principle, edited by: Petrov, B. N. and Csaki, F., Proceedings of the 2nd International Symposium on Information Theory, Akademiai Kiado, Budapest, Hungary, 267–281, 1973.

Blasco, A.: The Bayesian controversy in animal breeding, J. Anim. Sci., 79, 2023–2046, 2001.

Boulinier, T. and Staszewski, V.: Maternal transfer of antibodies: raising immuno-ecology issues, Trends Ecol. Evol., 23, 282–288, 2008.

Browne, W. J. and Draper, D.: A comparison of Bayesian and likelihood-based methods for fitting multilevel models, Bayesian Anal., 1, 473–513, 2006.

Buechler, K., Fitze, P. S., Gottstein, B., Jacot, A., and Richner, H.: Parasite-induced maternal response in a natural bird population, J. Anim. Ecol., 71, 247–252, 2002.

Crainiceanu, C. M. and Ruppert, D.: Likelihood ratio tests in linear mixed models with one variance component, J. Roy. Stat. Soc. B, 66, 165–185, 2004.

Damgaard, L. H.: Technical note: How to use Winbugs to draw inferences in animal models, J. Anim. Sci., 85, 1363–1368, 2007.

Dempster, E. R. and Lerner, I. M.: Heritability of Threshold Characters, Genetics, 35, 212–236, 1950.

Gelfand, A. E. and Smith, A. F. M.: Sampling-Based Approaches to Calculating Marginal Densities, J. Am. Stat. Assoc., 85, 398–409, 1990.

Gelman, A.: Prior distributions for variance parameters in hierarchical models (Comment on an Article by Browne and Draper), Bayesian Anal., 1, 515–533, 2006.

Geman, S. and Geman, D.: Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images, IEEE T. Pattern Anal., 6, 721–741, 1984.

Gilmour, A. R., Gogel, B. J., Cullis, B. R., and Thompson, R.: ASReml User Guide Release 3.0, VSN International Ltd, Hemel Hempstead, UK, 2009.

Gorjanc, G.: Flexible Bayesian Inference of Animal Model Parameters Using BUGS Program, Contribution for 9th WCGALP, 2010.

Groothuis, T. G. G., Eising, C. M., Dijkstra, C., and Müller, W.: Balancing between costs and benefits of maternal hormone deposition in avian eggs, Biol. Lett., 1, 78–81, 2005.

Hallander, J., Waldmann, P., Wang, C. K., and Sillanpaa, M. J.: Bayesian Inference of Genetic Parameters Based on Conditional Decompositions of Multivariate Normal Distributions, Genetics, 185, 645–654, 2010.

Henderson, C. R.: A simple method for computing the inverse of a numerator relationship matrix used in prediction of breeding values, Biometrics, 32, 69–83, 1976.

Hodges, J. S. and Sargent, D. J.: Counting degrees of freedom in hierarchical and other richly-parameterised models, Biometrika, 88, 367–379, 2001.

Holand, A. M., Steinsland, I., Martino, S., and Jensen, H.: Animal Models and Integrated Nested Laplace Approximations, G3-Genes Genom Genet, 3, 1241–1251, 2013.

Ketterson, E. D. and Nolan, V.: Adaptation, exaptation, and constraint: A hormonal perspective, Am. Nat., 154, S4–S25, 1999.

Lunn, D. J., Thomas, A., Best, N., and Spiegelhalter, D.: WinBUGS – A Bayesian modelling framework: Concepts, structure, and extensibility, Stat. Comput., 10, 325–337, 2000.

Maniatis, G., Demiris, N., Kranis, A., Banos, G., and Kominakis, A.: Genetic analysis of sexual dimorphism of body weight in broilers, J. Appl. Genet., 54, 61–70, 2013.

Mathew, B., Bauer, A. M., Koistinen, P., Reetz, T. C., Leon, J., and Sillanpaa, M. J.: Bayesian adaptive Markov chain Monte Carlo estimation of genetic parameters, Heredity, 109, 235–245, 2012.

McCullagh, P. and Nelder, J. A.: Generalized Linear Models, Chapman and Hall, London, 1994.

Müller, W., Deptuch, K., Lopez-Rull, I., and Gil, D.: Elevated yolk androgen levels benefit offspring development in a between-clutch context, Behav. Ecol., 18, 929–936, 2007.

Patterson, H. D. and Thompson, R.: Recovery of Inter-Block Information When Block Sizes Are Unequal, Biometrika, 58, 545–554, 1971.

Quaas, R. L.: Transformed Mixed Model Equations: A Recursive Algorithm to Eliminate A-1, J. Dairy Sci., 72, 1937–1941, 1989.

Rue, H., Martino, S., and Chopin, N.: Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations, J. Roy. Stat. Soc. B, 71, 319–392, 2009.

Schwabl, H.: Maternal testosterone in the avian egg enhances postnatal growth, Comp. Biochem. Phys. A, 114, 271–276, 1996.

Schwarz, G.: Estimating the dimension of a model, Ann. Statist., 6, 461–464, 1978.

Sorensen, D. and Gianola, D.: Likelihood, Bayesian and MCMC methods in quantitative genetics, Springer-Verlag, New York, 2002.

Sorensen, D. A., Wang, C. S., Jensen, J., and Gianola, D.: Bayesian-Analysis of Genetic Change Due to Selection Using Gibbs Sampling, Genet. Sel. Evol., 26, 333–360, 1994.

Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and van der Linde, A.: Bayesian measures of model complexity and fit, J. Roy. Stat. Soc. B, 64, 583–639, 2002.

Surai, P. F. and Speake, B. K.: Distribution of carotenoids from the yolk to the tissues of the chick embryo, J. Nutr. Biochem., 9, 645–651, 1998.

Vaida, F. and Blanchard, S.: Conditional Akaike information for mixed-effects models, Biometrika, 92, 351–370, 2005.

Van Tassell, C. P. and Van Vleck, L. D.: Multiple-trait Gibbs sampler for animal models: Flexible programs for Bayesian and likelihood-based (co)variance component inference, J. Anim. Sci., 74, 2586–2597, 1996.

Van Tassell, C. P., Casella, G., and Pollak, E. J.: Effects of Selection on Estimates of Variance-Components Using Gibbs Sampling and Restricted Maximum-Likelihood, J. Dairy Sci., 78, 678–692, 1995.

von Schantz, T., Bensch, S., Grahn, M., Hasselquist, D., and Wittzell, H.: Good genes, oxidative stress and condition-dependent sexual signals, P. Roy. Soc. B-Biol. Sci., 266, 1–12, 1999.

Waagepetersen, R., Ibanez-Escriche, N., and Sorensen, D.: A comparison of strategies for Markov chain Monte Carlo computation in quantitative genetics, Genet. Sel. Evol., 40, 161–176, 2008.

Waldmann, P.: Easy and Flexible Bayesian Inference of Quantitative Genetic Parameters, Evolution, 63, 1640–1643, 2009.

Waldmann, P., Hallander, J., Hoti, F., and Sillanpaa, M. J.: Efficient Markov chain Monte Carlo implementation of Bayesian analysis of additive and dominance genetic variances in noninbred pedigrees, Genetics, 179, 1101–1112, 2008.

Wang, C. S., Rutledge, J. J., and Gianola, D.: Marginal Inferences About Variance-Components in a Mixed Linear-Model Using Gibbs Sampling, Genet. Sel. Evol., 25, 41–62, 1993.
