Genetic Analysis of Several Disease Categories Using Test Day Threshold Models in German Holstein Cows

In the present study trajectories are described for the disease categories udder diseases, metabolic diseases, all diseases, fertility diseases, and ovarian problems. Variance components were estimated for lactation periods of 50 to 300 days. Furthermore, the impact of the number of daughters per sire was analysed with disease information from the first 50 days of lactation. In total 18 data sets were analysed with a test day threshold model. The average disease frequencies were between 6.5% and 2.7% for udder diseases, between 1.7% and 0.4% for metabolic diseases, and between 15.3% and 6.0% for 'all diseases'. For udder diseases the estimated heritabilities varied between 0.12 and 0.25 depending on lactation length and the number of daughters per sire. For metabolic diseases heritabilities were estimated within the interval of 0.12 to 0.24. The heritabilities of all diseases were fairly constant across lactation lengths (0.03 to 0.04). The heritabilities of all diseases increased if fertility diseases were excluded; those estimates varied between 0.15 and 0.19.


Introduction
Milk production has changed during the past two decades and has been characterised by increasing herd size and increasing milk yield, as well as increasing veterinary costs (e.g. JAKOB and DISTL, 1998). Today, diseases (e.g. fertility and udder health problems) are among the most significant problems affecting commercial milk production. Diseases reduce animal welfare and result in economic losses for the farmer due to extra veterinary treatments, extra labour, decreasing milk production, discarded milk and involuntary early culling (NIELSEN et al., 1999). The frequencies of diseases or disease categories depend on case definition and show considerable variation in literature reports (KELTON et al., 1998; COLLARD et al., 2000).
Most analyses of disease data were carried out using linear models, and disease information was considered as an 'all or none' trait. These models assume a normal distribution of the data. Threshold models are an alternative to the traditional linear models as they take the binary nature of disease data into account (GIANOLA and FOULLEY, 1983). The estimates for heritabilities are higher when lactation threshold models are used (SIMIANER et al., 1991; HERINGSTAD et al., 1997; HINRICHS et al., 2003; HINRICHS et al., 2004). However, it should be noted that the definition of disease information as an 'all or none' trait does not utilise all disease information provided by the data, since some cows have more than one disease case per lactation. REKAYA et al. (1998) therefore recommended the development of test day models for the analysis of longitudinal binary responses such as disease field data. The use of longitudinal threshold models would improve the estimation of genetic parameters, because repeated disease cases are distinct observations in those models.
Heritabilities for some metabolic diseases have been estimated by SIMIANER et al. (1991), LYONS et al. (1991), URIBE et al. (1995), and NIELSEN et al. (1999); the estimates vary between 0.01 and 0.47, with most in the interval of 0.08 to 0.15. Most literature reports of heritabilities for other diseases are based on dividing all available disease information into two disease categories, mastitis and diseases other than mastitis. Estimates for diseases other than mastitis are given by LUND et al. (1994), who reported a heritability of 0.11, and HANSEN et al. (2002), who reported a heritability of 0.02. If heritabilities for udder diseases (e.g. mastitis) are estimated with linear models, most values are in the interval of 0.02 to 0.03 (HERINGSTAD et al., 2000). Some literature reports could be found where mastitis data were analysed with test day threshold models (REKAYA et al., 1998; KADARMIDEEN et al., 2000; HERINGSTAD et al., 2003b). The resulting estimates for the heritabilities ranged between 0.03 and 0.41.
In the present study the disease categories all diseases, udder diseases, and metabolic diseases were analysed with test day threshold models. Firstly, the trajectories of the different disease categories are described. Afterwards, variance components were estimated, where data recording comprised the first 50, 100, and 300 days of lactation, respectively. Furthermore, the effect of an increasing number of daughters per sire (improved genetic structure of the data) was analysed.

Materials and methods
Data recording took place from February 1998 to December 2002 on three commercial milk farms with an overall total of 3200 German Holstein cows. A special data recording scheme for functional traits (e.g. health traits) was established on these farms. In this scheme the information stored on the farms in computer based management programs is used to further improve the performance test in dairy cattle. Farms were therefore visited monthly and backup copies of the computer based management programs were taken. New information from the previous month was checked for plausibility. Questionable new information was discussed with the farm management and the veterinarian and discarded if it could not be clearly allocated. Furthermore, the same bulls were used for artificial insemination on the three farms. This led to a genetic connection between the farms. Disease information was recorded by the veterinarian or farm staff.
During the first 300 days of lactation 75,595 disease treatments were recorded in total, of which 39,934 were udder disease treatments (Table 1). Table 1 summarises the recorded treatments of the analysed disease categories during the first 50, 100, and 300 days of lactation. The disease category 'all diseases except fertility diseases' was created for analysing the impact of fertility diseases on the results of the variance component estimation for all diseases. In general, the data sets could be divided according to lactation length and the number of daughters per sire (Table 2 and Table 3). Table 2 gives a description of the data sets 1 to 9, whereas Table 3 describes the data sets 10 to 18. The effect of an increasing number of daughters per sire was analysed in the data sets 10 to 18. This was equal to an improved genetic structure of the data, which was made possible by the special data recording scheme.
In all data sets disease information was treated as an 'all or none' trait. All analysed data sets contained one observation per cow and day. Each observation (day) received a disease code, '1' if a cow showed a disease and '0' if not. If there was a recorded treatment, the day of treatment and the following five days were coded with '1'. The five day period was selected because it is the average time in which milk has to be discarded. All other days were coded with '0'. In a first step all data sets were checked for extreme categories. These categories were excluded from the analysis, but it should be noted that this was not a problem in our data, and therefore nearly all data were included in the following analyses.
The significance of fixed effects was analysed by using the GENMOD procedure of the SAS package (SAS, 1999). Statistical testing for each factor was done by removing this particular factor from the model and comparing the likelihood of this reduced model with the likelihood of the full model. The age at calving was divided into eight classes: 20 to 24, 25 to 29, and 30 to 49 months in first parity; 30 to 39, 40 to 45, and 46 to 56 months in second parity; 45 to 75 months in third parity; and 75 months or more in higher parities. Furthermore, the model includes the herd calving season as a multi-code. Each year was divided into three calving seasons: January to March (season one), April to August, and September to December. A further systematic environmental effect was the herd*week (of observation) of the test day, where one herd*week (of observation) class was equal to one herd week. The influence of all systematic environmental effects was significant (p < 0.001).
A threshold liability model (WRIGHT, 1934; DEMPSTER and LERNER, 1950; GIANOLA and FOULLEY, 1983) was used for the estimation of random permanent environmental variance and additive genetic variance. In this study an animal model was applied to all data sets.
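The day-wise disease coding described above (treatment day plus the following five days coded '1', all other days '0') can be illustrated with a minimal Python sketch. The function name, its arguments, and the example treatment days are hypothetical and are not taken from the study's software; truncating the window at the end of the considered period is likewise an assumption.

```python
def code_disease_days(treatment_days, last_day, window=5):
    """Return 0/1 disease codes for lactation days 1..last_day.

    A treatment recorded on day d codes day d and the following
    `window` days with 1; every other day stays 0.
    """
    codes = [0] * last_day
    for d in treatment_days:
        # Clip the coded window at the end of the considered period (assumption).
        for day in range(d, min(d + window, last_day) + 1):
            if day >= 1:
                codes[day - 1] = 1
    return codes


# Hypothetical cow with treatments on lactation days 3 and 20,
# observed during the first 30 days of lactation.
example = code_disease_days([3, 20], last_day=30)
```

Overlapping windows simply stay coded '1', so two treatments within six days of each other produce one continuous run of diseased days.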
The following test day repeatability threshold model was applied to the data sets 1 to 18:
$$\pi_{ijklm} = \Phi\left(B_i + K_j + L_k + f_k(\text{days in milk}) + m_l + t_m\right)$$
where
$\pi_{ijklm}$ = expected probability for occurrence of any disease
$\Phi$ = cumulative probability function of the standard normal distribution
$B_i$ = fixed effect of the i-th herd*week (of observation) (i = 1,...,771) *
$K_j$ = fixed effect of the j-th herd*year*season (of calving) (j = 1,...,45)
$L_k$ = fixed effect of the k-th age of calving (k = 1,...,8)
$f_k(\text{days in milk})$ = lactation curve nested within age of calving k
$m_l$ = random permanent environmental effect (l = 1,...,10,071) **
$t_m$ = random effect of the m-th animal (m = 1,...,15,802)
* The number of herd*week (of observation) effects decreased for metabolic diseases from 771 to 421. This was caused by extreme categories, which were not included in the analyses.
** The number of random permanent environmental effects was 10,071 for data set 1. For the remaining data sets the number of random permanent environmental effects is given in Table 2 and Table 3.
A pedigree file was applied to all data sets, which contains as much information as possible from two ancestor generations.
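To make the role of Φ concrete, the following Python sketch maps a linear predictor on the liability scale to an expected disease probability. It is only an illustration under stated assumptions: the function names and all effect values are hypothetical, and the code is not part of the LMMG_TH software used in the study.

```python
import math


def std_normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def disease_probability(herd_week, herd_year_season, age_class,
                        lactation_curve, perm_env, animal):
    """Expected probability of disease on one test day.

    The six arguments are the effects listed in the model definition,
    summed on the liability scale and passed through the probit link.
    """
    eta = (herd_week + herd_year_season + age_class
           + lactation_curve + perm_env + animal)
    return std_normal_cdf(eta)


# Hypothetical effect values chosen only for illustration.
p = disease_probability(-1.0, -0.3, 0.1, -0.5, 0.2, 0.0)
```

A linear predictor of zero corresponds to a probability of 0.5, and more negative predictors give the low daily frequencies typical of disease data.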
In the test day models for udder diseases, all diseases, and 'all diseases except fertility diseases', f k(days in milk) represents a fixed effect with seven classes for the first seven days of lactation. After the seventh day of lactation f k(days in milk) is identical to the function used by ALI and SCHAEFFER (1987). For metabolic diseases f k(days in milk) represents a fixed effect with 18 classes for the first 18 days of lactation. After the 18th day of lactation f k(days in milk) is identical to the function used by ALI and SCHAEFFER (1987).
The posterior distributions of the permanent environmental variance and the additive genetic variance for the liability to udder diseases or all diseases were determined with the Gibbs sampling algorithm implemented in the LMMG_TH program, a threshold derivative of LMMG (REINSCH, 1996). The LMMG_TH program is based on the results presented by SÖRENSEN et al. (1995). For random effects, multivariate normal distributions with zero means and appropriate variance-covariance matrices were used, and improper flat priors were used for fixed effects. For all models 100,000 cycles were generated and the result from each cycle was retained. The results of the first 10,000 cycles were discarded (burn in plus a safety margin) and the results of the remaining 90,000 cycles were used to calculate the genetic parameters, using the MEAN procedure of the SAS package (SAS, 1999). Convergence was determined by visual inspection of plots of realised parameter values against iteration number. Similar convergence detection was used by REINSCH et al. (1999).
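The piecewise lactation curve described above (day-specific fixed classes early in lactation, followed by the function of ALI and SCHAEFFER (1987)) can be sketched in Python as follows. The five regression terms follow the commonly cited form of that function; the scaling constant of 305 days is the conventional choice and an assumption here, since the text does not state it. All names and coefficient values are hypothetical.

```python
import math


def ali_schaeffer_covariates(dim, scale=305.0):
    """Five covariates of the Ali and Schaeffer curve for a given day in milk."""
    g = dim / scale
    w = math.log(scale / dim)
    return [1.0, g, g * g, w, w * w]


def lactation_curve(dim, early_effects, coefs, scale=305.0):
    """Value of the lactation curve at day `dim` for one age class.

    early_effects: fixed class effects for days 1..len(early_effects)
                   (7 classes, or 18 for metabolic diseases, in the text).
    coefs:         five regression coefficients for the later days.
    """
    if dim <= len(early_effects):
        return early_effects[dim - 1]
    covariates = ali_schaeffer_covariates(dim, scale)
    return sum(c * x for c, x in zip(coefs, covariates))


# Hypothetical effects: seven early-day classes and five curve coefficients.
early = [0.4, 0.6, 0.9, 1.0, 1.1, 0.9, 0.7]
coefs = [-1.5, 0.2, -0.1, 0.3, -0.05]
value_day_3 = lactation_curve(3, early, coefs)
value_day_100 = lactation_curve(100, early, coefs)
```

Treating the first days as separate classes lets the model follow a sharp early peak that a smooth five-parameter curve could not reproduce.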

Trajectories of different disease categories
The trajectory of udder diseases showed only one peak during the first 300 days of lactation (Figure 1). Thereafter the frequency of udder diseases decreased to a constant level of 2% to 3%.
[Figure annotation: Day 5, 18.0%]
The disease incidence of metabolic diseases showed a further peak at lactation day 14 (Figure 2). Thereafter the metabolic disease incidence decreased and was below 1% after the 24th day of lactation. Figure 1 and Figure 3 suggest that the first peak of the trajectory of all diseases was caused by udder diseases, whereas the distinct peaks in the latter part of lactation must be caused by other health problems. The high number of fertility disease treatments (Table 1) indicated that fertility diseases could be the reason for these peaks.
The trajectories of fertility diseases and ovarian problems are shown in Figure 4. The trajectory of fertility diseases showed three distinct peaks, where the first peak was caused by non-ovarian problems and the other two were caused by ovarian problems. All in all, Figures 1 to 4 describe the trajectories of the most important disease categories. The first peak of the trajectory of all diseases (Figure 3) combines the non-ovarian problem peak (Figure 4) with the peaks of the udder health problems and metabolic problems (Figure 1 and Figure 2). The remaining peaks in Figure 3 were caused by an increased risk of ovarian problems (Figure 4). Apart from the peaks, the trajectory of all diseases was also less constant than the trajectory of udder diseases. Fertility diseases were excluded for the estimation of the influence of an increasing number of daughters per sire. This was done because the distinct peaks of the trajectory could create problems in the estimation of variance components.

Disease frequencies
Table 4 summarises the average incidence of the disease categories in the data sets 1 to 9. As expected, the disease category all diseases showed the highest frequencies, followed by udder diseases and metabolic diseases. For all analysed disease categories the disease frequency decreased with increasing lactation length. This suggests that disease problems are concentrated at the beginning of the lactation. Therefore the analysis of the effect of an increasing number of daughters per sire was restricted to the first 50 days of lactation. Furthermore, fertility diseases were excluded from all diseases.
The disease frequencies of the different disease categories were fairly constant in data sets 10 to 18: they were between 6.1% and 6.5% for udder diseases, between 1.6% and 1.7% for metabolic diseases, and between 8.1% and 8.2% for all diseases except fertility diseases.
In Table 5 the disease frequencies of the different disease categories are given for the different lactations. Furthermore, this table includes the number of lactations and the average lactation length and its standard deviation.

Estimation of variance components
The results of the variance component estimation for the first 9 data sets are summarised in Table 6; all estimates are posterior means. The estimated additive genetic variances varied between 0.04 (data set 9) and 0.91 (data set 4), and the permanent environmental variances were between 0.28 and 3.16. Furthermore, for all disease categories a decreasing additive genetic variance and permanent environmental variance could be observed with increasing lactation length (Table 6). The highest additive genetic variance (0.43 to 0.91) and permanent environmental variance (2.12 to 3.16) were estimated for the liability to metabolic diseases, followed by the liability to udder diseases. The additive genetic variance of the liability to udder diseases was estimated within the interval of 0.24 to 0.58, and the permanent environmental variance was between 0.83 and 1.80.
The estimated variance components for the liability to all diseases were markedly lower than those for the liability to metabolic diseases or udder diseases. This led to lower heritabilities of liability (0.03 to 0.04) for all diseases, whereas similar heritabilities of liability were estimated for metabolic diseases (0.12 to 0.18) and udder diseases (0.12 to 0.17).
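The relation between the variance components and the heritability and repeatability of liability can be sketched in Python. The sketch assumes that the residual variance on the liability scale is fixed at 1, which is the usual threshold model convention but is not stated explicitly in the text. It also assumes that the upper ends of the reported udder disease ranges (0.58 and 1.80) belong to the same data set. The function names are hypothetical.

```python
def heritability(var_a, var_pe, var_e=1.0):
    """Heritability of liability: additive genetic share of total variance."""
    return var_a / (var_a + var_pe + var_e)


def repeatability(var_a, var_pe, var_e=1.0):
    """Repeatability: genetic plus permanent environmental share."""
    return (var_a + var_pe) / (var_a + var_pe + var_e)


# Udder diseases, upper ends of the reported ranges (pairing is an assumption).
h2_udder = heritability(0.58, 1.80)
r_udder = repeatability(0.58, 1.80)
```

Under these assumptions the sketch gives a heritability of about 0.17 and a repeatability of about 0.70, in line with the upper values reported for udder diseases.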
The effect of an increasing number of daughters per sire (improved genetic structure) on the variance component estimation is shown in Table 7.All estimates are posterior means.
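The posterior means and standard deviations reported here are summaries of the retained Gibbs samples after the burn-in described in the methods (100,000 cycles, first 10,000 discarded). A minimal Python sketch of that summary step follows; the function name and the toy chain are hypothetical, and the study itself used the MEAN procedure of SAS rather than this code.

```python
import statistics


def posterior_summary(chain, burn_in):
    """Mean and standard deviation of a sampled parameter after burn-in."""
    kept = chain[burn_in:]
    return statistics.mean(kept), statistics.stdev(kept)


# Toy chain: three burn-in values followed by three retained samples.
toy_chain = [10.0, 10.0, 10.0, 1.0, 2.0, 3.0]
post_mean, post_sd = posterior_summary(toy_chain, burn_in=3)
```

For a real run the call would take the full chain of a variance component with `burn_in=10000`.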
The estimated additive genetic variances varied between 0.38 and 1.47, depending on the disease category, and the permanent environmental variances were between 1.18 and 3.72. Within all disease categories the additive genetic variance increased with an improved genetic structure of the data. The permanent environmental variance increased for the liability to metabolic diseases and 'all diseases except fertility diseases' with an increasing number of daughters per sire (improved genetic structure), whereas it decreased for the liability to udder diseases (Table 7).
Table 7: Posterior means and standard deviations (SD) of additive genetic variance (σ²a), permanent environmental variance (σ²pe), heritabilities (h²), and repeatabilities (r), for liability to udder diseases, metabolic diseases, and all diseases except fertility diseases, in data sets with an improved genetic structure (different restrictions for minimum number of daughters per sire: at least 10, 30, or 50 daughters)
The estimated heritabilities of liability were within the interval of 0.15 to 0.25. With an increasing number of daughters per sire (improved genetic structure) an increasing heritability of liability could be observed for all analysed disease categories. The increasing heritability of liability was more distinct for the disease categories udder diseases (0.18 to 0.25) and metabolic diseases (0.16 to 0.24) compared to the disease category 'all diseases except fertility diseases' (0.15 to 0.19). The increasing heritability of liability showed that differences between sires become more apparent with an increasing number of daughters per sire (improved genetic structure). Furthermore, the genetic connection between the farms should have a positive effect on the heritabilities. For all analysed data sets the estimated repeatabilities were high, with values within the interval of 0.61 to 0.84 (Table 7).
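The restriction to sires with a minimum number of daughters (at least 10, 30, or 50) can be sketched as a simple filter in Python. The record layout as (cow, sire) pairs and all identifiers are hypothetical; how the original data sets were actually built is not described beyond the minimum-daughter restriction.

```python
def restrict_to_sires(records, min_daughters):
    """Keep only records of cows whose sire has at least `min_daughters` daughters.

    records: list of (cow_id, sire_id) pairs; a cow may appear several times,
             but is counted once per sire.
    """
    daughters = {}
    for cow, sire in records:
        daughters.setdefault(sire, set()).add(cow)
    keep = {s for s, cows in daughters.items() if len(cows) >= min_daughters}
    return [(cow, sire) for cow, sire in records if sire in keep]


# Hypothetical records: sire "A" has three daughters, sire "B" has one.
records = [("c1", "A"), ("c2", "A"), ("c2", "A"), ("c3", "A"), ("c4", "B")]
restricted = restrict_to_sires(records, min_daughters=2)
```

Counting distinct cows rather than records keeps repeated test day observations of one cow from inflating a sire's daughter count.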
Posterior distributions are shown as an example for the liability to udder diseases in Figure 5, where the considered period of lactation ends after 50 days in milk. Posterior distributions of the heritability were similar for the data set that contains all available udder disease information and for the data set in which only records from sires with at least 10 daughters were considered. Furthermore, posterior distributions of the heritability of the two remaining data sets were similar to each other, but less peaked (Figure 5). The posterior distributions for the liability to metabolic diseases (not shown) were flatter than those for the liability to udder diseases. In addition, the posterior distributions for the liability to all diseases except fertility diseases (not shown) were similar to those shown in Figure 5.

Discussion
The aggregation of different diagnoses into disease categories was very useful for this analysis. On the one hand, inconsistent diagnoses given by the veterinarian could be avoided. On the other hand, the definition of disease categories reduced the amount of discarded information. For example, disease categories reduced the number of extreme categories, because similar diagnoses were combined. If test day models are used, the trajectory of the analysed trait has an important effect. Therefore, models must be fitted to the trajectory of the analysed disease category. For udder diseases, metabolic diseases, and 'all diseases except fertility diseases' this could be done by using the Ali Schaeffer function, but it should be noted that the possibilities of this function are also limited. Therefore, the first days of lactation must be treated as fixed effects. This was also done by SCHOMAKER et al. (2002).

Disease frequencies and estimation of variance components
The frequencies of different disease categories or diseases vary widely in the literature, depending on data recording methods and case definition. Furthermore, the frequencies of disease categories are affected by the number of diseases that were combined into one disease category. NIELSEN et al. (1999) distinguished between udder diseases, reproductive diseases, digestive diseases, and feet and leg diseases. Each of these disease categories combines nearly ten different diagnoses. Other studies (LUND et al., 1994 and HANSEN et al., 2002) divided disease information into two disease categories, mastitis and diseases other than mastitis. There is no standardisation for the creation of disease categories. Standardisation for the evaluation of disease categories would improve the comparability of different analyses.
Overall, in the analysed disease categories the average disease frequencies decreased with increasing lactation length (Table 4). This suggests that all disease problems are concentrated in a relatively short phase at the beginning of lactation, when the physiological demands on the cows are high. For udder diseases the concentration of diseases at the beginning of lactation is in agreement with the results found by HERINGSTAD et al. (2003b), KADARMIDEEN et al. (2000), and REKAYA et al. (1998). An increasing number of daughters per sire (improved genetic structure) showed no effect on the average disease frequency. It should be noted that in this study all lactations were included, whereas most other studies were based on first lactation information.
For udder diseases the additive genetic variance and the permanent environmental variance decreased with the increasing period of data collection (Table 6). This resulted in decreasing heritabilities (0.17 to 0.12) and repeatabilities (0.70 to 0.52). These estimates are higher than those found by REKAYA et al. (1998), KADARMIDEEN et al. (2000), and HERINGSTAD et al. (2003b) for mastitis when test day threshold models were used. These authors divided the lactation into several blocks. This could be a reason for the different results. Another possible reason could be that the authors mentioned above used sire threshold models. In a previous analysis (results not shown) results were checked by a second analysis with a sire model. In our case the results of both models agreed well. Additionally, the estimated heritabilities for udder diseases in this study are considerably higher than those given by several authors (SIMIANER et al., 1991; HERINGSTAD et al., 1997; HERINGSTAD et al., 2003a) who carried out their analyses with lactation threshold models.
The additive genetic variance and the permanent environmental variance of metabolic diseases also decreased with increasing lactation length. The additive genetic variance and the permanent environmental variance of metabolic diseases were higher than those estimated for udder diseases and all diseases. However, the estimated heritabilities (0.18 to 0.12) are similar to the heritabilities of udder diseases and they are in the upper range of literature reports (SIMIANER et al., 1991; LYONS et al., 1991; URIBE et al., 1995). It should be noted that in most cases cows are only affected by one metabolic disease during lactation. Therefore, the improvement in genetic parameter estimation caused by the use of test day threshold models is limited. This does not hold for udder diseases, where repeated cases occurred at a higher frequency than for metabolic diseases. This resulted in an improvement in genetic parameter estimation if test day threshold models are used for genetic analysis of udder disease data, because distinct cases of udder diseases could be treated as distinct observations in those models (HINRICHS et al., 2003).
If all available disease information was analysed simultaneously, the additive genetic variance (0.04 to 0.07) as well as the permanent environmental variance (0.28 to 0.40) were low compared to the disease categories udder diseases and metabolic diseases. The use of lactation threshold models resulted in low estimates for the additive genetic variance and the permanent environmental variance. In addition, the estimated heritabilities and repeatabilities are low (HINRICHS et al., 2004). If test day threshold models were used, the additive genetic variance was fairly similar to the additive genetic variance of the lactation threshold models. Therefore, heritabilities did not increase with the use of test day threshold models, in contrast to the repeatabilities (Table 6). The estimated heritabilities are similar to those estimated by SIMIANER et al. (1991). The fairly constant additive genetic variance could be caused by the trajectory of all diseases (Figure 3). This trajectory showed distinct peaks and some of them were caused by ovarian problems (Figure 4). Therefore, ovarian problems cannot be analysed together with other diseases and should be assessed separately. An alternative function must be found for modelling the trajectory of ovarian problems, or the Ali Schaeffer function should be replaced by a fixed effect for every day. Such data sets have to be very large, because several observations will be needed for an exact estimation of the fixed effect of the lactation day.
The effect of an increasing number of daughters per sire (improved genetic structure) on the results of the variance component estimation was analysed with the disease categories udder diseases and metabolic diseases. In addition, fertility diseases were excluded from the disease category all diseases before the variance components were estimated. The lactation length was restricted to the first 50 days of lactation, because most diseases occurred during this period of lactation (Figures 1 to 4). An increasing number of daughters per sire (improved genetic structure) showed a positive overall effect in the analysed disease categories (Table 7). The results of the variance component estimation for the disease category 'all diseases except fertility diseases' suggest that fertility diseases should be analysed separately (Table 6 and Table 7). All in all, the results of our study showed that an increasing number of daughters per sire (improved genetic structure) had a positive effect on the variance component estimation if test day threshold models were used. This positive effect cannot be observed if the analysis is carried out with lactation threshold models (HINRICHS et al., 2004). Therefore, the development of test day threshold models is an important area of research.
Alternative models should be tested for their ability to describe the trajectories of the disease categories all diseases and fertility diseases. Another possibility could be the use of random regression models or multiple trait models. For such models very large data sets will be needed, which are not available in most countries today. An advantage of random regression models would be that they treat diseases at different lactation days as different traits. This could be useful because HERINGSTAD et al. (2003b) found that the genetic correlation between mastitis in different stages of lactation was not equal to one. Therefore mastitis in different stages of lactation is not the same trait and should be treated as different traits, if possible.

Special data recording scheme
The results of our study showed that the establishment of a special data recording scheme for functional traits on commercial dairy farms is possible. Data recording on commercial dairy farms would lead to large field data sets, which are not available in most countries today. These data sets could be used for the estimation of genetic parameters and breeding values. The establishment of such data recording schemes should be done stepwise. In a first step potential data recording farms have to be selected. These farms have to fulfil several requirements. They should be as large as possible, and farm management, farm staff, and veterinarians must be interested in the data recording scheme. Furthermore, all farms must have computer based herd management programs. All information should be stored on the farm as long as possible. The second step includes the construction of a data base for the data recording scheme and first data recording. Data recording has to become a part of the daily farm work and plausibility checks for the recorded information must be established. Thereafter the same bulls should be used for artificial insemination at the same time on the farms. A genetic connection between the data recording farms would thereby be built up. Once the data recording is well established on the farms, another important point is that the motivation of farm staff, farm management, and veterinarians for the data recording has to be maintained. This could be done in the form of disease reports or warnings about certain disease outbreaks.

Conclusions
The presented results suggest that disease traits show sufficient additive genetic variance. Therefore, disease information should be used in commercial breeding schemes. Furthermore, disease information should be divided into disease categories. The development of test day threshold models is an important area of research. The trajectory of all diseases showed more peaks and could not be described by the Ali Schaeffer function. Therefore alternative functions should be tested for this disease category. In particular fertility diseases should be analysed separately. In addition, fertility diseases should be divided into ovarian and non-ovarian problems. All models must be fitted to the trajectory of the disease category. For the genetic improvement of disease resistance, data recording, quantitative genetic information and molecular genetic information should be combined and integrated stepwise into commercial breeding schemes.

Fig. 2: Frequency of metabolic diseases during the first 300 days of lactation
Figure 3 describes the trajectory of the disease category all diseases.

Table 5
Number of lactations (n), mean lactation length and standard deviation (SD), and frequencies (%) of the different disease categories for the first 300 days of lactation (all available information used)
LL = lactation length, UD = udder diseases, MD = metabolic diseases, ADEFD = all diseases except fertility diseases, AD = all diseases