Prediction of 305-day first lactation milk yield in cows with selected regression models

In the present study, the prognostic values of multiple and spline regression models were tested for 305-day lactation milk yield of cows. The predictors were: HF genes proportion in cow’s genotype; average milk yield for 305-day lactation from 4 first milkings of all cows in a barn, in which a cow was used in a given year; month of calving; and average daily milk yield from first four test-day milkings. Models were developed basing on 628 first lactations of BW cows with average 71% HF genes proportion. Subsequently, the predictive values of the models examined were verified on the grounds of next 105 first lactations. Prognostic differences of the models examined were determined, finding the prognosis obtained with the spline regression more accurate (smaller prediction error, higher coefficient of correlation for prognosis and real values in the model containing information from first three test-day milkings). These models are easy to construct and may be useful in practical estimation of cow lactation yields. They may be used for predicting the actual lactation yield in order to minimise production costs and achieve better production results.


Introduction
In many countries the analysis of milk yield for 305-day lactation is a basis for evaluating the breeding value of milk cattle.Every month, or every second month, test-day milkings are being carried out, which are the base for calculating 305-day lactation milk yield.Commercial value of a cow is compared with the yield of cows being in the same lactation, calved in the same year and season and coming from the same herd.Many factors are included into the model, as a herd-year-season effect (HYS).The estimated and the actual yield of a cow may differ considerably, however, in particular when the interval between test-day milkings are longer or also when the last test-day milking is collected before the end of 305-day lactation.In recent years studies have been performed on the use of lactation curves for calculating the milk yield.The application of specific lactation equation may enable more accurate estimation of the performance of cows for 305 days.Most frequently, the regression models of WOOD (1967), WILMINK (1987) or ALI and SCHAEFFER (1987) are used, though the proposed lactation curves do not always allow to obtain satisfactory results in relation to indigenous cattle population (STRABEL, 2002).When estimating the breeding value, the yield on the test day is taken into account (PTAK and SCHAEFFER, 1993) as well as random regression models (JAMROZIK and SCHAEFFER, 1997).Some possibilities for milk performance evaluation may arise from application of automatic milking systems, where data are based on real milk yields, however this approach still needs verification (BOHLSEN et al., 2003).Knowledge of the lactation curve may be used for supplementing incomplete cow lactations, determining food rations, estimating more accurately production costs and possible profits (HOLMAN et al., 1984;GERNAND et al., 1999 ) as well as for diagnosing mastitis (LESCOURRET et al., 1995), ketosis (DETILLEUX et al., 1994) or relationship between mastitis and milk composition throughout lactation (AMIN, 2001).Milk yield prediction in for same cows, and also in the whole herd, may facilitate the producers in taking adequate breeding decisions, being instrumental this way in reducing the costs and increasing the profits.Taking into account the foregoing, an attempt was undertaken in the present study to test the values of selected multiple and spline regression models based on the yields of first four test-day milkings.The received milk yield predictions for 305-day lactation were compared with the real yields obtained by a cow in that period.

Material and methods
In the study were used the data on 733 Black-and-White (BW) cows with various proportion of Holstein-Friesian (HF) genes referring to their first lactations lasting at least 305 days.The lactations were split into 2 groups: • 1 group -628 lactations, for constructing regression models.The lowest proportion of HF genes for cows of that group was 25%, the largest almost 98.6% (67% on the average).These cows were used in 7 barns in the northern part of Poland in 1994-1999.Their milk performance was calculated basing on monthly test-day milkings (STOLZMAN, 1982).• 2 group -105 lactations, picked out from the same 7 barns.These lactations were terminated in 2000.The average proportion of HF genes in that group of cows was 71%.The 305-day lactation milk yield in examined cows in particular barns was at the level of 4528 kg of milk, with the average daily yield being 19.8 kg.Butterfat content ranged 3.54-4.54%,and protein -2.95-3.35%.
The highest average milk yield was recorded in cows usually during the first month of lactation.Later on, the milk yield decreased gradually (by 1 kg per month on the average).Each cow was characterised by a group of 8 variables -the predictors (X): X 1 -percentage of HF genes in cows genotype; X 2 -number of days in cows lactation; X 3 -average 305-day lactation milk yield from 4 first milkings of all cows of a barn, in which a cow was used this year; X 4 -month of calving; X 5 -average daily milk yield from test-day milking falling on the average on the 18th day of lactation (between the 5th and the 28th day); X 6 -average daily milk yield from test-day milking falling on the average on the 48th day of lactation (between the 36 and the 59th day); X 7 -average daily milk yield from test-day milking falling on the average on the 76th day of lactation (between the 66th and the 89th day); X 8 -average daily milk yield from test-day milking falling on the average on the 104th day of lactation (between the 95th and the 118th day).The explanatory variable (y) was total milk yield of a cow for the lactation determined as the real yield estimated according to obligatory methods and recorded in breeding documentation, as opposed to the yields predicted with the regression models ( ).
y În the analyses were used four multiple regression (MLR) and four spline regression models with break-points in arithmetic mean (PRA).MLR may be written down as the following equation: where: y ˆ -vector of values calculated on the basis of estimated model; β -(k+1)x1 vector of coefficients, β = (X T X) -1 X T y; y -nx1 vector of variable Y observations; Xnx(k+1) matrix of observations on explanatory variables, e -vector of random errors.In PRA a situation is considered, when dependent variable (y) changes together with the values of independent variables, and the linear correlation may remain within the wide range of different levels of production; however, above a certain value a discontinuity may appear in the correlation between variables.The moment of this discontinuity, called a break-point, divides the regression equation in two estimations, above and below a break-point.Thus, the model of spline regression consists of two independent equations, applied according to the value of dependent variable (PEDHAZUR, 1973).The break-point in the median allows to predict more accurately in the situation when distribution of traits is characterised by larger or smaller asymmetry: X = y ˆ1 β 1 (for p ≤ y ) + X 2 β 2 (for p> y ); where: X 1 -matrix of observations for explanatory variables for p (break-point) being less than or equal to the arithmetic mean of observations; X 2 -matrix of observations for explanatory variables for p (break-point) being more than the arithmetic mean; β 1 , β 2vectors of coefficients; yarithmetic mean.All analysed models had identical first four variables: X1, X2, X3, X4.Differences in the models resulted from using additional predictors: average milk yields from test-day milkings in successive months of the test.The analysed models of multiple regression are: MLR1 -X5 MLR2 -X5, X6 MLR3 -X5, X6, X7 MLR4 -X5, X6, X7, X8 and the models of spline regression are: PRA1 -X5 PRA2 -X5, X6 PRA3 -X5, X6, X7

PRA4 -X5, X6, X7, X8
The analysed models were augmented originally with three additional variables: age of cows, inter-calving period and study year.The input of these variables in explaining the variability in milk yield with lactation was non-significant, just as analogous regression coefficients.Considering the excess of information, the setting-up of an adequate model would be difficult, therefore these variables were not taken into account in analysed models.
For estimating regression functions, Statistica 5.1 v.PL software package was used (Statistica PL, 1997).PRA models were approximated with the quasi-Newton's estimation algorithm, which estimates the function at each successive estimation step in different points, in order to determine derivatives of the Ist and the IInd order.The derivatives indicate the direction and the gradient of changes in the function slope.The algorithm follows a certain path to the minimum of loss function, which estimates the volume of deviations between the predicted and the observed values (GILL and MURRAY, 1972).Each deviation represents a certain loss in the accuracy of prediction.Minimizing the loss function is a procedure in estimating the coefficients of regression equation.The loss function (F L ) was: = min; where: y ˆ The quality of the obtained models was determined with the coefficient of determination, which indicates the extent, in per cent, in which the variability of the explained trait (Y) has been explained by explanatory variables (X 1 , X 2 …) occuring in a given model, with the coefficient of determination being given at the same time corrected according to the formula below, which will enable the comparison of coefficients from models differing in the number of explanatory variables.
; where: n -number of observations; k -number of predictors; R 2 -coefficient of determination; R 2 A -adjusted R 2 .Moreover, using the Shapiro-Wilk's test the normality of distribution of the differences between the values predicted by the model and the real values was verified.Also an assumption was analysed on the homoscedasticity, i.e. on the constancy of variance of the residual component basing on the following statistics: ; where: n 1 and n 2 -observations split into subsets according to the arithmetic mean; σ 2 1 and σ 2 2 -variances in both subsets; k -number of explanatory variables.
In the analysed models the possibility of explanatory variables being correlated or not was verified as well.
The obtained models were used for generating the prediction for group 2 of previously selected 105 cows.The yield predictions derived by respective models were compared with the real 305-day lactation yields, which were attained by these cows (according to documentation).The test for the difference between means (paired two group t-test or dependent sample t-test, or repeated measures t-test) was employed for related variables to verify the yields predicted with regression and the real yields.Also the coefficient of correlation (r) was determined between the real values and the values predicted by respective models.The mean relative error of prediction (PE) was calculated as well (according to DITTMANN, 1999, under assumption that time momentum T = 0): For each model, numerical ranges and absolute percentage differences for the predictions and the real values were composed in 200-kg intervals in order to see the number of predictions that fell within particular limits of the arithmetic mean for the examined cows of group 2.

Results
Tables 1-2 present the regression coefficients for the eight models analysed.All the coefficients were highly significant (P<0.01).The tables also show the coefficients of determination (R 2 ), which ranged 0.74 to 0.91, thus being very high (LUSZNIEWICZ and SLABY, 2001).Table 3 presents prediction effects of the examined models for 105 cows selected.Differences between the average real milk yield for 305-day lactation and the predicted yields were negative and ranged -278.8 to -33.6 kg (MLR1 and PRA1).This difference was a result of the comparisons of particular observations, deviating either in plus or in minus from the real values.With the use of the dependent sample t-test, the predictions from respective models were compared with the real milk yields of the analysed set of 105 cows.In most cases, these differences were non-significant (except for MLR1 and MLR2).The lowest prediction error for the multiple regression amounted to 0.08% (MLR3 and MLR4), and to 0.06% (PRA4) for the spline one.Only PRA1 had a larger prediction error than the best models of multiple regression with 3 and 4 first test-day milkings.The coefficients of correlation between the predictions and the real values ranged from 0.70 for MLR1 to 0.92 for PRA4.The values of those coefficients may be referred to linear models, which are based on the day of lactation.The value of the average absolute differences for the real and the predicted values in shown in Table 4.The average absolute differences ranged from 632.66 to 310.91 kg milk, which made for 12.04 to 5.91% of mean real milk yield of the analysed population.The variability of absolute differences was high, and the maximum discrepancy between the predicted and the real values ranged in some cases from 1305 kg milk for PRA4 to 2045 kg milk for PRA1.

Discussion
The daily milk yield of cows with a similar proportion of HF genes was very similar according to test-day milkings in successive lactation months, i.e. during the first 30 days after the calving the cows attained their maximum daily milk yields, while in next months a smaller or a greater decrease in milk yield was observed.Such a course of yield during lactation is consistent with the reports of other authors (OLORI et al., 1997;SWALVE and GUO, 1999) and might have been considered to be normal if the high milk yield had been sustained for a longer time and the decrease had not been over-intensive.GARCIA and HOLMES (2001) stress that development of milk yield in cows during lactation is influenced also by the level of milk yield within the herd and the calving season in cows.CHMIELNIK et al. (1998) confirmed the increase of milk yield in cows with a higher proportion of HF genes; however, the course of lactation, in particular in cows with more than 85% content of HF genes, was characterised by two distinct peaks.Great part of milk yield variability has been explained with the proposed set of predictive variables.In that respect, the best model was PRA4 (R 2 = 0.91).In the studies of other authors (SHIVE-KUMAR, 1995), the proposed model for predicting the milk yield in cows in the 1st lactation with regard to body weight and some zoometric parameters, was characterised by low coefficient R 2 = 0.36.On the other hand, PAN et al. (1997) applied the model of multiple regression (Y = 200.548+ 59.122•X 1 + 1.447•X 2 ), in which the explanatory variables were: peak milk yield during lactation (X 1 ) and initial milk yield (X 2 ).The coefficient of determination for that model amounted to 0.64 and was lower than that in the presented study.OLORI et al. (1999), when analysing the standard models of the lactation curves (incomplete gamma, inverse quadratic polynominal, polynominal regression, exponential, mixed log) for predicting some milk yield parameters in cows from the 1st lactation in a small herd, obtained very high R 2 (0.944 -0.996) for peak milk yield.For the whole lactation yield, however, these coefficients were much lower and were at the level of 0.66.These authors stressed that coefficient R 2 > 0.70 points to well-matched model, whereas the model is disqualified when R 2 < 0.40.Some authors (PERZ and SOBEK, 1999) suggest however that it is better to take into account the lowest variance of estimation error as the criterion of model quality instead of the value of R 2 ; though one should stress that in the present paper are quoted the corrected coefficients R 2 , which makes a comparative criterion for various models.One should remember, however, that every additional variable in a model increases the value of that coefficient (RENCHER and CEAYONG, 1980).The analysed residuals were of normal distribution, except for MLR1 and MLR2.The homoscedasticity test showed that the equality of variances occurred for MLR2, MLR4, PRA1, PRA3 and PRA4, the obtained estimators being biased in the remaining models; moreover some variables in those models were inter-correlated, which hinders or even rules out the use of those models.Logarithmic and polynominal equations presented by BADNER and ANDERSON (1985) were characterised by the coefficients of correlation from 0.77 to 0.96.The model of JENKINS et al. (1984) was charcterised by the coefficient of correlation equal to 0.72.GUO and SWALVE (1995) received the coefficients of correlation above 0.97 for the analysed polynominal curves for the real and the predicted milk yields.The models analysed by OLORI et al. (1999) were characterised in the majority by randomnessless, with a differentiated value of the coefficient of correlation for the predicted and the real milk yields for particular cows (0.0 to 0.99).
Improvement of model predictability resulted in general from taking into account in models the average milk yield of test-day milking of the successive month, which in general increased the percentage of predictions in particular intervals and reduced the number of observations deviating by almost 1000 kg from the real milk yields.This is also evident for the parameters of particular models (Table 3 and 4), at the same time one may compare respective models for the full set of multiple regression data and incomplete spline regression models, which generally were more precise.PRA models were of similar values.The differences between predicted and observed values for spline regression (Table 3) to some degree resemble those obtained by SWALVE and GUO (1999) for herds of similar yield; using their own model, the authors recorded positive differences (7 to 70 kg of milk) for the first lactation cows, depending on the herd.PERZ and SOBEK (1999) analysed the usefulness of 22 various models for predicting the milk yield basing on different test-day data, using many mathematical models: from the simplest -a simple regression, to more complex models with trigonometric functions and their reciprocals, concluding that a more complex model is not always better in predicting than the more simple one.Models PRA 3 and PRA 4 predicted the milk yield for 305-day lactation with a difference below 350 kg milk, surpassing all MLR models.Unfortunatelly, the predictive values of certain models are ruled out by heteroscedasticity (MLR1, MLR3, PRA2), and despite their rather good other quality parameters, their predictions cannot be reliable.Model PRA3 with a lower number of prognostic variables matched in the quality parameters MLR4 model with the full set of explanatory variables.When using that model, a prediction is possible just after 3 first test-day milkings (approximately 90 days of lactation), with the accuracy matching the predictions based on 4 test-day milkings for multiple regression.PERZ and SOBEK (1999) stress that more simple models are less sensitive to fluctuations appearing in the data, though they do not render truly the real course of lactation curve.Recently, test-day milkings are being used more and more frequently, which may be directly included into evaluation of breeding value (Test-Day Model).Thanks to that, it is possible to see, for instance, on effect of health state, weather conditions, feeding method, maintenance conditions in a barn and other factors that eliminate the influence of environment.The accuracy of evaluation of the breeding value of cows increases when compared with 305-day milk yield by approximately 4-8% (JENSEN, 2001;STRABEL and SZWACZKOWSKI, 2000;SCHAEFFER et al., 2000).The lactation curve equation allows taking into account the changes in the quantity of milk obtained from a cow depending on the lactation stage.Also the models with random regression are applied (Random Regression Model), where the effect of individual is replaced by curvilinear regression, which takes into consideration the shape of lactation curve in estimating the genetic value of cow (SCHAEFFER and DEKKERS, 1994;JAMROZIK et al., 1997).The genetic value refers in that case to daily milk yield of cows.

Recapitulation
In the models presented here a genetic agent has not been taken into account, though the estimated coefficients of regression may be used in the model of random regression.According to the presented models, in particular those, the predictive values of which were the best (coefficient of correlation between the real value and the prediction -r > 0.90), it is reasonable to predict the milk yield of cows basing on first four test-day milkings (approximately 100 days of lactation).An important advantage of the presented models consists in the ease of design and in the clarity of interpreting the model parameters.Their comparison, as is showed by the present study, does not require such large data sets as in case of models with random regression.This speaks well for using these models in practical application, in particular for producers for actual predictions with the aim to minimize production costs and to achieve the best production results possible.One should remember, however, that environmental conditions and animal productive abilities change with time, hence it is necessary to modify regression coefficients of the model.

Table 1
Multiple regression models with a variable number of predictors (Multiple Regressionsmodelle mit variabler (R 2 -coefficient of determination of the model; coefficient from F test for homoscedasticity of the model -ns -non-significant, * -significant at P<0.05, ** -significant at P<0.01)

Table 2
Spline regression models at the break-point equal to the arithmetic mean with a variable number of predictors (Modelle der gestückelten Regression im Knoten, der dem arithmetischen Mittelwert mit variabler

Table 3
* -means for observed milk yields and for a given model differ significantly at P<0.05

Table 5
Numerical and proportional prediction data in difference intervals between real and predicted milk yields for respective models (Zahlen-und Prozentzusammenstellung von Prognosen in den Differenzbereichen zwischen den tatsächlichen und prognostizierten Milchleistungen für einzelne Modelle)