Genetic analysis of distance-dependent racing performances in German Thoroughbreds

The objective of this study was to develop a new multivariate statistical model for the genetic evaluation of distance-dependent racing performances in German Thoroughbreds. The analysed performance traits were »square root of distance to first placed horse in races over sprint distances (up to 1 400 m)«, »square root of distance to first placed horse in races over mile distances (from 1 401 m to 1 900 m)« and »square root of distance to first placed horse in races over long distances (over 1 900 m)«. These traits were found to be influenced by the carried weight, which was determined by the horses' earlier performance. Therefore, new traits that were independent of the carried weights were developed using random regression models. Genetic parameters were first estimated for these newly created traits »new distance to first placed horse in races over sprint distances« (h²=0.088), »new distance to first placed horse in races over mile distances« (h²=0.081) and »new distance to first placed horse in races over long distances« (h²=0.137) using a multivariate animal model. Genetic correlations between these traits were high, but differed from rg=1. A further heritability was estimated for the distance-independent trait »new distance to first placed horse in races over all distances« (h²=0.101) by applying a univariate animal model with a fixed distance effect. The two models were compared by two criteria. First, the ranking of breeding values for the distance-independent trait (estimated with the univariate model) was correlated with each of the rankings of breeding values for the three distance-dependent traits (estimated with the multivariate model). Correlations varied from r=0.668 to r=0.813. The second criterion was the percentage of raced stallions incorrectly selected by breeding values estimated with the univariate model. Between 47.4 % and 69.7 % of stallions were incorrectly selected.
The use of a total selection index including breeding values of the three distance-dependent traits with suitable weightings was recommended as a possible future selection criterion.


Introduction
In the past, different genetic evaluation systems were developed for the German Thoroughbred population, although selection decisions are still based on the subjective end-of-year general handicap weight. This current selection criterion in Thoroughbreds is based entirely on the horse's own performance and expresses its racing merit as a weight in kg that is allocated by professional compilers. SCHULZE-SCHLEPPINGHOFF et al. (1987) first estimated breeding values for the trait general handicap weight with a BLUP sire model. A BLUP animal model was first applied by PREISINGER et al. (1993) for the traits rank at finish and earnings. In several later genetic studies, racing performance was mainly described by the trait rank at finish (JAITNER et al. 1994, UPHAUS and SCHMUTZ 1998, EKIZ 2005, HAHN 2008). In these genetic models, the carried weights of horses, which bias their real performance ability, were included as a fixed linear regression.
In many Thoroughbred races all over the world, the racing time is measured only for the winner of a race. MOTA et al. (2005) calculated the racing time of non-winning Brazilian Thoroughbreds by multiplying the number of body lengths behind the winner by 2/10 of a second and adding the result to the finishing time of the winner. For this newly defined trait, distinct genetic predispositions were found for different distances (1 000 m, 1 100 m, 1 200 m, 1 300 m, 1 400 m, 1 500 m and 1 600 m). HAHN (2008) also estimated genetic parameters for the racing performance (expressed as rank at finish) of 2-, 3- and 4-year-old Thoroughbreds over the distance classes Sprint, Mile, Intermediate and Long by using a multivariate animal model. The genetic correlations between the traits Sprint, Mile, Intermediate and Long varied from rg=0.64 to 0.99 within age classes two and three and decreased with increasing distance difference. BUGISLAUS et al. (2004) created the racing performance trait »new distance to first placed horse in a race«, which was independent of the carried weights of horses in races. This trait proved well suited to genetic evaluation. So far, genetic parameters for this trait have been estimated only over all distances, although the genetic predispositions of horses for racing performances over distinct distances might differ.
The objectives of this study were (1) to create new distance-dependent racing performance traits independent of carried weights, (2) to analyse these new traits genetically and (3) to develop a new breeding value evaluation system.

Development of distance-dependent racing performance traits independent from carried weights
The total performance data set for the creation of distance-dependent racing performance traits consisted of 45 179 performance observations from 2 962 Thoroughbreds starting in flat races in the years 2001, 2002 and 2003. The data set included 6 469 flat races in total. These races were divided into three distance classes: the first class included 1 405 races over sprint distances (up to 1 400 m), the second class 2 689 races over mile distances (from 1 401 m to 1 900 m) and the third class 2 375 races over intermediate and long distances (over 1 900 m). The last class is referred to below simply as the long distances class. Only Thoroughbreds that had run in more than 4 flat races within an individual distance class were included in the data set. It was assumed that raced horses were unrelated.
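The class boundaries and race counts above can be written down as a minimal sketch. The function name `distance_class` and the dictionary `races_per_class` are illustrative and not taken from the study; the boundaries and counts are those stated in the text.

```python
def distance_class(distance_m: int) -> str:
    """Assign a race to one of the three distance classes described in the text."""
    if distance_m <= 1400:
        return "sprint"
    if distance_m <= 1900:
        return "mile"
    return "long"  # intermediate and long distances combined


# Number of races per class as reported in the text.
races_per_class = {"sprint": 1405, "mile": 2689, "long": 2375}
total_races = sum(races_per_class.values())
```

The three class counts add up to the 6 469 flat races reported for the data set.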
The phenotypic trait distance to first placed horse was obtained by summing the sequential stewards' decisions within a race and was expressed in horse lengths. The variable stewards' decision described the distance between two consecutively placed Thoroughbreds in a race as they passed the finish. A square root transformation of the trait distance to first placed horse was necessary to obtain a reasonable approximation to the normal distribution. The transformed trait was then subtracted from a constant of 20; the value 20 was chosen so that the transformed trait distance to first placed horse in a race could not take a value below 0. The three traits analysed in this study were distance to first placed horse in races over sprint, mile and long distances, respectively.
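The construction of the transformed trait can be illustrated with a short sketch. It assumes that the stewards' decisions are available as numeric gaps in horse lengths and that the transformation has the form 20 minus the square root of the distance, which is one reading of the description above. The function name `transformed_distances` is hypothetical.

```python
import math
from itertools import accumulate


def transformed_distances(stewards_decisions, constant=20.0):
    """Return the transformed trait for every finisher of one race.

    stewards_decisions: gaps in horse lengths between consecutive finishers
    (2nd to 1st, 3rd to 2nd, ...). The winner has a distance of 0.
    """
    # Distance to the winner is the running sum of the sequential gaps.
    distances = [0.0] + list(accumulate(stewards_decisions))
    # Square root transformation, then subtraction from the constant
    # (ASSUMPTION: the direction of the subtraction is inferred from the text).
    return [constant - math.sqrt(d) for d in distances]


# Hypothetical race with four finishers and gaps of 1, 3 and 5 lengths.
example = transformed_distances([1.0, 3.0, 5.0])
```

In this toy race the cumulative distances are 0, 1, 4 and 9 lengths, so the transformed values are 20, 19, 18 and 17. Under this reading a higher value indicates a better performance.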
Performance observations in the data did not show the real performance potential of horses, because competing Thoroughbreds had to carry different weights in individual races. These carried weights expressed the racing merit of horses in comparison with all starting Thoroughbreds and were allocated by professional compilers. For genetic evaluation, unbiased performance observations are desirable. For this reason, regression coefficients of the phenotypic performance traits on carried weights were estimated within each individual Thoroughbred in the different distance classes by using random regression models. Three univariate statistical models, for the traits distance to first placed horse in races over sprint, mile or long distances, were used to estimate the coefficients on carried weights. A fourth univariate statistical model, for the trait distance to first placed horse in races over all distances, was applied in addition. These models included only a random animal effect and a residual effect. The carried weights were the space variable (covariate) of the regression. A first-order polynomial on carried weights for the animal effect was used in all four estimations. The program VCE5 (KOVAC et al. 2002) was used to estimate the coefficients of the traits transformed distance to first placed horse in races over sprint, mile, long and all distances on carried weights. The most frequent coefficients from the four evaluations over all animals in the individual distance classes were used to create the new performance traits independent of carried weights.
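The idea of estimating a coefficient within each horse and then taking the most frequent value can be sketched as follows. This is not the random regression model fitted with VCE5. As a simplified stand-in it uses an ordinary least-squares slope per horse and reads the mode from a histogram. All names, the bin width and the toy records are hypothetical.

```python
from collections import Counter, defaultdict


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def modal_coefficient(records, bin_width=0.005):
    """records: iterable of (horse_id, carried_weight_kg, transformed_distance)."""
    by_horse = defaultdict(list)
    for horse, cw, trait in records:
        by_horse[horse].append((cw, trait))
    bins = Counter()
    for observations in by_horse.values():
        cws, traits = zip(*observations)
        if len(set(cws)) < 2:
            continue  # slope is undefined without variation in carried weight
        # Count the within-horse slope in its histogram bin.
        bins[round(ols_slope(cws, traits) / bin_width)] += 1
    most_common_bin, _ = bins.most_common(1)[0]
    return most_common_bin * bin_width


# Hypothetical toy records: (horse, carried weight in kg, transformed trait).
toy_records = [
    ("A", 50, 18.0), ("A", 60, 18.25),
    ("B", 52, 17.0), ("B", 56, 17.1),
    ("C", 55, 19.0), ("C", 65, 19.5),
    ("D", 58, 18.0),
]
toy_mode = modal_coefficient(toy_records)
```

In the toy data two of the three horses with usable records have a slope of 0.025, so this value is returned as the most frequent coefficient. Horse D is skipped because a single record does not define a slope.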
The new distance to first placed horse in races over sprint, mile, long or all distances was calculated according to equation [1] from the following terms: distances is the abbreviation for the phenotypic traits distance to first placed horse in races over sprint, mile, long or all distances; coefficients are the most frequently estimated coefficients of square root of distance to first placed horse in races over sprint, mile, long or all distances on carried weights, respectively; cw is the abbreviation for carried weights, which ranged from 47 to 74.5 kg. The new distance to first placed horse had a minimum of 9.49 and a maximum of 21.76 in races over sprint distances, a minimum of 6.95 and a maximum of 21.77 in races over mile distances and a minimum of 6.31 and a maximum of 21.77 in races over long distances.
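The following sketch shows one possible form of the weight adjustment. It is an assumption, not the confirmed equation [1]: it takes the transformed trait to be 20 minus the square root of the distance and adds the coefficient times the carried weight. The additive form is chosen because the reported maxima exceed 20, which a winner (transformed value 20) can only reach with a positive correction. The function name `new_distance` is hypothetical, and the default of 0.025 is the most frequent coefficient reported in the text.

```python
def new_distance(transformed_distance, carried_weight_kg, coefficient=0.025):
    # ASSUMPTION: additive correction; the exact form of equation [1]
    # is not recoverable from the text.
    return transformed_distance + coefficient * carried_weight_kg


# A winner (transformed value 20) at the lightest and heaviest carried weights.
winner_lightest = new_distance(20.0, 47.0)
winner_heaviest = new_distance(20.0, 74.5)
```

Under this assumption a winner scores between 21.175 and 21.8625, and the reported maxima of 21.76 and 21.77 fall inside that interval.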

Estimation of genetic parameters
The performance data set described in the previous chapter was used for the estimation of variance components. In addition, pedigree information back to the third generation was used (11 014 animals). Only Thoroughbreds with more than 4 observations in individual distance classes (sprint, mile and long distances), races with more than 3 starters, and trainers and jockeys with more than 3 starts each were included in the performance data set. The analysed traits were »new distance to first placed horse in races over sprint distances«, »new distance to first placed horse in races over mile distances«, »new distance to first placed horse in races over long distances« and »new distance to first placed horse in races over all distances«. The variance components were estimated using the REML procedure as implemented in the program VCE5 (KOVAC et al. 2002). The following multivariate genetic-statistical model was used for the three distance-dependent traits:
y1 = X1 b + Z1 a + Z2 pe + e    (2)
where y1 is the vector of observations containing the traits of each Thoroughbred recorded at each individual race as new distance to first placed horse in races over sprint distances, new distance to first placed horse in races over mile distances and new distance to first placed horse in races over long distances. Vector b represents the fixed effects, comprising the effects of sex (stallion, mare and gelding), age of Thoroughbred (age classes of 2, …, 10 and >10-year-old horses), year-season of race (three months were combined into one season), trainer (1, …, 456), jockey (1, …, 399) and each individual race (6 469 races). Vector a represents the random additive genetic effects, vector pe denotes the permanent environment effects and vector e the residual effects. The known incidence matrices X1, Z1 and Z2 relate the observations to the corresponding fixed and random effects.
In further genetic analyses, a univariate genetic-statistical model was applied for the performance trait new distance to first placed horse in races over all distances. This genetic-statistical model considered the different distance classes of races as a fixed effect:
y2 = X2 b + Z1 a + Z2 pe + e    (3)
where y2 is the vector of observations containing the trait of each Thoroughbred recorded at each individual race as new distance to first placed horse in races over all distances. Vector b contains the fixed effects, comprising the effects of sex (stallion, mare, gelding), age of Thoroughbred (age classes of 2, …, 10 and >10-year-old horses), year-season of race (three months were combined into one season), trainer (1, …, 456), jockey (1, …, 399), each individual race (6 469 races) and distance (sprint, mile and long distances). Vector a represents the random additive genetic effects, vector pe the permanent environment effects and vector e the residual effects. X2, Z1 and Z2 are the known incidence matrices.
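How incidence matrices relate records to effect levels can be illustrated with a toy example. The helper `incidence_matrix` and all data below are hypothetical and show only the general principle, not the data structures used by VCE5 or PEST.

```python
def incidence_matrix(levels_per_record, all_levels):
    """One row per record, one column per level; a 1 marks the level of the record."""
    column = {level: j for j, level in enumerate(all_levels)}
    matrix = []
    for level in levels_per_record:
        row = [0] * len(all_levels)
        row[column[level]] = 1
        matrix.append(row)
    return matrix


# Hypothetical toy data: four race records from two horses.
record_horse = ["h1", "h1", "h2", "h2"]
record_sex = ["mare", "mare", "gelding", "gelding"]

# The additive genetic effect covers all pedigree animals, including a sire
# without own records; the permanent environment effect covers only horses
# with records.
pedigree_animals = ["sire", "h1", "h2"]
horses_with_records = ["h1", "h2"]

X_sex = incidence_matrix(record_sex, ["stallion", "mare", "gelding"])
Z1 = incidence_matrix(record_horse, pedigree_animals)
Z2 = incidence_matrix(record_horse, horses_with_records)
```

In this sketch the column of Z1 for the sire contains only zeros. An animal without records obtains its breeding value solely through the pedigree relationships.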

Comparison of breeding values estimated by using two different genetic-statistical models
For the comparison of the two genetic-statistical models (2) and (3), breeding values were estimated for the traits »new distance to first placed horse in races over sprint distances«, »new distance to first placed horse in races over mile distances«, »new distance to first placed horse in races over long distances« and »new distance to first placed horse in races over all distances«. The program PEST (GROENEVELD et al. 1990) was used for the estimation of breeding values. The performance and pedigree data sets were the same as in the previous section. For this analysis, the 757 raced stallions in the data set were ranked on the basis of their breeding values for each of these four traits.
The first criterion for comparing the two genetic-statistical models was the correlation between the ranking of breeding values for each of the distance-dependent racing performance traits in model (2) and the ranking of breeding values for the distance-independent performance trait in model (3). The second criterion was the percentage of stallions that were incorrectly selected when the single trait (model 3) was used instead of the distance-dependent performance traits (model 2). The selection rate was 10 %.
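The two comparison criteria can be sketched as follows. The sketch assumes that the correlation between rankings is a Spearman rank correlation, that a higher breeding value is better, and that there are no ties. The text does not state these details, and all function names are hypothetical.

```python
def ranks(values):
    """Rank 1 = highest breeding value (ties are not handled in this sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def rank_correlation(ebv_a, ebv_b):
    """Spearman correlation for untied ranks: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    ra, rb = ranks(ebv_a), ranks(ebv_b)
    n = len(ra)
    d2 = sum((a - b) ** 2 for a, b in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))


def top_group(ebv, n_selected):
    """Indices of the n_selected animals with the highest breeding values."""
    order = sorted(range(len(ebv)), key=lambda i: ebv[i], reverse=True)
    return set(order[:n_selected])


def percent_incorrectly_selected(ebv_reference, ebv_candidate, rate=0.10):
    """Percentage of animals selected on ebv_candidate that are not selected on ebv_reference."""
    n_selected = max(1, round(rate * len(ebv_reference)))
    wrong = top_group(ebv_candidate, n_selected) - top_group(ebv_reference, n_selected)
    return 100 * len(wrong) / n_selected
```

In the study's terms, `ebv_reference` would hold the breeding values of a distance-dependent trait from model (2) and `ebv_candidate` those of the distance-independent trait from model (3).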
The breeding values of the three distance-dependent traits were also combined in a selection index with equal weightings for each of the three breeding values. The ranking of horses by this selection index was compared with the ranking of horses by breeding values for the trait »new distance to first placed horse in races over all distances« (estimated with model 3). The two criteria described above were used for this comparison.
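An equally weighted index of the three breeding values can be sketched as below. The sketch assumes that the breeding values enter the index without further standardisation, which the text does not specify. The function name `selection_index` and the toy values are hypothetical.

```python
def selection_index(ebv_sprint, ebv_mile, ebv_long, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three distance-dependent breeding values per animal."""
    w_sprint, w_mile, w_long = weights
    return [
        w_sprint * s + w_mile * m + w_long * l
        for s, m, l in zip(ebv_sprint, ebv_mile, ebv_long)
    ]


# Hypothetical breeding values of two stallions.
toy_index = selection_index([3.0, 1.0], [6.0, 1.0], [9.0, 4.0])
```

Because only the ranking by the index is used, multiplying all three weights by a common factor would not change the result. Unequal weights can be passed in the `weights` argument.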

Development of distance-dependent racing performance traits independent from carried weights
Of the analysed Thoroughbreds, 74.1 % started only in races within one distance class (12.6 % only over sprint distances, 29.9 % only over mile distances and 31.6 % only over long distances). About 25.5 % of horses raced in two neighbouring distance classes and only 12 horses (0.4 %) participated in all three distance classes. HAHN (2008) noted a significant trend towards shorter racing distances in Europe in the period from 1991 to 2005. For the future German Thoroughbred breed, such a trend calls for more effective mating with regard to specific distance classes. Therefore, the development of new distance-dependent racing performance traits is necessary.
For the creation of performance traits that are independent of carried weights in races, regression coefficients of the transformed phenotypic traits on carried weights were estimated for the three distance classes within each individual performing animal by using random regression models. In all four estimations, a first-order polynomial on carried weights for the animal effect was used. Higher orders of polynomials were not significant. Figure 1 shows the distribution of regression coefficients from the random regression models for square root of distance to first placed horse in races over sprint, mile and long distances, respectively. The regression coefficients were nearly normally distributed in all three distance classes. The most frequent coefficient had a value of 0.025 in each distance class, although the range of coefficients varied between the distance classes. These most frequent regression coefficients were used to create the three traits »New distance to first placed horse in races over sprint-, mile- and long distances«, respectively. The distribution of regression coefficients estimated within each animal using the random regression model for distance to first placed horse in races over all distances on carried weights was also nearly normal, with the most frequent value at 0.025. This coefficient was used to create the trait »New distance to first placed horse in races over all distances«. These four newly created traits were independent of the carried weights and reflected the real performance potential of horses in races of different distance classes more closely. The estimated coefficients were independent of the genetic potential of horses because they were estimated within horses. BUGISLAUS et al. (2004) compared two ways of accounting for raced horses' carried weights in genetic evaluations and found that the inclusion of a fixed linear regression in the genetic-statistical model led to overestimated breeding values for low-performing Thoroughbreds and underestimated breeding values for high-performing Thoroughbreds.
The studies of JAITNER et al. (1994), PREISINGER et al. (1993), UPHAUS and SCHMUTZ (1998) and HAHN (2008) included the carried weights as a fixed linear regression in the genetic-statistical model. BUGISLAUS et al. (2004) determined that the use of equation [1] is more suitable for taking carried weights into account.

Estimation of genetic parameters
Genetic parameters were first estimated for the three traits »New distance to first placed horse in races over sprint-, mile- and long distances« by using the multivariate genetic-statistical model (2). All fixed effects in this genetic model had highly significant influences on the three performance traits. The resulting heritabilities of and genetic correlations between the distance-dependent traits are shown in Table 1. The highest heritability was found for the trait »New distance to first placed horse in races over long distances«. The heritabilities of the traits »New distance to first placed horse in races over sprint distances« and »New distance to first placed horse in races over mile distances« were about 36 % and 41 % lower, respectively, than that of the trait »New distance to first placed horse over long distances«. Genetic correlations between these three traits were very high. In particular, the genetic correlation between the traits »New distance to first placed horse in races over mile distances« and »New distance to first placed horse in races over long distances« was positive and very high (rg=0.99). This indicates that Thoroughbreds performing well in races over mile distances also had an extraordinary genetic predisposition for racing performance over long distances. The genetic correlations between the sprint and mile distances and between the sprint and long distances were high, but differed clearly from 1. This suggests that the genetic mechanisms underlying these distinct traits differ slightly. The heritability of the trait »New distance to first placed horse over all distances«, estimated with the univariate genetic-statistical model (3), was h²=0.101.
BUGISLAUS et al. (2004) found a genetic correlation between »New distance to first placed horse« and »New rank at finish« of rg=1. So there seems to be no difference between these traits, which were both independent of carried weights. HAHN (2008) estimated similar genetic parameters for the trait rank at finish in different distance classes and age classes (2-, 3- and 4-year-old horses). The heritability values in the study of HAHN (2008) decreased with increasing age. HAHN (2008) also found a decrease in genetic correlations with increasing distance. MOTA et al. (2005) used racing times over distances of 1 000 m, 1 100 m, 1 200 m, 1 300 m, 1 400 m, 1 500 m or 1 600 m as performance traits for the estimation of genetic parameters and found decreasing heritabilities with increasing distance (from h²=0.29 over 1 000 m to h²=0.05 over 1 600 m). SOBCZYNSKA (2006) also estimated heritabilities for rank at finish in Thoroughbreds that decreased from h²=0.16 in races over 1 000 m to h²=0.06 in races over more than 1 800 m. The genetic parameters for the distance-dependent traits estimated by MOTA et al. (2005) and SOBCZYNSKA (2006) are hardly comparable with the results in Table 1 because different distance ranges were analysed.
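The relative differences in heritability quoted for the sprint and mile traits can be retraced from the heritability estimates reported in the abstract. The variable names in this short check are illustrative.

```python
# Heritabilities of the three distance-dependent traits as reported in the abstract.
h2 = {"sprint": 0.088, "mile": 0.081, "long": 0.137}

# Reduction relative to the long-distance trait, in percent.
relative_reduction = {
    trait: round(100 * (h2["long"] - h2[trait]) / h2["long"])
    for trait in ("sprint", "mile")
}
```

The check gives about 36 % for the sprint trait and about 41 % for the mile trait, in line with the figures in the text.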

Comparison of breeding values estimated by using two different genetic-statistical models
The genetic-statistical models (2) and (3) were first compared by the correlations between the rankings of breeding values for the traits »New distance to first placed horse in races over sprint-, mile- and long distances« (estimated with model (2)) and the ranking of breeding values for the trait »New distance to first placed horse in races over all distances« (estimated with model (3)), respectively. As shown in Figure 2, the correlations for the sprint and mile distances were high, with r=0.668 and r=0.674, but these values nevertheless deviated greatly from 1. This means that the rankings of breeding values for these distinct traits differed clearly. The correlation between the traits »New distance to first placed horse in races over long distances« (estimated with model (2)) and »New distance to first placed horse in races over all distances« (estimated with model (3)) was higher, at r=0.813, but also showed that the rankings of breeding values differed. The second criterion for model comparison was the percentage of raced stallions incorrectly selected by breeding values estimated with model (3). This criterion is also shown in Figure 2. When 10 % of stallions were selected using breeding values from the univariate model (3), about 69.7 %, 65.8 % and 47.4 % of stallions were incorrectly selected in comparison with selection by breeding values for the traits »New distance to first placed horse in races over sprint distances«, »New distance to first placed horse in races over mile distances« and »New distance to first placed horse in races over long distances«, respectively (multivariate genetic model (2)). These values are consistent with the correlations above. The ranking of stallions by the newly created selection index was correlated with the ranking of stallions by breeding values for the trait »New distance to first placed horse over all distances«. The correlation was high, at r=0.78. When 10 % of stallions were selected using breeding values from the univariate model (3), about 56.6 % of stallions were incorrectly selected in comparison with selection by the selection index.
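The reported percentages can be checked against the size of the selected group. The sketch below assumes that a 10 % selection rate among the 757 raced stallions was rounded to 76 selected animals. The numbers of incorrectly selected stallions are inferred from the percentages and are not stated in the text.

```python
n_stallions = 757
n_selected = round(0.10 * n_stallions)  # assumed size of the selected group

# Percentages of incorrectly selected stallions as reported in the text.
reported = {"sprint": 69.7, "mile": 65.8, "long": 47.4, "index": 56.6}

# Inferred head counts and the percentages they reproduce.
implied_counts = {k: round(p / 100 * n_selected) for k, p in reported.items()}
recomputed = {k: round(100 * c / n_selected, 1) for k, c in implied_counts.items()}
```

With 76 selected stallions, the percentages correspond to 53, 50, 36 and 43 incorrectly selected animals, and each count reproduces its reported percentage to one decimal place.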

Figure 2: Correlations between rankings of breeding values of distance-dependent and of distance-independent racing performance traits as well as the percentage (%) of incorrectly selected raced stallions

Table 1: Genetic parameters with standard errors (in parentheses) estimated for the three distance-dependent performance traits new distance to first placed horse (NDFPH) in races over sprint, mile and long distances