Estimating lethal allele frequencies in complex pedigrees via gene dropping approach using the example of Brown Swiss cattle

A new approach to estimating the allele frequencies of lethal autosomal-recessive genetic disorders was developed based on the gene dropping method. The method was tested in the complex pedigrees of 1 830 125 animals of the Austrian Brown Swiss population, where carriers for 4 genetic disorders were recorded. Allele frequencies of Spinal Dysmyelination and Spinal Muscular Atrophy increased over time and those of Weaver decreased, while allele frequencies of Arachnomelia fluctuated between 2 and 3 %. The results were compared with those from other methods. In general, the results obtained from probability of gene origin were higher than those from gene dropping, while the results from gene counting were lowest because the program used could consider only part of the pedigree information. The gene dropping and gene counting methods used here take lethal selection into account, while the program based on probability of gene origin does not. Therefore, gene dropping and gene counting seem to be more appropriate for estimating the allele frequency of lethal autosomal-recessive genetic disorders. With the gene dropping approach, one obtains the distribution of allele frequencies and confidence intervals for the allele frequency, which might be valuable for observing trends in active breeding populations.


Introduction
Up to now, four lethal autosomal genetic disorders caused by recessive genes have been identified in Brown Swiss cattle (BS): Arachnomelia (A), Spinal Dysmyelination (SDM), Spinal Muscular Atrophy (SMA) and Weaver (W). Symptoms of calves affected with A involve lengthening and decrease in the diameter of limb bones, increased bone fragility, arthrogryposis, distortion of the vertebral column, and brachygnathia inferior (BREM et al. 1984). The spread of Arachnomelia began with the US bull named ›Norvic Lilasons Beautician‹ (GENTILE and TESTONI 2006). A second genetic disorder, SDM, is a congenital neurological disorder occurring in cattle breeds upgraded with American BS, and its gene is located on chromosome 11 (NISSEN et al. 2001). SDM carriers might be traced back to an American BS bull named ›White Cloud Jasons Elegant‹ born in 1966 (GENTILE and TESTONI 2006). SMA is characterized by profound muscular atrophy affecting appendicular muscles, particularly of the rear limb (TROYER et al. 1993), and its gene is located on chromosome 24 (MEDUGORAC et al. 2003, KREBS et al. 2006). SMA cases are reported mainly in advanced backcrosses between American BS and European Brown cattle breeds, and most of them can be traced back to an American BS bull named ›Meadow View Destiny‹ (GENTILE and TESTONI 2006). A further disease, Weaver, is also an inherited disorder of purebred Brown cattle characterized by progressive bilateral hind leg weakness and ataxia, resulting in a weaving gait. The U.S. sire ›Nakota Destiny Dapper‹ and his sons ›Target‹ and ›Matthew‹ were responsible for the diffusion of W. The W gene has been mapped to chromosome 4. W is also associated with increased milk production (HOESCHELE and MEINERT 1990, GEORGES et al. 1993, GENTILE and TESTONI 2006).
In the Austrian BS population, the 4 diseases described above can be traced back to the following carriers: Norvic (Norvic Lilasons Beautician) born 1960, Elegant (White Cloud Jasons Elegant) born 1966 and Destiny (Meadow View Destiny) born 1953 as the source of A, SDM and SMA, respectively. Five bulls, namely Dapper (Nakota Destiny Dapper), Bruce, Modern, Zelad and Barbaray, were the ancestral carriers for W. Most of these bulls are related. Their genes were inherited from the two bulls ›Jame Royal‹ and ›Destiny‹. Furthermore, ›Modern‹, an ancestral carrier for W, has the same sire as ›Stretch Improver‹, an ancestral carrier for SMA.
Due to the importation of BS semen from the US and the application of artificial insemination, alleles for genetic disorders might be widely spread in the Austrian BS population. At present, carrier genotypes of SDM, SMA and W can be identified by molecular genetic tests (DISTL 2005, GENTILE and TESTONI 2006). Therefore, more information is available for estimating allele frequencies than at times when carriers could only be identified through their affected offspring. The objective of this study was to estimate the autosomal recessive lethal allele frequencies of A, SDM, SMA and W using three methods based on gene dropping, gene counting and probability of gene origin, to compare these methods and to demonstrate the spreading of lethal alleles over time.

Material and methods
Brown Swiss (BS) data were provided by the Federation of Austrian cattle breeders and consisted of pedigree data from 1 830 125 animals. The percentage of inbred animals (i.e. all animals with inbreeding coefficients higher than zero) and the mean level of inbreeding of animals with both parents known increased from 43 % and 0.01 in 1980 to 95 % and 0.04 in 2005. The number of ancestors explaining 50 % of the total genetic variability in different reference populations (defined as animals born within a certain year) decreased from 74 to 10 ancestors for the birth cohorts 1980 and 2005, respectively. However, the quality of pedigree information, which is a main issue in any kind of genetic evaluation (e.g. HARDER et al. 2005, NILFOROOSHAN et al. 2008), increased over time (Table 1). In the Austrian population, 11, 16, 77 and 78 male carriers for A, SDM, W and SMA, respectively, were identified either by affected progeny or by DNA test. The cumulative numbers of these carriers are shown in Figure 1.

Gene dropping approach (GD)
We propose a modified version of the GD developed by MACCLUER et al. (1986). This method is based on the idea of gene flow through a pedigree introduced by EDWARDS (1968). Two unique alleles are assigned to each founder (a founder is defined as an ancestor with unknown parents), and the genotypes of all descendants along the actual pedigree are generated following Mendelian segregation rules; no segregation distortion (50:50 transmission probabilities during meiosis) and no mutation are assumed. MACCLUER et al. (1986) used the gene dropping method to describe the genetic variability within populations. Their analyses were directed at two questions: (1) what is the probability of a particular founder's genes being lost or fixed, and (2) what is the distribution of allele frequencies for each founder's gene (in particular, what proportion are at high risk of loss)? To answer these questions, all alleles were considered to be neutral to selection. To adapt the approach to the calculation of lethal allele frequencies, each carrier was considered a founder animal and one of its alleles was flagged as a lethal allele (i.e. no longer neutral to selection), because a carrier's genotype consists of one normal allele and one lethal allele. The flagged lethal alleles as well as the ›normal‹ alleles of carriers and founders were dropped through the pedigree. The frequency of lethal alleles was then derived by counting the flagged alleles among the alleles of the reference population. However, if an ancestor of the reference population inherited 2 lethal alleles, the gene dropping procedure was repeated to avoid overestimating the lethal allele frequencies in the reference population, because individuals homozygous for a recessive lethal allele cannot pass their genes to the next generation. From 1 000 simulation runs of this process, the distribution curves of frequencies of recessive alleles for animals born in 2000 and 2005 were drawn.
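The procedure described above can be illustrated with a minimal Python sketch. It is not the program used in the study: the pedigree format, the function names `gene_drop_once` and `simulate`, and the toy pedigree are assumptions made for illustration, and the rejection rule is simplified to "any animal that appears as a parent" rather than "any ancestor of the reference population".

```python
import random

def gene_drop_once(pedigree, carriers, reference, rng):
    """One gene-dropping run.

    pedigree: list of (animal, sire, dam), parents listed before offspring;
              an unknown parent is None and is assumed to pass a normal allele.
    carriers: animals treated as founders with one flagged (lethal) allele.
    Returns the flagged-allele frequency in `reference`, or None if a
    homozygous-lethal animal would have to act as a parent (run is repeated).
    """
    parents = {p for _, s, d in pedigree for p in (s, d) if p is not None}
    genotype = {}
    for animal, sire, dam in pedigree:
        if animal in carriers:
            alleles = (True, False)  # one lethal, one normal allele
        else:
            # Mendelian segregation: each parent passes one allele at random.
            alleles = tuple(
                False if parent is None else rng.choice(genotype[parent])
                for parent in (sire, dam)
            )
        if all(alleles) and animal in parents:
            return None  # lethal homozygote cannot have offspring
        genotype[animal] = alleles
    flagged = sum(sum(genotype[a]) for a in reference)
    return flagged / (2 * len(reference))

def simulate(pedigree, carriers, reference, runs=1000, seed=1):
    """Collect `runs` accepted gene drops (rejected runs are redrawn)."""
    rng = random.Random(seed)
    results = []
    while len(results) < runs:
        freq = gene_drop_once(pedigree, carriers, reference, rng)
        if freq is not None:
            results.append(freq)
    return results

# Hypothetical toy pedigree: carrier sire S, two unrelated dams.
toy_pedigree = [
    ("S", None, None), ("D1", None, None), ("D2", None, None),
    ("A", "S", "D1"), ("B", "S", "D2"),
    ("C", "A", "B"), ("E", "C", "D1"),
]
freqs_A = simulate(toy_pedigree, {"S"}, ["A"], runs=2000)
print(sum(freqs_A) / len(freqs_A))  # close to 0.25 for a carrier's offspring
```

Redrawing a rejected run, instead of discarding only the affected animal, mirrors the text's statement that the gene dropping procedure was repeated whenever an ancestor inherited 2 lethal alleles.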

Gene counting method (GC)
The GC method was proposed by ALLAIRE et al. (1982). This approach calculates the expected lethal allele frequency from known individual genotypes (carriers) among a set of relevant ancestors. Here we used a Fortran program written by LIDAUER and ESSL (1994) based on the concept developed by ALLAIRE et al. (1982). With this program, only six generations of each reference animal are traced back. The expected lethal allele frequency is then the sum of gene frequencies over all types of relationships (i.e. the relationships of carriers to the set of individuals in the reference population), weighted by the probability that these genes are transmitted to the individuals, taking the distance in generations (n) to the original animals (0.5^n) into account. If the relationship of a carrier to the set of individuals is paternal grandsire (heterozygote), sire (heterozygote) and sire (homozygote for the recessive lethal allele), the lethal gene frequencies in these ancestors are 0.5, 0.5 and 1.0 and the probability weights are 0.25, 0.5 and 0.5, respectively. The expected lethal allele frequency contributed by this carrier to the individuals in the reference population is the sum of the expected lethal allele frequencies over these paths of relationship. Otherwise, even an animal already identified as a carrier is only considered through its relationship to the most important ancestors (0.5^n). However, for the calculation of the allele frequencies, we corrected for lethal selection as described by LIDAUER and ESSL (1994).
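The path weights quoted above can be reproduced with a few lines of Python. This is only a worked illustration of the 0.5^n weighting; the function name `path_contribution` is an assumption, and the sketch does not include the correction for lethal selection or the six-generation limit of the Fortran program.

```python
def path_contribution(ancestor_freq, generations):
    """Lethal allele frequency in the ancestor times the weight 0.5**n."""
    return ancestor_freq * 0.5 ** generations

# (frequency in ancestor, generations n) for the three relationships in the text:
# heterozygous paternal grandsire, heterozygous sire, homozygous sire.
paths = [(0.5, 2), (0.5, 1), (1.0, 1)]
contributions = [path_contribution(f, n) for f, n in paths]
print(contributions)  # [0.125, 0.25, 0.5]
```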

Probability of gene origin (PO)
Recent work by MAN et al. (2007) describes a method based on PO. The expected frequency of lethal alleles in a reference population is half of the total gene contribution of one entire carrier and half of the sum over marginal gene contributions of all carriers, assuming there is no selection against heterozygotes. MAN et al. (2007) applied this method to estimate the allele frequency of complex vertebral malformation in Holstein-Friesians based on a single carrier bull, namely ›Carlin-M Ivanhoe Bell‹. Here, we applied this method to estimate allele frequencies for diseases originating from more than one carrier.
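For the single-carrier case, the calculation can be sketched in Python as follows. The pedigree format, the function name `total_contribution` and the toy pedigree are assumptions for illustration; the sketch covers only the total gene contribution of one carrier and does not implement marginal contributions for several related carriers.

```python
def total_contribution(pedigree, carrier, reference):
    """Expected share of genes in `reference` that descend from `carrier`.

    pedigree: list of (animal, sire, dam), parents listed before offspring;
              unknown parents are None.
    """
    share = {}
    for animal, sire, dam in pedigree:
        if animal == carrier:
            share[animal] = 1.0
        else:
            # Each known parent passes on half of its own share.
            share[animal] = sum(0.5 * share[p] for p in (sire, dam) if p is not None)
    return sum(share[a] for a in reference) / len(reference)

# Hypothetical toy pedigree: carrier sire S is the sire of both A and B.
toy_pedigree = [
    ("S", None, None), ("D1", None, None), ("D2", None, None),
    ("A", "S", "D1"), ("B", "S", "D2"), ("C", "A", "B"),
]
contribution = total_contribution(toy_pedigree, "S", ["C"])
# A heterozygous carrier has one lethal allele out of two, hence the factor 0.5.
expected_frequency = 0.5 * contribution
print(contribution, expected_frequency)  # 0.5 0.25
```

Because no animal is removed from the calculation, this expectation ignores selection against homozygotes, which is the limitation of PO discussed in the text.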

Trend over time
We defined 7 reference populations of animals born in 1975 and every 5 years after that until the year 2005. The intervals represented roughly one generation change. Lethal allele frequencies were estimated in all reference populations based on all known carriers, and only animals with both parents known were considered (Table 1).

Lack of carrier information
The methods were also compared with regard to how strongly they underestimate allele frequencies when information is missing. We applied the three methods to estimate allele frequencies in the reference population 2005 from important carriers only, instead of taking all identified carriers. In a first step, we applied the programme prob_orig by BOICHARD et al. (1997) to identify the first 500 important ancestors contributing with their genes to the population of animals born in 2005. In this analysis, ancestors were ranked according to their marginal gene contribution to the reference population. Carrier animals for any of the mentioned disorders were then identified among the first 500 important ancestors. We referred to these carriers as the important carriers.
In addition, we expected that considering only carriers among the first 500 important ancestors would be sufficient to estimate the lethal allele frequency in the entire population, because the gene contribution of these 500 important ancestors explains more than 93 % of the genetic variation of the reference population born in 2005 (Table 2). We also omitted carriers among the first 500 important ancestors whose marginal gene contribution was lower than 0.01 %.
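The selection of important carriers can be sketched as a simple filter. This is an illustrative Python fragment, not the prob_orig programme: the function name `important_carriers`, the input format and the example values are assumptions, and the ranked list of marginal contributions is taken as given.

```python
def important_carriers(ranked_ancestors, carriers, top_n=500, min_marginal=0.0001):
    """Carriers among the first `top_n` ancestors with marginal contribution
    of at least `min_marginal` (0.0001 corresponds to 0.01 %).

    ranked_ancestors: list of (animal, marginal_contribution), sorted in
                      descending order of marginal contribution.
    """
    top = ranked_ancestors[:top_n]
    return [animal for animal, m in top if animal in carriers and m >= min_marginal]

# Hypothetical ranking with made-up contributions.
ranked = [("a", 0.05), ("b", 0.03), ("c", 0.00005), ("d", 0.00001)]
selected = important_carriers(ranked, carriers={"b", "c", "d"}, top_n=3)
print(selected)  # ['b']
```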

Effect from number of generations
To show the effect of underestimation of allele frequencies with the Fortran programme written by LIDAUER and ESSL (1994) based on gene counting, a subsample of animals was created considering only animals born in 1960 or later with at least one occurrence of the bull Norvic in their pedigrees. Norvic, born in 1960, was an important ancestor for Arachnomelia disease, and his genes were widely spread in the Austrian BS population through AI. His gene contribution stayed at rank 6 in both reference populations of animals born in 2000 (3.71 %) and 2005 (4.23 %) in all of Austria (Table 3). The reduced pedigree included a total of 102 908 animals. The maximum number of generations from animals in the reference populations 1975, 1980 and 1985 to Norvic was within 6 generations, while for the younger animals born in 1990, 1995, 2000 and 2005, the maximum number of generations could exceed 6 and varied from 2 to 7, 2 to 9, 2 to 10 and 3 to 12, respectively (Figure 2). Based on the reduced pedigree, the allele frequencies of Arachnomelia were estimated for only one carrier, ›Norvic‹, by gene counting (with and without lethal selection), probability of gene origin and gene dropping.

Number of simulation runs in gene dropping
We also compared the results obtained from 1 000 and 10 000 simulation runs with gene dropping to evaluate the appropriate number of repetitions for this method. Median, arithmetic mean, standard deviation (SD) and 95 % confidence interval (95 % CI) of allele frequencies were compared for the 2 different numbers of simulation runs for the reference population of animals born in 2005, using the whole pedigree data and taking all carriers of each disease into account.
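The summary statistics named above can be computed from a list of simulated allele frequencies as in the following Python sketch. The text does not state how the 95 % CI was derived; taking the 2.5th and 97.5th percentiles of the simulated distribution is an assumption made here, as is the function name `summarize`.

```python
import statistics

def summarize(freqs):
    """Median, mean, SD and a percentile-based 95 % interval of simulated values."""
    # quantiles(n=40) returns 39 cut points; the first and last are the
    # 2.5th and 97.5th percentiles.
    cuts = statistics.quantiles(freqs, n=40)
    return {
        "median": statistics.median(freqs),
        "mean": statistics.mean(freqs),
        "sd": statistics.stdev(freqs),
        "ci95": (cuts[0], cuts[-1]),
    }

# Made-up stand-in for the frequencies from the simulation runs.
example = [i / 1000 for i in range(1, 101)]
summary = summarize(example)
print(summary["median"], summary["ci95"])
```

Applying `summarize` to the results of 1 000 and of 10 000 runs and comparing the two dictionaries corresponds to the comparison described in the text.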

Results
Results for lethal allele frequencies are shown in Figure 3. Lethal allele frequencies estimated by PO showed the same trends as those from GD but were generally higher for all diseases. Conversely, the results from GC were lowest, and their trend differed from the other methods for A and SMA. With GC, the allele frequencies of A and SMA showed a decreasing trend after the year 2000, while a slight increasing trend was observed when they were estimated by PO and GD. The results for SDM frequencies revealed a small increasing trend over time, while W frequencies showed a slightly decreasing trend after 1990. The sensitivity analyses of the 3 methods considering important carriers only revealed a general underestimation of allele frequencies. The amount of underestimation was similar with all 3 methods. Underestimation was highest for A, where the estimates from important carriers reached only 54 to 61 % of the estimates from all carriers, while SMA (92 to 97 %) showed the lowest bias (Table 4).
The results for Arachnomelia allele frequencies, based on Norvic as the only carrier and estimated from the reduced pedigree, are shown in Table 5. If the maximum number of generations between animals in the reference populations and Norvic was within 6 generations, the results for allele frequencies from GC without lethal selection were similar to those from PO, and the results from GD were similar to the results from GC when lethal selection was applied. Additionally, when the maximum number of generations between particular animals in the reference populations and Norvic was greater than 6, the results for allele frequencies derived from GC became lower than the results from both other methods. Differences between methods became larger for ›younger‹ reference populations (i.e. more generations between reference animals and carriers).
The comparison of the results from 1 000 and 10 000 simulation runs of GD is shown in Table 6.

Discussion
Lethal allele frequencies of SDM (16 carriers) and SMA (78 carriers) increased over time. This might be explained by the intensive use of disease carriers through AI. Conversely, W carriers (77) were used less intensively according to the trend of lethal allele frequencies: the allele frequency has continued to drop since 1990, while the number of W carriers is still increasing. Lethal allele frequencies of A (PO and GD) fluctuated between 2 and 3 %, but the number of carriers also remained roughly constant (about 11). This is probably due to less intensive use of carriers through AI and the small number of carriers in the population. Moreover, it is not surprising that the number of carriers and also the allele frequencies of W increased steadily from 1975 to around 1990 (Figure 5), because animal breeders were attempting to improve milk production, and at that time, information about the association of milk production with W disease was not available. HOESCHELE and MEINERT (1990) did not publish their results on an association between W disease and higher milk production until 1990. A few years later, the number of W carriers remained constant and the intensity of use of carrier sires decreased from 5.34 % in 1990 to 1.25 % in 1992. Statistics for the most recent year (2005) show that the intensity of use of carrier sires has dropped to 0.01 %. Consequently, W allele frequencies have decreased steadily in Austrian BS populations. In general, the lethal allele frequencies depended not only on the number of carriers but also on the genetic contribution of these carriers to animals in the reference populations. For example, the number of carriers for SMA is similar to that for W (78 and 77, respectively), but the estimated lethal allele frequencies in the reference population 2005 differed substantially (9.00 % for SMA and 4.01 % for W). In contrast, only 16 carriers were identified for SDM, but a higher lethal allele frequency (7.63 %) was found than for W.
The sensitivity of the 3 methods to lack of information on carriers was quite similar, but GC seems to be slightly more sensitive. GD seems to be less sensitive than PO. Therefore, for all methods, it is recommended to take all known carriers into account.
Lethal allele frequencies estimated by PO were slightly higher than those estimated by GC and GD. The main reason is that the MAN et al. (2007) approach does not account for lethal selection, which we consider to be a main disadvantage because biased results must be expected. Hence, the lethal allele frequencies estimated by PO were similar to the results from GC when no lethal selection was applied to GC. GC as a deterministic approach is recommended because both lethal and neutral selection can be considered. However, the software used here must be adapted to take all generations (not just a restricted number) between the reference population and the carriers into account.
However, the main advantage of GD is that we get distribution curves of classes of allele frequencies and confidence intervals. If the confidence intervals of 2 reference populations do not overlap, it can be concluded that a significant shift of allele frequencies occurred. Overlapping confidence intervals between two reference populations reveal only trends for increase or decrease of allele frequencies. Therefore, GD is the only method allowing estimation of allele frequencies and hypothesis testing simultaneously.
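The overlap criterion can be written as a short check. This Python fragment is only a sketch of the decision rule stated above; the function name `intervals_overlap` and the interval values are hypothetical and are not results from this study.

```python
def intervals_overlap(ci_a, ci_b):
    """True if two (lower, upper) confidence intervals share any value."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

# Hypothetical 95 % CIs of an allele frequency in two reference populations.
ci_2000 = (0.030, 0.041)
ci_2005 = (0.044, 0.052)
significant_shift = not intervals_overlap(ci_2000, ci_2005)
print(significant_shift)  # True: the intervals do not overlap
```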
In general, to avoid overestimation of allele frequencies and to derive correct trends, it is recommended to subtract the genetic contribution of descendants of carriers that are confirmed as non-carriers (MAN et al. 2007). This could not be done here, because no information on confirmed non-carriers was available. Therefore, all trends presented here must be interpreted with caution.
The results from PO or GD seem to be more reliable in deep pedigrees. However, with our GD, lethal selection is taken correctly into account. A minimum number of gene drops (1 000 simulation runs) can be recommended. The resulting means, medians, SD and 95 % CI were very similar to the results from the larger number of gene drops (10 000). The distribution curves of classes of allele frequencies and the 95 % CI help identify significant shifts of allele frequencies. Observing trends of allele frequencies with the gene dropping procedure is recommended as a tool for monitoring breeding programs.
For animals in the reference populations 1975, 1980 and 1985, the maximum number of generations to Norvic was within 6, while for the younger animals born in 1990 or later it could exceed 6 (Figure 2).

Figure 2
Distribution of the maximum number of generations in the reduced pedigrees of reference populations of animals born from 1975 to 2005

Figure 5
Intensity of use of sires carrying the Weaver allele from 1975 to 2005

Table 1
Number of animals with known parents and in total, % inbred animals, mean inbreeding of inbred animals (F̄), number of animals accounting for 50 % of the genetic variation (NG) and pedigree completeness

A free Fortran program for the PO-based method (MAN et al. 2007) was made available for download under http://www.vetsci.usyd.edu.au/reprogen/research/postgraduate.shtml. It applies the BOICHARD et al. (1997) approach and uses estimates of the total and marginal gene contribution of each carrier to the gene pool of a defined reference population with pedigree data. The marginal gene contribution is defined as the gene contribution not yet explained by other ancestors and is calculated as an ancestor's total gene contribution minus the gene contribution from all relatives that have a larger gene contribution to the reference population (MAN et al. 2007).

Table 2
Number of ancestors and their cumulative marginal gene contribution to animals born in the year 2005

Table 4
Lethal allele frequencies for animals born in 2005 based on important carriers/all carriers, and in brackets, the relative value of allele frequencies in % when taking only important carriers into account

Table 5
Lethal allele frequencies from a reduced pedigree based on gene counting (with or without lethal selection), probability of gene origin and gene dropping for reference populations of animals born from 1975 to 2005