Eight Y chromosome genes show copy number variations in horses

Copy number variations (CNVs), which represent a significant source of genetic diversity on the Y chromosome in mammals, have been shown to be associated with the development of many complex phenotypes, such as reproduction and male fertility. The occurrence of CNVs has been confirmed on the Y chromosome in horses. However, the copy numbers (CNs) of Equus caballus Y chromosome (ECAY) genes are largely unknown. To demonstrate the copy number variations of Y chromosome genes in horses, the quantitative real-time polymerase chain reaction (qPCR) method was applied to measure the CNVs of the eukaryotic translation initiation factor 1A Y (EIF1AY), equine testis-specific transcript on Y 1 (ETSTY1), equine testis-specific transcript on Y 4 (ETSTY4), equine testis-specific transcript on Y 5 (ETSTY5), equine transcript Y4 (ETY4), ubiquitin activating enzyme Y (UBE1Y), sex determining region Y (SRY), and inverted repeat 2 Y (YIR2) across 14 Chinese domestic horse breeds in this study. Our results revealed that these eight genes were multi-copy; furthermore, some of the well acknowledged single-copy genes such as SRY and EIF1AY were found to be multi-copy in this research. The median copy numbers (MCNs) varied among different breeds for the same gene. The CNVs of Y chromosome genes showed different distribution patterns among Chinese horse breeds, indicating the impact of natural selection on copy numbers. Our results will provide fundamental information for future functional studies.


Introduction
The mammalian Y chromosome stands out from the rest of the genome because it is male specific, constitutively haploid, and exhibits unique structural and functional features (Skaletsky et al., 2003). This has led to a correspondingly unusual genomic landscape, rich in segmental duplications, which provide ample substrate for the generation of copy number variations (CNVs). CNVs, a major source of genetic variation between individuals, include deletions, duplications, and complex rearrangements that typically range from 50 base pairs to several megabase pairs (Mb) in size. The male-specific region of the Y chromosome (MSY) contains clusters of genes essential for male reproduction (Tüttelmann et al., 2011; Chang et al., 2013; Yue et al., 2013). In humans, CNVs of the testis-specific protein, Y-encoded (TSPY) gene have been found to be associated with semen quality and reproduction via the regulation of cell division during spermatogenesis (Vodicka et al., 2007). In cattle, the CNVs of Y-linked genes also affect male fertility and play an important role in spermatogenesis (Hamilton et al., 2012; Chang et al., 2013; Yue et al., 2013). However, information about the annotation and transcriptome of the horse MSY is still lacking.
Horses have played an instrumental role in transportation, agriculture, and warfare and have been faithful companions of humans since their domestication. Since the 1900s, with the continuous development of the combustion engine, the use of horses for these purposes has gradually declined. However, horses have not faded from human life. In many countries, horses have become domestic animals of both social and economic value (Yang et al., 2010). Nevertheless, the genetic variants that underlie the phenotypic diversification of horse breeds are poorly understood. Systematic discovery of Equus caballus Y chromosome (ECAY) genes started in 2004 (Raudsepp et al., 2004). A detailed MSY gene catalogue was subsequently developed for the horse, with 37 horse MSY genes/transcripts being identified. The horse MSY harbors 20 X-degenerate genes and 17 acquired or novel genes (Paria et al., 2011); however, the specific CNVs of these genes have not been investigated.
In order to estimate the copy numbers of Y-linked genes in horses and compare the CNVs between different Chinese horse breeds, three X-degenerate genes, eukaryotic translation initiation factor 1A Y (EIF1AY), sex determining region Y (SRY), and ubiquitin activating enzyme Y (UBE1Y), were chosen to have their copy numbers determined in this study. SRY is a well-known sex determination gene, and confirming its copy number will help to form a better understanding of its structural and functional characteristics. The EIF1AY and SRY genes were previously identified as single copy (Paria et al., 2011). Copy numbers of UBE1Y orthologs have been determined in other species (cats, pigs, and mice) (Mitchell et al., 1991; Quilter et al., 2002; Pearks Wilkerson et al., 2008), and the UBE1Y gene was considered to be multi-copy in horses (Paria et al., 2011). However, the copy number range of UBE1Y has not previously been reported. Five Y-ampliconic genes, equine testis-specific transcript on Y 1 (ETSTY1), equine testis-specific transcript on Y 4 (ETSTY4), equine testis-specific transcript on Y 5 (ETSTY5), equine transcript Y4 (ETY4), and inverted repeat 2 Y (YIR2), were all identified as multi-copy genes (Paria et al., 2011), but the ranges of their copy number variations remained inconclusive. Therefore, we investigated the CNVs of these eight Y chromosome specific genes in Chinese horses using the quantitative real-time polymerase chain reaction (qPCR) method. Our results will provide fundamental information regarding the copy numbers of horse Y chromosome genes, which will benefit future functional studies.

Sample collection
Blood samples of 302 male horses were collected from 14 Chinese domestic breeds distributed in northwestern and southwestern China (Table 1). In addition, samples from two female horses were collected to be used as female controls, and water was used as a negative control; this was undertaken to verify the male specificity of the primers. The genomic DNA was extracted using a standard phenol-chloroform method (Sambrook and Russell, 2002). The DNA was diluted to 20 ng µL⁻¹ with ultrapure water and stored at −20 °C.

Primer design
CNVs of eight Y chromosome specific genes, EIF1AY, ETSTY1, ETSTY4, ETSTY5, ETY4, UBE1Y, SRY, and YIR2, were investigated in this study. Because a complete sequence of the horse Y chromosome is still lacking, copy numbers remain uncertain for any Y-linked gene. Therefore, the two-copy autosomal gene beta-actin (GenBank acc. no. NC_009156) was used as a reference. The PCR primers for the beta-actin gene and the conserved region of the SRY gene were designed using the Primer Premier 5.0 program (http://www.premierbiosoft.com/). The other seven pairs of primers were obtained from Paria et al. (2011). Detailed information on the PCR primers and the predicted size of each amplicon is listed in Table 2. To confirm the Y chromosome specificity of the primers, a routine PCR was performed using male and female horse genomic DNA as templates and water as a negative control. The PCR protocol was as follows: each 12.5 µL reaction contained 20 ng of genomic DNA, 5 pg of each primer (10 pmol µL⁻¹), 6.25 µL of 2 × PCR Mix buffer (including 0.375 U Taq DNA polymerase, 2 × PCR buffer, 18.75 µM MgCl2, and 2.5 µM dNTPs), and 4.25 µL of distilled water. Thermocycling consisted of an initial denaturation at 95 °C for 5 min, followed by 35 cycles of 94 °C for 30 s, the annealing temperature (Table S1 in the Supplement) for 40 s, and 72 °C for 30 s, then a final extension at 72 °C for 10 min and sample storage at 4 °C. The PCR products of the male samples and of the female and negative controls were visualized on a 1 % native agarose gel. The images were acquired with a ChampGel™ 6000 gel documentation and image analysis system and Lane 1D gel imaging analysis software (Sagecreation, Beijing, China).

Quantitative real-time polymerase chain reaction
The quantitative real-time polymerase chain reaction (qPCR) method was used to measure the CNs of EIF1AY, ETSTY1, ETSTY4, ETSTY5, ETY4, UBE1Y, SRY, and YIR2 in the samples using a Roche LightCycler 480 system and SYBR PCR Master Mix (TAKARA, Dalian, China). The qPCR was run on 96-well plates. Each plate included wells for a calibrator and a negative control (distilled water). Standard curves were generated for each pair of primers from horse DNA diluted to 60, 40, 20, 10, 5, 2.5, and 1.25 ng µL⁻¹. For the test samples, DNA was diluted to 5 ng µL⁻¹. qPCR reactions with standard curve samples and test samples (including the calibrator and negative control) were run in triplicate. In this study, we ran a total of 302 horses on 176 plates (each plate was set up for 1 calibrator, 1 distilled water control, and 14 test samples) for the eight Y chromosome genes. Each reaction contained 10 µL of SYBR Green PCR Master Mix, 0.8 µL of primers (10 pmol µL⁻¹), 6.8 µL of distilled water, and 1.6 µL of DNA template (5 ng µL⁻¹). The qPCR program consisted of the following steps: predenaturation at 95 °C for 10 min, followed by 40 cycles of denaturation at 95 °C for 5 s and annealing at the appropriate temperature (Table S1) for 30 s. A melting curve was then generated by taking fluorescence measurements every 0.11 °C from 60 to 95 °C. Primer efficiencies were calculated according to the equation $E = 10^{-1/\mathrm{slope}}$, where the slope is that of the standard curve.
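The efficiency calculation described above can be sketched as follows. This is a minimal illustration, assuming the standard curve is an ordinary least-squares fit of C_T against log10 of the template concentration; the C_T values are hypothetical (generated for an ideal assay) and are not data from this study.

```python
import math

def fit_slope(concentrations, ct_values):
    """Least-squares slope of C_T against log10(template concentration)."""
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def primer_efficiency(slope):
    """E = 10^(-1/slope); E = 2 means the product doubles every cycle."""
    return 10 ** (-1.0 / slope)

# Dilution series from the text (ng/uL).
dilutions = [60, 40, 20, 10, 5, 2.5, 1.25]

# Hypothetical C_T values for an ideal assay: slope = -1/log10(2).
ideal_slope = -1.0 / math.log10(2)
ct = [25.0 + ideal_slope * math.log10(c / 60) for c in dilutions]

slope = fit_slope(dilutions, ct)
efficiency = primer_efficiency(slope)
print(f"slope = {slope:.3f}, E = {efficiency:.3f}")
```

On this scale an efficiency of 2 corresponds to perfect doubling per cycle, so the reported efficiencies above 1.90 are close to the ideal.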

Copy number estimation
The CNs of EIF1AY, ETSTY1, ETSTY4, ETSTY5, ETY4, UBE1Y, SRY, and YIR2 were estimated for test samples using the three equations described in Hamilton et al. (2009): Eq. (1) gives the copy number of the calibrator, Eq. (2) gives the ratio of each test sample relative to the calibrator, and Eq. (3) combines the two:

$$\text{Copy number}_{\text{test sample}} = \text{Copy number}_{\text{calibrator}} \times \text{ratio} \times 2. \qquad (3)$$

In the above equations, the DNA sample of the horse Guizhou 59 was used as the calibrator. The cycle threshold ($C_T$) value of the calibrator for each gene was determined as the average of 66 $C_T$ values obtained from 22 different plates for this particular sample. In Eqs. (1)-(2), $E$ is the PCR efficiency for the reference gene (beta-actin) or each target gene (EIF1AY, ETSTY1, ETSTY4, ETSTY5, ETY4, UBE1Y, SRY, and YIR2), and $\Delta C_T$ = $C_T$ of the calibrator − $C_T$ of the test sample.
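The calculation can be sketched in code under stated assumptions. The ratio function below assumes the common efficiency-corrected form, E_target^ΔC_T(target) / E_ref^ΔC_T(ref), which may differ from the exact expression in Hamilton et al. (2009); Eq. (3) is implemented as printed, including the factor 2. All function names and numeric inputs are hypothetical.

```python
def relative_ratio(e_target, e_ref,
                   ct_cal_target, ct_test_target,
                   ct_cal_ref, ct_test_ref):
    """Efficiency-corrected ratio of a test sample to the calibrator.

    ASSUMPTION: ratio = E_target**dCT_target / E_ref**dCT_ref, with
    dCT = C_T(calibrator) - C_T(test sample), as defined in the text.
    """
    delta_target = ct_cal_target - ct_test_target
    delta_ref = ct_cal_ref - ct_test_ref
    return (e_target ** delta_target) / (e_ref ** delta_ref)

def copy_number_test(copy_number_calibrator, ratio):
    """Eq. (3) as printed in the text.

    The factor 2 is taken from the source; it presumably reflects the
    two-copy autosomal reference gene (an interpretation, not stated).
    """
    return copy_number_calibrator * ratio * 2

# Hypothetical inputs: the target amplifies one cycle earlier in the test
# sample than in the calibrator, while the reference gene is unchanged.
ratio = relative_ratio(e_target=2.0, e_ref=2.0,
                       ct_cal_target=22.0, ct_test_target=21.0,
                       ct_cal_ref=20.0, ct_test_ref=20.0)
cn = copy_number_test(copy_number_calibrator=1.5, ratio=ratio)
print(ratio, cn)
```

With equal reference C_T values, a one-cycle earlier target signal at E = 2 doubles the ratio, which is the behaviour expected of a relative quantification scheme.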

Quantitative real-time polymerase chain reaction data validation by TA cloning
To verify the accuracy of the qPCR results, TA cloning was performed for the EIF1AY gene. PCR products from eight samples (three Baise (BS) horses, two Yanji (YJ) horses, one Debao (DB) horse, one Hequ (HN) horse, and one American Quarter (Q) horse) were purified using a Universal DNA Purification Kit (TIANGEN, Beijing, China), then ligated into the pGEM-T Easy cloning vector and transformed into Escherichia coli DH-5α (CWBio, China). In total, 108 clones (11-15 clones per sample) were picked and amplified by PCR. PCR products were sequenced on an ABI PRISM 377 DNA sequencer (Perkin-Elmer) (Shanghai Sangon Biotech Company, Shanghai, China).

Statistical analysis
In order to minimize technical error and to obtain an accurate CN estimation, raw qPCR data that showed a coefficient of variation (CV) > 1 % between replicates were excluded from further analysis. The normality of the CN data was assessed with the Kolmogorov-Smirnov and Shapiro-Wilk normality tests (Shapiro and Wilk, 1965; Justel et al., 1997).
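The replicate filter can be illustrated with a short sketch. It assumes the CV is computed on the replicate C_T values of a single sample as the sample standard deviation divided by the mean; the text does not specify which quantity the CV was computed on, and the example values are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0

def passes_cv_filter(ct_replicates, max_cv_percent=1.0):
    """Keep a sample only if its replicate CV does not exceed the cutoff."""
    return coefficient_of_variation(ct_replicates) <= max_cv_percent

# Hypothetical replicate C_T values for two samples.
tight = [22.10, 22.15, 22.05]   # replicates agree closely
noisy = [22.10, 23.00, 21.50]   # replicates disagree

for name, reps in (("tight", tight), ("noisy", noisy)):
    cv = coefficient_of_variation(reps)
    print(f"{name}: CV = {cv:.2f} %, kept = {passes_cv_filter(reps)}")
```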
Box plot analyses of the CN data were conducted to detect outliers in all the breeds as a whole. Multiple pair-wise comparisons of the median copy numbers (MCNs) between breeds were performed using a nonparametric Mann-Whitney U test (Mann and Whitney, 1947) with a Bonferroni correction (Dunn, 1961). MEGA 5.1 was used to align the cloned sequences (Tamura et al., 2011).
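As an illustration of the pair-wise testing scheme, the sketch below computes the Mann-Whitney U statistic for one pair of groups (with tied values given average ranks) and the Bonferroni-adjusted significance threshold for all pair-wise comparisons among 14 breeds. The nominal significance level of 0.05 is an assumption, since it is not stated in the text, and the copy numbers are hypothetical.

```python
from itertools import combinations

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(sample_a, sample_b):
    """Smaller of the two U statistics for two independent samples."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)

def bonferroni_alpha(n_groups, alpha=0.05):
    """Per-comparison threshold when all pairs of groups are compared."""
    n_comparisons = len(list(combinations(range(n_groups), 2)))
    return alpha / n_comparisons, n_comparisons

# Hypothetical copy numbers for two breeds.
breed_a = [1, 2, 2]
breed_b = [2, 3, 4]
u = mann_whitney_u(breed_a, breed_b)
adjusted_alpha, n_pairs = bonferroni_alpha(14)
print(u, n_pairs, adjusted_alpha)
```

With 14 breeds there are 91 pair-wise comparisons, so the per-comparison threshold becomes much stricter than the nominal level.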

Primer male-specificity
In order to validate the male specificity of the primers used in this study, a routine PCR was run using female DNA as the negative control. The results demonstrated that every primer pair for the target genes, EIF1AY, ETSTY1, ETSTY4, ETSTY5, ETY4, UBE1Y, SRY, and YIR2, amplified a male-specific band with the expected fragment size. This confirmed that the primers are male specific and can be used for qPCR analysis in this study (Fig. 1).

Standard curve and primer efficiency
The standard curves for the reference gene (beta-actin) and the eight target genes (EIF1AY, ETSTY1, ETSTY4, ETSTY5, ETY4, UBE1Y, SRY, and YIR2) were generated from horse DNA diluted to different concentrations; the correlation coefficients of the standard curves generated were all higher than 0.99.The resulting reactions had primer efficiencies higher than 1.90, demonstrating high amplification efficiencies.The correlation coefficients of the standard curves and the primer efficiencies for each primer are displayed in Table 2.

The copy number variations of eight genes on the equine Y chromosome
The gene copy numbers of the tested horses were calculated using the calibrator as an adjustment based on Eqs. (2) and (3) (see Sect. 2; for results see Table 3). Copy numbers determined by relative real-time PCR were considered to be approximations only, and not absolute copy numbers. As described in Hamilton et al. (2009), the copy number of the calibrator was estimated using the $C_T$ method. The ratios relative to the calibrator were determined by the $\Delta C_T$ method, which involves normalizing the samples to a calibrator to minimize the variation. Therefore, the relative copy numbers can be used to compare the relative amount of Y-linked genes between horses with confidence. Previous studies have shown that relative real-time PCR can still produce a useful estimate of copy numbers (Hamilton et al., 2009, 2012; Yue et al., 2013).
The eight horse Y chromosome genes studied were divided into X-degenerate genes (EIF1AY, SRY, and UBE1Y) and ampliconic genes (ETSTY1, ETSTY4, ETSTY5, ETY4, and YIR2) (Paria et al., 2011). Almost two thirds of the 20 X-degenerate genes found in horses are expressed ubiquitously and have a Y-linked homologue in other mammalian species (Quilter et al., 2002; Rohozinski et al., 2002; Skaletsky et al., 2003; Pearks Wilkerson et al., 2008; Hughes et al., 2010). Most equine Y-borne amplified sequences are expressed exclusively or predominantly in the testis (Paria et al., 2011) and presumably have a role in testicular function, which may be valuable in selecting stallions for breeding.
The UBE1Y gene is an X-degenerate gene that is specifically expressed in the testis (Paria et al., 2011). UBE1Y is conserved in most eutherians and marsupials, except that it is found as a pseudogene in some primate lineages and is absent in humans (Skaletsky et al., 2003). Our results showed that the CN of UBE1Y ranged from 1 to 77 among individuals (Table 3), which supports previous research stating that the horse is the only species where UBE1Y is a multi-copy gene (Paria et al., 2011). Orthologs in other species (cats, pigs, and mice) are single copy (Mitchell et al., 1991; Quilter et al., 2002; Pearks Wilkerson et al., 2008). The high copy number and testis-specific transcription of UBE1Y in horses support the hypothesis that the gene could be responsible for regulating germ cell proliferation and, thus, male fertility (Lévy et al., 2000).
EIF1AY was previously recognized as a single-copy gene with ubiquitous expression (Paria et al., 2011). In this study, nine breeds had an MCN of 1 for the EIF1AY gene, but copy number variations (1-37) were detected (Table 3). Therefore, we inferred that EIF1AY is not strictly a single-copy gene in horses. This suggests that the population size and the method used in a study influence the results and final conclusions. Based on the 108 cloned sequences, 48 polymorphisms and 42 haplotypes were detected for the EIF1AY gene in this study (Table S2), which indicates that the EIF1AY gene is multi-copy in horses. This was in accordance with our qPCR results, supporting the credibility and accuracy of the CNV estimates obtained with the qPCR method.
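The kind of tally reported for the cloned sequences can be sketched as follows. This is an illustrative count only, assuming the clones have already been aligned to equal length and that a polymorphism is any alignment column with more than one base; the toy sequences are hypothetical and are not EIF1AY data.

```python
def polymorphic_sites(aligned):
    """Indices of alignment columns that contain more than one base."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i in range(length)
            if len({seq[i] for seq in aligned}) > 1]

def count_haplotypes(aligned):
    """Number of distinct sequences among the aligned clones."""
    return len(set(aligned))

# Hypothetical aligned clone sequences.
clones = ["ACGTAC", "ACGTAC", "ACGAAC", "ACCTAC"]

sites = polymorphic_sites(clones)
n_haplotypes = count_haplotypes(clones)
print(sites, n_haplotypes)
```

Because the Y chromosome is haploid, finding more than one distinct sequence among clones from a single male is consistent with more than one copy of the gene, although PCR and sequencing errors can also introduce apparent variants.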
Compared with the X-degenerate genes, the ampliconic gene content is more diverse among lineages (Skaletsky et al., 2003). The five Y-ampliconic genes (ETSTY1, ETSTY4, ETSTY5, ETY4, and YIR2) were all present in multiple copies in our study, which was consistent with the results from Paria et al. (2011). The MCN of the ETSTY1 gene varied from 1 to 5 among the 14 breeds. The MCN of the ETSTY4 gene was highest in the Chakouyi breed (13) and lowest in the Guizhou breed (1). For the ETSTY5 gene, the Balikun breed possessed the highest MCN of 35, whereas the Guizhou breed only had an MCN of 2. The ETY4 gene had the highest MCNs of the genes tested, ranging from 10 in Guizhou horses to 76 in Ningqiang horses. The MCN of the YIR2 gene ranged from 4 to 26 (Table 3). The multi-copy portion of mammalian MSYs may share very little direct sequence homology between species, but it is surprisingly consistent in function (Skaletsky et al., 2003; Hughes et al., 2010). Therefore, these multi-copy MSY genes, with testis-specific or predominantly testis-related expression, are most likely involved in testicular function and possibly in spermatogenesis and male fertility. For example, the RBMY1 functional copy dosage is positively correlated with sperm motility, and dosage insufficiency is an independent risk factor for asthenozoospermia; therefore, comprehending the role of CNVs in this gene is fundamental for understanding the cause of infertility (Chang et al., 2013; Yan et al., 2017). Apart from the genes examined in this study, CNVs in other male-specific genes have been found in mammals. Nine genes or gene families on the human Y chromosome showed CNVs. These included partial deletions of the TSPY cluster and the AZFc region, which may influence spermatogenesis, and a novel complex duplication of the AZFa region (Wei et al., 2015). Two Y-linked genes (HSFY and ZNF280BY) of swamp buffalo also showed abundant CNVs (Zhang et al., 2017).
The development of different modern horse breeds and various Y chromosome lineages is a reflection of human selection and environmental adaptation and occurred much later than the domestication of the species (Vila et al., 2001; Wallner et al., 2017). Variations on the Y chromosome are important tools for analyzing both Y chromosome lineages and domestication. In bulls, a genetic study on CNVs of Y-linked gene families in two ancestral Y-lineages has evaluated the effect of the number of Y-lineages on male reproduction and other traits (Yue et al., 2015). In horses, five MSY haplotypes have been identified by two Y-single nucleotide polymorphisms (SNPs) and one Y-indel (Han et al., 2015), and 42 other MSY haplotypes have been determined by 158 variants within domestic horses (Felkel et al., 2018); this suggests much higher diversity in Asian horses than in European breeds.
The CNVs revealed a diverse distribution pattern among Chinese breeds in this study. Guizhou was the breed that displayed the lowest CNV for the eight ECAY genes (Tables S3-S10). Recent studies indicate that the distribution of CNV regions may be shaped by natural selection (Cooper et al., 2007). This seems fitting, as most Guizhou horses are distributed in remote mountainous areas of Guizhou province and, due to the difficulty of transportation and the isolated conditions, rarely hybridize with other horse breeds. Therefore, it is possible that the gene copy numbers of Guizhou horses were already relatively lower than those of other breeds before domestication. The results of this study support the assumption that CNVs might have been conserved for a long time and then passed on during the domestication of the horse (Metzger et al., 2013).
We suggest that other methods, such as array comparative genomic hybridization (array-CGH) (Wei et al., 2015; Shi et al., 2018) and the AccuCopy® assay method (Yan et al., 2017), could be applied to increase the accuracy of CNV detection for ECAY genes. More functional analyses of the relationship between CNVs on ECAY and fertility should be conducted in the future.

Conclusion
In this study, we investigated the CNVs of eight Y chromosome genes in Chinese horses for the first time. The EIF1AY, ETSTY1, ETSTY4, ETSTY5, ETY4, UBE1Y, SRY, and YIR2 genes were multi-copy, with MCNs of 1, 3, 8, 9, 26, 7, 1, and 12, respectively. The CNVs of Y chromosome genes showed different distribution patterns among Chinese horse breeds, indicating a natural selection effect on horse evolution and CNV formation.
Data availability. The measurement data involved in this study are available upon request to the authors.

Figure 1. Gel electrophoresis of the PCR products of the beta-actin gene and the eight horse Y chromosome genes. M is the 2 kb DNA ladder; ♂ is male horse genomic DNA; ♀ is female horse genomic DNA; and N is the negative control (distilled water).

Figure 2. Box plot analysis of the CNs of the eight genes in horses. Outliers are indicated by a solid circle or an asterisk (extremely high CN).

Table 1. Sample information for the 14 chosen Chinese horse breeds.

Table 2. Correlation coefficients of the standard curves and the primer efficiencies for the eight horse Y chromosome genes.

Table 3. Copy number variations of eight Y chromosome genes in horses.