Genetic diversity and phylogenetic relationship of nine sheep populations based on microsatellite markers

Abstract The objective of this study was to assess the genetic diversity and phylogenetic relationship of nine sheep populations, including two famous high prolific populations and seven popular mutton populations raised in China. Overall, these sheep populations in this study exhibited a rich genetic diversity. Both the expected heterozygosity and Nei's unbiased gene diversity ranged from 0.64 to 0.75, with the lowest value found in Dorset sheep (DST) and the highest in Hu sheep (HUS) and Ba Han sheep (BAS). The polymorphic information content (PIC) varied between 0.59 in DST and 0.71 in HUS and BAS. Specifically, for individual breeds, the small-tail Han sheep (STH) and the four introduced populations did not display the expected diversity; therefore more attention should be paid to the maintenance of diversity during management of these populations. The results of un-weighted pair-group method (UPGMA) phylogenetic tree and structure analysis indicated that the nine investigated populations can be divided into two groups. Suffolk (SUF) and DST were clustered in one group, and the other group can be further divided into three clusters: German Mutton Merino (GMM)–BAS–Bamei Mutton sheep (BAM), HUS–STH and Du Han (DOS)–Dorper (DOP). This clustering result is consistent with sheep breeding history. TreeMix analysis also hinted at the possible gene flow from GMM to SUF. Together, an in-depth view of genetic diversity and genetic relationship will have important implications for breed-specific management.


Background
Sheep is one of the earliest domesticated animal species (Zeder et al., 2006;Bernardo et al., 2009). It provides meat and milk with high-quality protein and useful accessory products (wool and skin) for human. Therefore, sheep became an important livestock species and promoted the spread of human farming civilization (Jing and Zhang, 2009). China has a long history of sheep domestication (Kawamura et al., 2005) and famous sheep breeds with high reproductive performance. For example, small-tail Han sheep (STH) and Hu sheep (HUS) had attracted much attention for their excellent characteristics of high litter size and year-round estrus (China National Commission of Animal Genetic Resources, 2011). In addition to reproductive traits, meat production traits are also the focus of attention for sheep farmers. Since the growth rate and performance of meat production in Chinese sheep populations are relatively low com-pared with famous mutton sheep breeds worldwide (e.g., Dorper (DOP), Suffolk (SUF), Dorset (DST) and German Mutton Merino (GMM)), these perfect mutton breeds were introduced to China in the past two decades and are now common in many farms of China. However, with the introduction of foreign breeds to improve production performance of indigenous purebred sheep, the number of indigenous purebred sheep with high fertility is gradually decreasing (especially for STH) in China. Therefore, the evaluation of current genetic diversity situation in these sheep populations is very essential for breed protection. In addition, this study also involved seven mutton sheep populations raised in China including four introduced mutton populations mentioned above and three mutton populations in Inner Mongolia of China, which is the main producing area of mutton in China. The genetic diversity assessment for the seven mut-ton populations will be helpful for the formulation of plans of further management and utilization for these populations.
Microsatellite markers belong to a class of neutral markers, which have rich polymorphism in population and are widely distributed in the genome; therefore they are suitable for genetic diversity analysis. Previous studies indicated that microsatellite markers had been well used to assess genetic diversity and population structure of sheep breeds (Di et al., 2012;Dotsev et al., 2018;Madilindi et al., 2019;Vargas et al., 2018;Laoun et al., 2020;Ullah et al., 2020) and goat breeds (Carvalho et al., 2015;Câmara et al., 2017;E et al., 2019;Liu et al., 2019). Therefore, in this study, a panel of 26 microsatellite markers was employed to evaluate the genetic diversity, population structure and phylogenetic relationship of the nine sheep populations. The results will provide important information for the implementation of breed-specific management, propagation or conservation programs.

Animal ethics
All experiments involving animals were authorized by the Animal Ethics Committee of the Institute of Animal Science, Chinese Academy of Agricultural Sciences (no. IAS2019-49).

PCR amplification and genotyping
All sheep were genotyped using 26 microsatellite markers distributed in different chromosomes. Information of 26 microsatellite markers is summarized in Table 2. The primers for the first 24 microsatellite markers were recommended by the joint ISAG-FAO Standards Committee for analysis of sheep genetic diversity (Arranz et al., 1998(Arranz et al., , 2001Martin-Burriel et al., 1999;Diez-Tascon et al., 2000;Stahlberger-Saitbekova et al., 2001;Pariset et al., 2003;Tapio et al., 2003). The primers for the other two loci are selected from the report of Davies et al. (1995) and Di et al. (2012). Forward primers were 5 -labeled with fluorescent dyes (FAM or HEX, Sangon Biotech (Shanghai) Co., Ltd.). The PCR pro-tocol refers to the report of Zhong et al. (2011). PCR amplifications were performed in 12 µL reaction volumes containing 50 ng of genomic DNA, 2.5 mM MgCl 2 , 250 µM of each dNTP, 0.025 µM of each primer, 1.25 units of Taq polymerase and 1 × magnesium-free PCR buffer (Takara, Japan). Amplifications were carried out using the GENEAMP PCR 9700 thermocycler (Applied Biosystems), with the following cycling parameters: 94 • C for 5 min followed by 30 cycles of 94 • C for 30 s, annealing at 55-60 • C for 30 s, 72 • C for 30 s and a final step at 72 • C for 10 min. PCR products were diluted to 1/2-1/4 concentrations, and 0.75 µL of each diluted product was then mixed with an internal standard (Gene Scan 500 LIZ™, Applied Bio systems, USA) according to the manufacturer's instructions. Genotyping was performed using a Genetic Analyzer 3130 xl (Applied Bio systems, USA). Fragment analysis was performed using GEN-EMAPPER V3.7 software (Applied Biosystems). The thirdorder least squares method was used for allele size determination (Mburu et al., 2003). Genotyping was repeated once if individual samples failed to amplify.

Statistical analysis
The genetic diversity parameters of nine sheep populations, including mean number of effective alleles (N A ), the expected heterozygosity (H E ), the observed heterozygosity (H O ), the polymorphic information content (PIC) and Nei's unbiased gene diversity (H S ), were calculated using Microsatellite Toolkit software 3.1.1 (Park, 2008). The genetic parameters for each microsatellite locus, including the effective number of alleles (N E ), the allelic richness over all samples per locus (Rt), the fixation index within populations (F IS ), the fixation indices of total population (F IT ) and the pairwise differences between the populations (F ST ), were obtained using FSTAT 2.9.3.2 (Goudet et al., 2002). Dispan was used to calculate the genetic distance of D A and D S between populations and build the phylogenetic tree (Nei et al., 1983). Population structure was analyzed by Structure 2.3.4 with Bayesian clustering model, and its results were visualized by Distruct 1.1 (Falush et al., 2003;Pritchard et al., 2000;Rosenberg, 2003). TreeMix 1.13 was used to construct a maximum likelihood tree and infer migration events between branches (Pickrell et al., 2012).

Genetic diversity in nine sheep populations
In this study, the genetic diversity analysis was firstly performed in nine sheep populations. The mean number of effective alleles (N A ) is the largest in STH (10.50) and the least in SUF (8.12). The expected heterozygosity (H E ) and observed heterozygosity (H O ) within the population ranged from 0.64 to 0.75 and 0.63 to 0.74, respectively, with the lowest values found in the DST and the highest in the BAS.  The highest PIC was observed in HUS and BAS (0.71), followed by GMM (0.70), and the lowest value was observed in DST (0.59). The Nei's unbiased gene diversity (H S ) varied between 0.64 in DST and 0.75 in HUS and BAS (Table 3).
Overall, these sheep populations in this study exhibited a rich genetic diversity. However, specifically for a single breed, the STH did not display the expected diversity. In addition, compared with other populations, the diversity of the four introduced populations raised in China was relatively low (especially in DST) in the nine sheep populations. Therefore, the above five populations (STH, DST, DOP, SUF, and GMM) exert more of our attention to the maintenance of diversity during population management. Meanwhile, it was observed that all the microsatellite loci were polymorphic across nine populations. A total of 435 alleles were identified in nine sheep populations, and the number of alleles per locus ranged from 8 (SRCRSP5) to 25 (MAF214) ( Table 2). The highest effective number of alleles (N E ) was observed at locus DYMS1 (9.2113) and the lowest at BM8125 (2.3668). Allelic richness over all samples per locus (Rt) was measured between 4.6968 (SRCRSP5) and 14.0685 (MAF70), with a mean of 9.0208. The PIC across all the populations ranged from 0.4754 (MCMA54) to 0.8325 (MAF70). In total, these loci were polymorphic and suitable for sheep genetic diversity analysis.

Genetic differentiation within and between sheep populations
In order to analyze the degree of differentiation within and between populations, F statistics were calculated in the nine sheep populations. The results are shown in Table 2.
The F IT value of the 26 microsatellite loci varied between 0.0281 (MCM140) and 0.4481 (SRCRSP5) in the nine populations. The F IS value ranged from −0.0500 (SRCRSP9) to 0.1551 (TGLA53). In the pairwise-population F ST analysis, the greatest divergence was observed between BAM and DOP (0.1212) and between BAM and DST (0.1212) (Table 4 and Fig. 1). According to the chi-square test of F ST , all populations showed significant divergence (P < 0.05) from each other ( Table 4). The overall F ST of each locus for nine sheep populations ranged from 0.0473 to 0.1414, with a mean of 0.079. The results indicated that the genetic variation between these populations reached 7.9 %.

Phylogenetic relationship and population structure of nine sheep populations
Firstly, the genetic distance among nine sheep populations is shown in Table 5. The genetic distance D A ranged from 0.0434 to 0.3418 and D S ranged from 0.0639 to 0.2081. According to the Nei's genetic distance (D A ), an un-weighted pair-group method (UPGMA) phylogenetic tree of nine sheep populations was constructed (Fig. 2). The results indicated that nine sheep populations can be divided into two groups. SUF and DST were clustered in one group. The other seven populations were classed into another group. Then the second group was further divided into three clusters: GMM-BAS-BAM, HUS-STH and DOS-DOP. The above clustering results were also confirmed by structural analysis. In Fig. 3, each population is represented by a rectangle broken into k colored segments. At k = 2, all the populations were separated into two different groups: the first group comprised SUF and DST, and the other consisted of the remaining seven populations. At k = 3, the remaining seven populations were divided into two clusters (GMM-BAS-BAM and HUS-STH-DOS-DOP). At k = 4, HUS and STH were clustered into one new subgroup, and DOS and DOP were clustered into another subgroup. In addition, from k = 5 to k = 7, no more new clusters generated. The clustering results were highly consistent with the UPGMA phylogenetic tree.
To infer migration events between populations, a maximum likelihood tree (Fig. 4a, b) was built using TreeMix. The residuals from the trees were inspected to estimate how well the model fit the data (Fig. 4c, d). Residuals above or below zero indicate that populations are more or less close to each other than they are presented in the tree. Based on this, it is implied that the actual genetic relationships between SUF and GMM and between DST and STH were closer than the evolutionary tree shows when no migration events were allowed (Fig. 4a, c). The model with one migration event (Fig. 4b) alluded an obvious gene flow from GMM to SUF. Combining the maximum likelihood tree (Fig. 4b) and residual plot (Fig. 4d), it was suggested that DST was more closely related to GMM than presented in the tree. However, when two or more migration events were assumed, the phylogenetic relationship shown in maximum likelihood tree was inconsistent with true breeding history of these sheep populations; therefore they are not shown in Fig. 4.

Discussion
The results of genetic variation analysis give us some hints. Firstly, overall, the nine sheep populations in this study have high genetic diversity and their mean H E and H O reached 0.71 and 0.68, respectively. The higher the genetic heterozygosity, the greater the variation of population (Bai et al., 2014;Dotsev et al., 2018;Jawasreh et al., 2018). Therefore, the large genetic variation existed within these investigated sheep populations. Secondly, as far as single breed is concerned, HUS has a higher genetic diversity, while STH did not display the expected diversity. This is consistent with the actual situation that HUS is being valued more in recent years, and therefore the number of being raised is very large, while the number of purebred STH is decreasing gradually. From the perspective of genetic diversity indicators (H E and H O ), the diversity of the current STH population is reduced compared to a few years ago (Ma et al., 2006;Liu et al., 2014). So, it is essential to formulate timely measures to protect genetic diversity of the famous prolific breed for meeting different needs in the future. In addition, the genetic diversity of four introduced mutton breeds (especially in DST) was relatively low in this study; therefore the result reminds us that the new individuals without kinship to current population should be added into these populations in order to keep rich genetic diversity and maintain excellent production performance as well as avoid inbreeding depression. Thirdly, for sheep in different regions of the world, Chinese sheep populations have an upper-middle level of genetic diversity, which is similar with previous reports (Niu et al., 2012;Liu et al., 2014;E et al., 2016. The genetic diversity in these Chinese sheep populations is higher than that of several breeds in Africa (e.g., West African Dwarf and Uda in Nigeria) (Agaviezor et al., 2012) and Europe (e.g., Cres island sheep and Lika pramenka sheep in Croatia; Denmark sheep in eastern Europe) (Tapio et al., 2010;Salamon et al., (Dossybayev et al., 2019;Dudu et al., 2020), implying that a moderate genetic differentiation (5 %-15 %) exists among sheep populations. The results of the UPGMA phylogenetic tree and structure analysis are consistent with breeding history of these sheep breeds. First of all, SUF and DST were clustered in one group. Since both SUF and DST horn were originally bred in the UK and both of their sire lines are Southdown rams (China National Commission of Animal Genetic Resources, 2011), genetic relationships between the two breeds were relatively closer and clustered together. Then the other seven sheep populations were clustered into another group. Of them, the three populations (GMM, BAM and BAS) were classed in one subgroup. The sire line in the initial breeding process of BAM and BAS is GMM and BAM rams, respectively. Therefore, a close genetic relationship existed among the three populations. Then, both HUS and STH belong to short fat-tailed Mongolian sheep (China National Commission of Animal Genetic Resources, 2011; Wang et al., 2014;Wei et al., 2015). A previous study suggested that, from the Mongolian Plateau, short fat-tailed sheep migrated southeast to the east of China (Zhao et al., 2017), which is a current distribution area of HUS and STH. Therefore, the two breeds were clustered together.
The gene flow and genetic exchange in the nine sheep populations were elucidated by TreeMix analysis. The results implied the possible gene flow from GMM to SUF. First of all, the gene flow is probably related to the history of human migration in Europe. From the 7th century BCE to the 3rd century BCE, Celtic tribes living in Germany had a great westward migration, and some of them entered Britain (Byrne et al., 2018). Then, from the fifth century CE, some of the Saxons and Angles of Germanic tribes again poured into Great Britain from Germany. Sheep has been considered as an important and easy-to-carry domestic animal for human several thousand years ago; therefore the gene flow between GMM and SUF might appear following human migrations, commercial trade, and extensive transport. Secondly, the gene flow was associated with the widespread use of Merino sires across Europe that commenced after the Middle Ages. Therefore, extensive haplotype sharing between Merino (including GMM) and other European breeds was observed (Kijas et al., 2012). Thirdly, previous analysis on the genetic relationships among world's sheep breeds suggests a major sheep migration route from South-West Asia to Britain and the Nordic regions, whose end potentially includes the route from Germany to the Britain (Kijas et al., 2012). Meanwhile, the genome-wide analysis of world's sheep breeds reveals frequent genetic exchange occurred during the development of modern sheep breeds (Kijas et al., 2012). In summary, these results can help us further understand the migration history and genetic relationship among populations for this important livestock species.

Conclusions
As a whole, these sheep populations in this study exhibited rich genetic diversities. However, specifically for individual breeds, STH and the four introduced populations did not display the expected diversity. The results of the UPGMA phylogenetic tree and the structure analysis are consistent with sheep breeding history. TreeMix analysis hinted at a possible gene flow from GMM to SUF. Together, these results will have important implications for sheep breed-specific management.
Data availability. The data sets are available upon request from the corresponding author.
Author contributions. Conceptualization was done by RZ. Data curation was carried out by RZ and RD. The formal analysis was carried out by QX. MC acquired funding. Methodology was done by QX. Project administration was carried out by RD and XW. CW and ZP were responsible for the software. The original draft was written by QX and RD, while RD, MC and ZP wrote the review and edited the manuscript. Review statement. This paper was edited by Steffen Maak and reviewed by three anonymous referees.