Casein haplotype diversity in seven dairy goat breeds

Abstract Selection, drift, gene flow and breeding have extensively shaped the genomic variability of domestic animals. In goat species, several mutations identified within the casein genes have been shown to affect the level of gene expression of milk production traits. The four casein genes – CSN1S1, CSN2, CSN1S2 and CSN3 – are organized in a cluster of 250 kb located in chromosome 6, and due to tight linkage, their genetic variability is well depicted by haplotypes which are transmitted to the progeny. Thirty single nucleotide polymorphisms (SNPs) located within the casein gene cluster were used to characterize the haplotype variability of six southern Italian goat breeds (Girgentana, Maltese, Rossa Mediterranea, Argentata dell'Etna, Messinese, Capra dell'Aspromonte). A representative sample of the Norwegian dairy goat breed (Norsk melkegeit) has been used as an out-group to obtain a weighted measure of genetic diversity in the metapopulation. A total of 54 haplotypes were detected among the seven breeds: 26, 9, 8 and 11 haplotypes were found at CSN1S1, CSN2, CSN1S2 and CSN3 respectively. The number of haplotypes per breed was 14 (Norwegian), 26 (Messinese), 27 (Rossa Mediterranea and Girgentana) and 31 (Maltese, Argentata dell'Etna and Capra dell'Aspromonte). The Maltese breed showed the highest number of private haplotypes, whereas the Norwegian goat recorded the highest number of shared haplotypes. The linkage disequilibrium analysis showed higher levels of association for the SNP pairs within casein loci than SNP pairs between casein loci, likely reflecting low levels of intra-genic recombination. The highest linkage disequilibrium values were found in CSN1S1 and CSN2 genes in all the breeds, except for Argentata dell'Etna and Rossa Mediterranea. The resolution of the haplotype diversity at the casein cluster can be exploited both for selective and conservative plans.


Introduction
The casein cluster is a genomic region of particular interest in dairy ruminants. In goat species, the genetic variability of caseins arises from several mutations that have been shown to affect gene expression level; the quantitative alleles modify the amount of single caseins in individual milk and, consequently, the technological and nutritional properties of goat milk (Marletta et al., 2007). Due to the tight association among the four casein genes, organized in a cluster of 250 kb, the study of the haplotype diversity, instead of single locus genotyping, seems to be preferable in view of planning selection strategies (Haplotype Assisted Selection) (Hayes et al., 2006;Caroli et al., 2006) and investigating breed genetic variability (Sacchi et al., 2005;Finocchiaro et al., 2008;Gigli et al., 2008;Küpper et al., 2010). South Italy preserves a rich heritage of dairy goat breeds: some of them originate from the Far and Middle East (Porter, 1996); others are derived from indigenous goats or were locally developed through different crossbreeding strategies. All these breeds have low population sizes; some of them are threatened by extinction and genetic erosion. In this regard, the molecular genetics can provide effective tools to effectively describe, monitor and manage genetic resources by handling variability within and among breeds. A recent study (Criscione et al., 2016) dealt with the genetic characterization of these goat genetic resources by means of a set of microsatellites (STRs), assumed to be selectively neutral markers and described the intra-and inter-breed diversity in view of planning conservation priorities. However, the genomic regions harbouring quantitative loci for milk, such as casein genes, are worth investigating in dairy breeds, in order to better analyse, characterize and explore their genetic diversity. The resolution of the haplotype diversity at the casein cluster can be exploited both for selective and conservative plans. The aim of this study was to investigate the haplotype diversity of the four casein genes' cluster in six southern Italian goat breeds (Argentata dell'Etna, Capra dell'Aspromonte, Girgentana, Maltese, Messinese, Rossa Mediterranea); a representative sample of the Norwegian dairy goat breed (Norsk melkegeit) was used as an out-group to obtain a weighted measure of genetic diversity in the metapopulation. The haplotype variability, assessed by 30 single nucleotide polymorphisms (SNPs), was used to estimate the genetic diversity within and between, as well as the genetic relationship among these dairy goat breeds. The use of these data for a potential casein haplotype assisted conservation plan was also briefly discussed.

Sampling and SNPs analysis
A total of 207 goats and bucks representative of seven dairy goat populations were sampled in Italy (31 Girgentana GIR, 30 Argentata dell'Etna ARG, 30 Maltese MAL, 22 Rossa Mediterranea ROS, 30 Messinese MES and 31 Capra dell'Aspromonte ASP) and in Norway (33 Norwegian dairy goat NOR). All the animals were reared under real commercial farm conditions. Authorized personnel collected blood samples during the periodic veterinary control; therefore, no pain, suffering, distress or lasting harm was caused to the animals involved in the present study. The sampling was obtained from an average of nine herds per population to avoid closely related individuals, also according to the information obtained by farmers and genealogical data when available. More information on breeds' origin, sampling regions, total population size and the year of the studbook's establishment are reported in Criscione et al. (2016). Blood samples were collected in 10 mL vacutainer tubes (K3-EDTA). DNA was extracted from peripheral blood using the Illustrablood genomic Prep Mini Spin kit (GE Healthcare, Little Chalfont, UK). The whole sample was genotyped through a set of 30 SNP markers located in the promoter, exonic and intronic regions of the four casein genes (11 CSN1S1, 6 CSN2, 4 CSN1S2 and 9 CSN3). The genetic characterization was performed using matrix-assisted laser desorption-ionization time-of-flight mass spectroscopy (MALDI-TOF MS) implemented in a MassARRAY System (Sequenom, San Diego, Ca, USA).

Statistical analysis
The software PHASE ver. 2.1, which implements a Bayesian statistical method, was used to reconstruct haplotypes from the allelic frequencies within each gene in each population (Stephens et al., 2001). The threshold frequency for the determination of haplotypes was set to 1 %.
The level of linkage disequilibrium (LD) between all pairs of loci in each breed was calculated through the software Haploview ver. 4.1 (Barrett et al., 2005) using the r2 statistic (Hudson, 1985).
The deviation from Hardy-Weinberg equilibrium was investigated, for each SNP in each population, by using the software Genepop ver. 4.2 (Raymond and Rousset, 1995) according to the procedure of a probability test.
The frequencies of the inferred haplotypes of the four casein genes in each goat breed were used to calculate the Nei's pairwise genetic distance (D s ) (Nei, 1972) by implementing the software Phylip 3.67 (Felsenstein, 2005). The Neighbour-Net algorithm was used to construct the phylogenetic network on D s distances by running the software SplitsTree4 ver. 4.14.2 (Huson and Briant, 2006). The haplotypes inferred at the four casein genes in each goat breed, were used also to implement the principal component analysis (PCA) by means of the software Minitab ver. 16.1 (Minitab, Inc., 2009)

Results
The results of the SNP genotyping are shown in Table 1. Three SNPs were found to be monomorphic in some breeds: SNP 13 in ROS, SNP 14 in GIR and NOR, SNP 19 in NOR. The distribution of the allelic frequencies was markedly different between the Norwegian sample and the Italian breeds: in particular 20 of 30 SNPs in NOR had a frequency of the minor allele (MAF) below 20 %, whereas a maximum of 7 of 30 SNPs reported the same frequency threshold among the Italian breeds (ASP). The lowest frequency of the minor allele (MAF) was 0.017 at SNP 13 and 14 in MAL. The deletion (allele D) at CSN1S1 exon 9 (SNP 8) was found rarely in NOR (0.076), while it showed a frequency always above 30 % in all the Italian breeds and represented the major allele in MAL and ROS.
A sizable number (13) of SNPs were in Hardy-Weinberg disequilibrium in GIR (Table 1), while a maximum of three SNPs were in HW disequilibrium in the other Italian breeds, and there were none in NOR. A total of 54 haplotypes were inferred within the four casein genes ( Table 2): 26 at CSN1S1, 9 at CSN2, 8 at CSN1S2 and 11 at CSN3. In each breed, the first two haplotypes always accounted for more than 50 % of the total frequency, ranging from 55 % to 84 % in the Italian breeds, except for MES and ROS at CSN1S2. NOR breed showed the highest frequency of the first two haplotypes at CSN1S1, CSN2 and CSN1S2, always above 90 %. Private haplotypes had a frequency always below 5 % except for haplotype a16 in GIR (f = 0.13). The breeds showing the highest number of haplotypes (31) were ARG, ASP and MAL (Table 3); the latter had also the highest number of private haplotypes. In contrast, NOR had the least number of haplotypes (14) together with the highest percentage of shared haplotypes.
In Table 4 are reported the results of LD analysis as average values within each locus and across loci in each breed. In Table 1   all the breeds, the intra-locus LD level highlighted the highest values at CSN1S1 and then at CSN3, CSN2 and CSN1S2, in order of decreasing SNP number. NOR showed the highest level of LD within each of the casein genes, with a very high value at CSN1S1. Across the genes, very high LD values were found between CSN1S1 and CSN2, in all the breeds except for ARG and ROS. The matrix of pairwise D s genetic distance (Table 5) reported NOR and MES as the most and the least distinctive breed respectively. ASP and ARG highlighted the lowest pairwise value of the matrix, while NOR and MAL were the highest in the data set. Among the Italian goats, GIR and MAL were the most genetically distant breeds, even if the pairwise values of D s distance among the Italian goat populations were mostly comparable. The representation of the D s genetic distance using the Neighbour-Net algorithm (Fig. 1) clearly shows the dichotomy between the Norwegian goats and the Italian breeds, in which the three most selected breeds (GIR, ROS and MAL) grouped in a cluster separated from the mountain populations (MES, ASP and ARG). The haplotype frequencies, condensed in the principal component analysis (PCA), showed a spatial distribution of goat breeds (Fig. 2) in which the NOR was clearly separated from the Italian goats, the most selected Sicilian breeds (GIR, MAL and ROS) were clustered closely and were separated from MES and from the ARG-ASP group.

Discussion
The demographic history of a breed, including gene flow, drift and bottleneck events, shapes the distribution of nucleotide polymorphisms and the extent of genomic linkage disequilibrium (Nordborg and Tavare, 2002); therefore, the estimate of the haplotype variation and the LD in a casein cluster are very informative in estimating the effects of the selection, migrations or admixture in dairy goat populations.
Among the six Italian breeds, a rather high number of haplotypes per population was observed (from 26 in MES to 31 in ARG, ASP and MAL) with a frequency distribution and a number of private haplotypes, which denoted a high rate of within-breed variability and, at least partly, the absence of systematic breeding strategies. In contrast, the NOR breed showed the least within-breed genetic variability; the low values of the MAF for most of the 30 SNPs, the low number of haplotypes per locus and the unique private haplotype c8 (CSN1S2 locus) described a very low variability in the casein cluster as well as strong distinctiveness. These findings could be linked to the selection management and the low number  of founders of the Norwegian goat breed (Hayes et al., 2006). In fact, the planned mating system that uses buck circles on the national territory has undergone a downsizing following the implementation of the national programme for the eradication of caprine arthritis encephalitis virus since 2001 (Ådnøy, 2014); the number of available Norwegian bucks has decreased, and this has probably reduced genetic diversity. Nevertheless, the marked homogeneity of the casein cluster highlighted here does not seem to have led to a severe inbreeding rate, as shown by the population study through anonymous neutral marker STRs (Criscione et al., 2016).
In the Italian breeds the percentage of shared haplotypes (never > 42.3 %) was quite low in comparison with NOR (78.6 %). These data are partially unexpected taking into account the gene flow that can potentially occur between Italian populations reared in the same area, in extensive systems and in the absence of structured selective breeding programmes, but they are in agreement with the pronounced degree of biodiversity and morphological differentiation described in these local breeds. In particular, MAL, with its high number of haplotypes and private combinations, seems to retain a great amount of heterogeneity mainly located in the CSN1S1 locus. It is also worth noting the presence of a peculiar allelic distribution on SNP 8, in which the deletion (exon 9) represents the major allele in MAL and ROS goats. This mutation has previously been described as common in some Mediterranean breeds, including GIR and MAL (Sacchi et al., 2005;Gigli et al., 2008;Finocchiaro et al., 2008).
The level of linkage disequilibrium for pairs of markers within each casein locus was higher than for pairs of markers from different loci, as reported by Hayes et al. (2006) and Finocchiaro et al. (2008). The highest LD values were found in CSN1S1 and CSN2 genes in all breeds, except for ARG and ROS (Table 4).
The genetic relationship among the breeds, described according to D s distance, showed the phylogenetic distinctness of NOR from the Italian goat breeds. The set of SNPs used for the genetic characterization separated clearly the most selected dairy breeds GIR, MAL and ROS from the unselected populations (MESS, ASP and ARG) reared in mountainous regions and generally less productive. The spatial distribution of the seven goat breeds, represented through the score plot of the first two components of the PCA, was highly comparable to that of the Neighbour-Net built on the D s distance. Overall, 79 % of the variance showed two main results, as expected: the high distance between the NOR and the Italian goats as well as the clustering of the Sicilian breeds, which have been historically subjected to selection for dairy purposes (GIR, MAL and ROS). This last outcome slightly differs from the previous study, carried out by means of 20 microsatellites, which highlighted a clear admixture between ROS and the genetic pool of the mountainous population (Criscione et al., 2016), and it is due primarily to the use of different unlinked and non-neutral markers, which showed a different pattern of evolution.
Instead, given the remarkable genetic and geographical separation of NOR from the Italian breeds, the high level of shared haplotypes (78 %) between the Norwegian sample and the Italian breeds seems quite surprising. Due to the great geographical distance and the reproductive isolation, an explanation is not straightforward. Our findings confirm the low haplotypic variability at the casein cluster detected by Finocchiaro et al. (2008) in a different sample of GIR and NOR goats and the general assessment that goat populations show a low genetic differentiation rate (Luikart et al., 2001;Canon et al., 2006;Nicoloso et al., 2015). The severe shortage of haplotypes in the NOR breed also reflects the high geographical distance from the centre of goat domestication.
In the light of the casein haplotype diversity and according to the relatively rich patrimony of rare private haplotypes, all the Italian breeds are worth safeguarding. The endangered GIR, ASP and especially MAL breeds seem to acquire conservation priorities in the context of the southern Italian dairy goat, thanks to their degree of distinction and the private allelic combinations they have maintained.

Conclusion
Understanding the extent, distribution and origin of current genetic diversity in livestock populations requires diverse sources of information, including molecular data from different classes of markers. This study analysed an important region in the goat genome: the casein cluster located in chromosome 6. The results presented here have been compared with those obtained from the characterization of the same sampling by selectively neutral marker STRs. The characterization by means of this set of 30 SNPs highlighted a significant genetic diversity and differentiation among the six dairy goat breeds reared in southern Italy. All the Italian breeds, but mainly Maltese and to some extent Girgentana and Capra dell'Aspromonte, showed a relevant rate of genetic distinctness, especially when compared with the Norwegian breed. All these Italian breeds are reared in traditional management systems and in marginal areas, and some of them are endangered. Our findings could help interpret the evolutionary history of these breeds and represent a potential tool for the development of future management, conservation and breeding strategies.
Data availability. The data set is available upon request from the corresponding author.