A production herd of Czech Simmental cattle (Czech Red Pied, CRP),
the conserved subpopulation of this breed, and the ancient local breed Czech
Red cattle (CR) were screened for diversity in the antibacterial toll-like
receptors (TLRs), which are members of the innate immune system. Polymerase chain reaction (PCR)
amplicons of
Resistance breeding is a prospective tool for the prevention of increases in infectious diseases in production populations of dairy cattle. This trend is considered to be a consequence of the increasing physiological load, unintended co-selection, and inbreeding associated with preferential breeding for milk production (Boichard et al., 2015). It also results from the emergence of new diseases (Purse et al., 2005) that are often resistant to standard antibiotic treatment.
In spite of the general adoption of genomic selection, an approach that targets causal genes might be more efficient in specific cases because the full available genetic variation is used. Disease-related breeding values are often underrepresented in combined breeding indices and the efficiency of current genomic selections might be disturbed by specific configurations in haplotype blocks (Abdel-Shafy et al., 2014).
The obvious targets for resequencing are components of the innate immune
system (Boichard et al., 2015). In contrast to those of adaptive immunity,
they are fully determined in the germ line and are not formed ontogenetically
(Kawai and Akira, 2010). A central role is played by the pattern recognition
receptors (PRRs) that are activated by ligands originating from pathogens.
The primary class of PRRs are the toll-like receptors, which are encoded by
10 gene paralogs denoted
Natural variation in the
The importance of conserved historical populations of farm animals as a
source of genetic diversity for resistance breeding is often acknowledged
(FAO, 2007). Although the role of conserved populations in the protection of
historical breeds is plain, gene pool erosion because of limited population
sizes may significantly reduce their practical application in breeding;
however, direct comparisons of the genetic richness in immunity genes with
the modern production populations remain limited (Bilgen et al., 2016). For
example, in historical Czech Red cattle previous studies reported high
diversity in the major histocompatibility complex receptors (Hořín
et al., 1997) and provided insight into the diversity of
Screening for variations in antibacterial members of the
The Czech Red Pied breed was developed in the territory of Bohemia and Moravia during the 19th century from imported original Simmental cattle. Herd book registration of bulls of the Fleckvieh and Montbéliarde breeds began in 2000; therefore, the current trend leads to the convergence of the gene pool with those of the major Simmental breeds. For conservation purposes, a nucleus herd was formed in 2010 from 70 animals corresponding to the gene pool prevailing at the end of the 1990s (Mátlová, 2013).
The CRP breed is considered to be dual purpose. The female population
consists of 212 000 cows, representing 63.3 % of the female Holstein
population and 32.7 % of the total cow population in the country. The
difference in milk production compared to the Holstein breed persists; the
CRP breed produces 7137 kg yr
The third studied population was the conserved population of Czech Red
cattle. The breed is sometimes traced back to the ancient Celtic cattle of
Roman times, analogous to other red highland cattle breeds of central and
western Europe such as Harz Mountain cattle and Salers; however, molecular
data for independent conclusions have yet to be collected (Ludwig et al.,
2016). Czech Red cattle prevailed until the 18th century in the territory of
Bohemia and Moravia and presumably contributed to the formation of the Czech
Simmental. The current herd was restored in 1978 from only 14 cows and one
bull with 50 % identity (
To avoid the necessity of validation of the revealed polymorphisms using
individual genotyping reactions, the principle of hybrid resequencing was
applied. Dataset noise was mostly suppressed by combining two technologies
for population resequencing. In addition to polymerase chain reaction (PCR) amplicons that were prepared
from pooled DNA samples and sequenced with PacBio technology with
3–
The primary collection of DNA samples was from 150 bulls in the Czech Red
Pied production population. DNA was prepared from cryo-preserved insemination
doses using the MagSep tissue method (Eppendorf, Hamburg, Germany).
Insemination doses of 100
The two conserved populations were characterized using DNA isolated from the aliquots of compulsory blood samples provided to the gene bank of the genetic resources programme. Thirty-five animals of the conserved CRP and 80 animals of CR (Czech Red) were included. One hundred microlitres of thawed blood was processed in silica membrane columns using the BloodPrep commercial procedure (Life Technologies, Carlsbad, CA, USA).
The concentration of the isolated DNA was determined fluorometrically on a
microplate with SYBR Gold stain (Biotium, Fremont, CA, USA) and a G:BOX
Chemi XR5 camera (Syngene, Cambridge, UK) equipped with an EtBr filter under blue
light excitation. Normalized genomic DNA (20 ng
The pooled samples for low-coverage whole-genome sequencing were prepared from the population gDNA samples in equimolar concentrations. Purification was performed with the AMPure XP magnetic bead procedure (Beckman Coulter, Brea, CA, USA).
Used PCR amplicons in antibacterial
The pooled amplicons from the population DNA samples were sequenced with PacBio technology (Pacific Biosciences, Menlo Park, CA, USA). The sequencing libraries were prepared in the GATC-Biotech sequencing core laboratory (Constance, Germany) using P4-C2 chemistry according to the manufacturer's procedure. One 120 min movie was obtained using the circular consensus sequencing (CCS) protocol on the PacBio RS II machine. The primary data in the h5 format were converted to the FASTQ format with Pacific Biosciences software (SMRT Analysis Software Suite).
The library for Illumina sequencing (Illumina, San Diego, CA, USA) was prepared
from the pooled gDNA sample in the core laboratory of Novogene (London, UK).
Two rounds of paired-end
The FASTQ files containing reads obtained with the two technologies from all
three populations were mapped to the reference sequences for all five genes.
The reference sequences were FJ147090 for
The exported lists of variants in comma-separated values (CSV) format were converted to the coordinates of the UMD_3.1.1 assembly of the cattle genome. The single-nucleotide polymorphisms (SNPs) arising from the differences between the original reference sequences and the current genome assembly were added. The variants were then filtered to remove clusters resulting from read misalignment; a cluster was defined as more than three variants within a 25 nt stretch.
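The cluster filter can be illustrated with a minimal sketch. It assumes that variants are represented only by their integer reference positions and that every variant falling in an offending 25 nt stretch is discarded; the function names `flag_clustered` and `filter_variants` are hypothetical and do not come from the software used in the study.

```python
def flag_clustered(positions, window=25, max_in_window=3):
    """Return positions lying in any `window`-nt stretch that contains
    more than `max_in_window` variants (assumed cluster definition)."""
    pos = sorted(set(positions))
    flagged = set()
    left = 0
    for right in range(len(pos)):
        # Shrink from the left until the stretch spans fewer than `window` nt.
        while pos[right] - pos[left] >= window:
            left += 1
        # More than the allowed number of variants in this stretch: flag them all.
        if right - left + 1 > max_in_window:
            flagged.update(pos[left:right + 1])
    return flagged


def filter_variants(positions, window=25, max_in_window=3):
    """Return the sorted positions that survive the cluster filter."""
    flagged = flag_clustered(positions, window, max_in_window)
    return [p for p in sorted(set(positions)) if p not in flagged]


# Toy positions (not data from the study): four variants within 21 nt are removed.
kept = filter_variants([100, 105, 110, 120, 300, 500, 510])
print(kept)
```

Whether the original filter removed the whole cluster or only part of it is not stated in the text, so removing all members is an assumption of this sketch.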
The filtered variants were compared between both platforms. Only the
variants independently detected with both the Illumina and PacBio
technologies were considered to be valid. Moreover, the consistency of the
allelic frequencies based on the representation of the reads was used to
distinguish the valid results. The presence of the variant in the EBI European Variation Archive (
Haplotype structure was determined directly from the simultaneous occurrence of SNPs in the long reads provided by PacBio technology.
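As an illustration of read-based phasing, the sketch below counts the allele combinations observed together on single long reads. It assumes that each read has already been reduced to a mapping from reference position to base call; this input format, the function names, and the toy reads are hypothetical and are not taken from the study.

```python
from collections import Counter


def haplotype_counts(reads, snp_positions):
    """Count allele combinations seen on single reads.

    reads: iterable of dicts {reference position: base call}.
    Only reads covering every position in `snp_positions` are used.
    """
    counts = Counter()
    for calls in reads:
        if all(p in calls for p in snp_positions):
            counts[tuple(calls[p] for p in snp_positions)] += 1
    return counts


def haplotype_frequencies(reads, snp_positions):
    """Convert haplotype counts to relative frequencies among informative reads."""
    counts = haplotype_counts(reads, snp_positions)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()} if total else {}


# Toy reads (illustrative only); the last read does not span both SNPs.
toy_reads = [
    {10: "A", 50: "G"},
    {10: "A", 50: "G"},
    {10: "C", 50: "T"},
    {10: "A"},
]
print(haplotype_frequencies(toy_reads, [10, 50]))
```

With pooled samples, such frequencies describe the pool rather than individual animals, and they are only meaningful for SNPs close enough to be covered by the same read.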
Nei's standard genetic distances (Nei, 1972) were calculated from the allelic frequencies derived from the next-generation sequencing (NGS) read representation and averaged over the two technologies. Unrooted trees were generated with the neighbour-joining algorithm and visualized with the FigTree program package (Rambaut and Drummond, 2010).
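For reference, Nei's standard distance between populations X and Y is D = -ln(J_XY / sqrt(J_X J_Y)), where J_X, J_Y, and J_XY are the average per-locus sums of x², y², and xy over alleles. The sketch below assumes biallelic SNPs described by a single alternative-allele frequency per population; the function names and the toy frequencies are illustrative and are not values from the study.

```python
import math


def mean_frequencies(freq_a, freq_b):
    """Average per-SNP allele frequencies from two sequencing technologies."""
    return [(a + b) / 2 for a, b in zip(freq_a, freq_b)]


def nei_standard_distance(freq_x, freq_y):
    """Nei's (1972) standard genetic distance for biallelic loci.

    freq_x, freq_y: alternative-allele frequencies per SNP in populations
    X and Y; the reference-allele frequency is taken as 1 - p.
    """
    if len(freq_x) != len(freq_y) or not freq_x:
        raise ValueError("frequency lists must be non-empty and of equal length")
    jx = jy = jxy = 0.0
    for p, q in zip(freq_x, freq_y):
        jx += p * p + (1 - p) ** 2
        jy += q * q + (1 - q) ** 2
        jxy += p * q + (1 - p) * (1 - q)
    # The per-locus averaging cancels in the ratio, so sums can be used directly.
    if jxy == 0:
        return math.inf  # populations share no alleles at any locus
    return -math.log(jxy / math.sqrt(jx * jy))


# Toy frequencies (illustrative only).
pop_x = mean_frequencies([0.10, 0.50, 0.90], [0.12, 0.48, 0.88])
pop_y = mean_frequencies([0.30, 0.40, 0.70], [0.28, 0.42, 0.72])
print(nei_standard_distance(pop_x, pop_y))
```

The resulting pairwise distance matrix would then be passed to a neighbour-joining implementation to build the unrooted tree.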
The list of the validated SNPs was submitted to the Variant Effect Predictor
application (VEP) (McLaren et al., 2016) of the ENSEMBL database
(
Both the reference and mutant protein sequences were sent to the SWISS-MODEL
server of the University of Basel (
The variants found in the antibacterial
Classes of
The number of variants found in each population can be used to characterize their total diversity: 110 in the production population of CRP, 84 in the conserved CRP subpopulation, and 78 in the Czech Red cattle. Consequently, the diversity of the conserved populations of CRP and CR was close to the diversity of the far more abundant population of current CRP, representing 76.4 % and 70.9 % of its diversity, respectively.
All three populations – the production population of CRP, and the conserved
populations of CRP and CR – shared 68 (54.4 %) polymorphisms (Table 2).
Thirty-eight private SNPs (30.4 %) were confined to the production
population of CRP; four (3.2 %) were characteristic of CRP in a broad
sense, while only five (4.0 %) were associated strictly with the conserved
subpopulation of this breed. Surprisingly, only three SNPs (2.4 %) were
characteristic of the ancient Czech Red breed. The CR-specific variants
comprised 1414C
Proportion of gene variants according to the assignment to the studied cattle populations. CRP – Czech Red Pied cattle, CRP-PR – production population of CRP, GR – conserved populations in genetic resources, CRP-GR – conserved subpopulation of CRP, and CR – conserved Czech Red cattle.
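The reported category counts can be cross-checked with simple arithmetic. The sketch below assumes that the seven regions of a three-set Venn diagram are the only categories; the two regions that are not stated explicitly in the text are derived from the reported per-population totals, and all variable names are illustrative.

```python
# Counts reported in the text.
total_crp_pr, total_crp_gr, total_cr = 110, 84, 78
shared_all = 68        # found in all three populations
private_crp_pr = 38    # production CRP only
private_crp_gr = 5     # conserved CRP only
private_cr = 3         # Czech Red only
crp_both = 4           # production and conserved CRP, absent from CR

# Derived regions (assumption: no other categories exist).
crp_pr_and_cr_only = total_crp_pr - (shared_all + private_crp_pr + crp_both)
crp_gr_and_cr_only = total_crp_gr - (shared_all + private_crp_gr + crp_both)

# The CR total must agree with the derived regions.
assert total_cr == shared_all + private_cr + crp_pr_and_cr_only + crp_gr_and_cr_only

total_variants = (shared_all + private_crp_pr + private_crp_gr + private_cr
                  + crp_both + crp_pr_and_cr_only + crp_gr_and_cr_only)


def percent(count):
    """Share of all detected variants, rounded to one decimal place."""
    return round(100 * count / total_variants, 1)


print(total_variants, percent(shared_all), percent(private_crp_pr))
print(round(100 * total_crp_gr / total_crp_pr, 1),
      round(100 * total_cr / total_crp_pr, 1))
```

Under this assumption, the totals and percentages given in the text are mutually consistent, and the variants not listed explicitly fall into the category shared only by the two conserved populations.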
The allelic frequency derived from the read representation allowed for
comparison of the population structure. The frequencies obtained with the
two technologies were consistent in most of the SNPs detected and the mean
value was used for subsequent interpretation. The inter-population distances
based on all five
The identified SNPs with breeding potential, i.e. the SNPs located in the coding sequence (CDS) or the promoter region, are presented in Table 3 along with predictions of their effects and their distribution among the populations.
In total, 57 of the variants found were located in the coding sequences; 32
of them were synonymous and 25 were non-synonymous. The ratio of non-synonymous
mutations to the total number of CDS variants was 43.9 % (25 of 57) for
all five genes, with the highest value, 52.0 %, in
Inter-population distances based on all five
SNPs located in the coding sequence or the promoter region of
antibacterial
Most of the non-synonymous nucleotide changes, which are potentially
important for breeding, were concentrated in
In variants of
In spite of considerable total variation in the
In contrast to
The SIFT value, which quantifies the expected impact, dropped to 0.2 or
below in four amino acid changes in
In summary, only 5 of the 45 variants that were predicted to affect toll-like receptor function were restricted to the conserved populations.
The study demonstrated considerable variation in the five members of the
Consistency with the available data on general diversity in cattle in the EBI variant database demonstrates the reliability of hybrid sequencing (Koren et al., 2012). In this scheme, two different NGS technologies are combined to identify and eliminate the systematic errors of each. The technique of hybrid sequencing was originally suggested for obtaining correct de novo assemblies (Koren et al., 2012), where it helps to avoid ambiguities originating from duplicates and large rearrangements. The principle is also applicable to the validation of discovered SNPs, helping to avoid single-purpose genotyping reactions, a costly and laborious step in the discovery of polymorphisms. The application of this approach on a small scale for the description of variability in individual genes is well justified.
Nevertheless, the comparatively high proportion of novel SNPs observed
(23.2 %) compared with the limited number of novel SNPs reported in an
analogous study (Bilgen et al., 2016) for the Holstein population (only 4
novel SNPs of 274 for three
It should be noted that many of the observed polymorphisms have been
previously reported, mostly for meat breeds, in the panel of 26 world
breeds, namely 798C
Application of the PacBio technology additionally provides long reads (up to
1200 bp in the present work) that allow direct phasing of the located SNPs. The
data obtained in the current work are thus expected to contribute to the
precision of the previously studied haplotype structure of the
Inclusion of the conserved subpopulation of the Czech Red Pied cattle, reflecting the genetic structure before 2000, facilitated the detection of selection trends in CRP. The diversity found in the conserved subpopulation of CRP, representing 76.4 % of the diversity of the main production population, demonstrates that gene erosion due to intensive breeding for production traits has not occurred over the last two decades.
Significantly, only five SNPs were specific for the conserved historical
population compared with 38 SNPs that were specific for the modern
population. The low diversity of the conserved CRP population may have
reflected the small number of sampled animals (
Notably, the allelic richness of the historical population of the Czech Red
cattle after two historical bottlenecks remained comparable with the
richness of the large population of CRP; however, the distinctness of the CR
breed was characterized by only three breed-specific alleles when compared
to both Simmental populations. This indicates that breed-specific diversity
due to the ascribed ancient origin of the breed is not visible at the level
of
The average allelic frequency-based differences among the studied populations were moderate, as characterized by Nei's distances. They corresponded to the previously demonstrated relatedness of Czech Red Pied and Czech Red cattle, as estimated using microsatellite polymorphisms in the context of other central European breeds (Čítek et al., 2006; Zaton-Dobrowolska et al., 2007).
The pattern of inter-population distances for different
The calculated inter-population distances did not support the distinctness of
the historical breed CR, which is consistent with the private allele occurrence. It
should be noted that inter-varietal differences in bovine
The role of historical populations as a source of new functionally important
alleles for breeding seems to be moderate; however, some of the variants
restricted to the historical populations might be recruited to resistance
breeding. The polymorphism H326Q in
As shown in protein modelling that included 280 variable sites in all bovine
In spite of a limited number of mutations found in
Synonymous 2097T
In view of the epidemiological and economic importance of brucellosis, it is
worth mentioning that two intronic mutations, 1575G
Based on high diversity in
Applying this information to the present purpose, most of the aa changes
found in the studied populations could be placed in protein domains. E63D
(in CR only) is located in LRR1, R152Q (shared by the three populations) in
LRR5, I211V (shared polymorphism) in LRR7, H326Q, and R563H (restricted to
the CRP production herd) in LRR20, and H665Q (in the CRP production herd) is
located in the highly conserved toll/interleukin-1 receptor (TIR) homology region. Positioning of these
substitutions in functionally important regions increases the probability of
detecting health effects in anticipated population studies. Notably, the
functional effects of 2546G
Accordingly, the predicted effects are supported at the population level.
The E63D substitution in exon 2 of the
The change of 513C
Surprisingly, the C allele of the synonymous change 2565T
Testing the complete haplotypes instead of individual polymorphisms is an
approach that improves the resolution power of association studies
(Ruiz-Larrañaga et al., 2011; Abdel-Shafy et al., 2014). Consequently, a
risk haplotype of
Besides, the ability of
Some of the SNPs in
A functionally important SNP detected in
The change 9787C
The high frequencies that were discovered in five functionally relevant
intronic variants of
The effects of the mutation 5087A
Unfortunately, the
The aa polymorphisms R262H and F643L in TLR5, previously associated with a predicted functional impact (Fisher et al., 2011), were not found in our populations; however, the R125* mutation, reported as a candidate polymorphism for MAP resistance (Fisher et al., 2011), may be identical to *123Q (rs207872139) observed in our study. This was interpreted by the SIFT algorithm as a stop-loss mutation with significant functional consequences.
Notably, mutation 855G
Hybrid sequencing of population samples combining long-range amplicon
sequencing with PacBio technology and direct gDNA sequencing with Illumina
HiSeq technology is a fast and reliable approach to a survey of total
variation in target genes. Using this approach, the scope of variants found
in a series of antibacterial
The complete list of identified SNPs has been deposited in the Open Science Framework data repository (
All the authors contributed in equal measure to this work.
The authors declare that they have no conflict of interest.
The authors thank the company CHD Impuls for making archived insemination doses available for the purposes of this study.
The Long-Term Concept of the Research Institution Development of the Institute of Animal Science funded by the Ministry of Agriculture of the Czech Republic and the National Agency for Agricultural Research, project no. QJ1610489, supported this work.
This paper was edited by Steffen Maak and reviewed by two anonymous referees.