Criteria of GenCall score to edit marker data and methods to handle missing markers have an influence on accuracy of genomic predictions
Abstract. The aim of this study was to investigate the effect of different strategies for handling low-quality or missing marker data on the accuracy of direct genomic values for protein yield, mastitis and fertility, predicted with a Bayesian variable selection model and a GBLUP model in the Danish Jersey population. The data comprised 1,071 Jersey bulls genotyped with the Illumina Bovine 50K chip. After preliminary editing, 39,227 SNP remained in the dataset. Four methods for handling missing genotypes were compared: 1) BEAGLE: missing markers were imputed using Beagle 3.3 software, 2) COMMON: missing genotypes at a locus were replaced by the most common genotype observed at this locus in the marker data, 3) EX-ALLELE: missing marker genotypes at a locus were treated as an extra allele, and 4) POP-EXP: missing genotypes at a locus were replaced with the population expectation at this locus. Among the methods compared, imputation with Beagle was the best approach to handling missing genotypes. Treating missing genotypes as an extra allele, replacing them with the population expectation, or substituting the most common genotype each reduced the accuracy of genomic predictions. These results suggest that missing genotypes should be imputed in order to improve genomic prediction. Editing the marker data with a stringent threshold on GenCall (GC) scores and then imputing the discarded genotypes did not lead to higher accuracy. All marker genotypes with a GC score over 0.15 should be retained for genomic prediction.
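The two simplest replacement strategies (COMMON and POP-EXP) and the GC-score edit can be illustrated with a minimal sketch. It assumes genotypes are coded as 0, 1 or 2 copies of one allele in an animals-by-loci array, with NaN marking a missing call. It also assumes that the population expectation at a locus can be approximated by the mean of the observed genotype codes. The function names are hypothetical and this is not the software used in the study. BEAGLE imputation and the EX-ALLELE coding are not covered here.

```python
import numpy as np

def mask_low_gc(genotypes, gc_scores, threshold=0.15):
    """Set calls whose GC score is not above the threshold to missing (NaN)."""
    g = np.asarray(genotypes, dtype=float).copy()
    g[np.asarray(gc_scores, dtype=float) <= threshold] = np.nan
    return g

def impute_common(genotypes):
    """COMMON (sketch): fill missing calls with the most frequent observed genotype per locus."""
    g = np.asarray(genotypes, dtype=float).copy()
    for j in range(g.shape[1]):
        col = g[:, j]                      # view into g, so edits modify g
        observed = col[~np.isnan(col)]
        if observed.size == 0:
            continue                       # no information at this locus
        values, counts = np.unique(observed, return_counts=True)
        col[np.isnan(col)] = values[np.argmax(counts)]  # ties go to the smaller code
    return g

def impute_pop_exp(genotypes):
    """POP-EXP (sketch, assumed definition): fill missing calls with the locus mean."""
    g = np.asarray(genotypes, dtype=float).copy()
    locus_means = np.nanmean(g, axis=0)    # mean of observed codes at each locus
    rows, cols = np.where(np.isnan(g))
    g[rows, cols] = locus_means[cols]
    return g

# Toy data: 4 animals, 3 loci, one missing call per locus.
G = np.array([[0.0, 2.0, 1.0],
              [0.0, np.nan, 1.0],
              [1.0, 2.0, np.nan],
              [np.nan, 0.0, 2.0]])
G_common = impute_common(G)
G_pop_exp = impute_pop_exp(G)
```

Unlike COMMON, the POP-EXP sketch produces fractional genotype codes, so it only suits models that accept continuous marker covariates.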