Location: Animal Genomics and Improvement Laboratory
Title: Challenges in the use of sequence data in animal breeding
Author:
BEN ZAABZA, HAFEDH - Michigan State University
FERDOSI, MOHAMMAD - University of New England
STRANDEN, ISMO - Natural Resources Institute Finland (LUKE)
CUYABANO, BEATRIZ - Université Paris-Saclay
Neupane, Mahesh
MISZTAL, IGNACY - University of Georgia
LOURENCO, DANIELA - University of Georgia
GONDRO, CEDRIC - Michigan State University
Submitted to: Journal of Animal Science
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 4/3/2025
Publication Date: N/A
Citation: N/A
Interpretive Summary: Selecting superior animals based on desirable characteristics is an approach that has been used effectively in livestock for a long time. Recently, researchers have started using variation in DNA to select better animals. In the last ten years, this new technology has made selecting high-quality offspring cheaper and has rapidly increased the rate of improvement. Scientists expected that more detailed DNA data would make predictions better, but this has not always been true. For animals like cattle, which have many common ancestors, using fewer variations in DNA can work just as well. The best way to handle large amounts of genetic data is to focus on the most important parts and to improve current methods. This research demonstrates that methods with fewer markers often work well for improving livestock.
Technical Abstract: Genomic selection has been used in animal breeding for c. 15 years, and it continues to be an important tool due to its success in predicting genetic merit in livestock populations. Genomic selection was initially based on approximately 50K SNP arrays for thousands of animals. In the last decade, the advent of genome-scanning technologies and relatively inexpensive genotyping and sequencing has led to an increase in genomic data, both in the number of genotyped animals and in the density of detected variants (SNP markers). Sequence data were expected to substantially improve the accuracy of genomic prediction, because the causal variants would be present in the data being analyzed. We review the various methods and computational approaches used with sequence data and focus on the impact of these methods and model assumptions on genomic prediction accuracy. Because sequence data pose more computational challenges than SNP array data, we discuss the different methods in terms of modeling and development, their applicability to sequence data, and the computational resources required. Finally, we describe the best strategies for handling large-scale sequence data and for improving the efficiency of computer programs. Many details scattered throughout the literature are also illustrated. Despite the theory that the use of sequence data is associated with higher genomic prediction accuracy, there is no clear evidence of additional benefits from using sequence data over medium- to high-density SNPs. This is particularly true for populations with a small effective population size (Ne), such as cattle, where animals have many common ancestors and thus longer chromosome segments with high linkage disequilibrium that can be accurately tracked with low- to medium-density SNPs. Most of the studies reporting benefits of using sequence data for genomic prediction are based on simulated data and assume that a few QTL can explain all the variation, an assumption that has been shown to be unrealistic. Furthermore, we demonstrate that the best strategy for dealing with any large dataset with high SNP density would be to develop approximations, such as using only a subset of (important) markers, and to further improve the existing algorithms developed by animal breeding scientists.