Junge Wang, Jie Chai, Li Chen, Tinghuan Zhang, Xi Long, Shuqi Diao, Dong Chen, Zongyi Guo, Guoqing Tang, Pingxian Wu
Animals, vol. 15, no. 4. Published 2025-02-12. DOI: 10.3390/ani15040525 (https://doi.org/10.3390/ani15040525). Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11852217/pdf/. Impact factor: 2.7; JCR Q1 (Agriculture, Dairy & Animal Science).
Citations: 0
Enhancing Genomic Prediction Accuracy of Reproduction Traits in Rongchang Pigs Through Machine Learning.
The increasing volume of genome sequencing data makes large datasets difficult for traditional genome-wide prediction methods to handle. Machine learning (ML) techniques, which can process high-dimensional data, offer promising solutions. This study aimed to find a genome-wide prediction method for local pig breeds, using 10 datasets with varying SNP densities derived from imputed sequencing data of 515 Rongchang pigs and the Pig QTL database. Three reproduction traits (litter weight, total number of piglets born, and number of piglets born alive) were predicted using six traditional methods and five ML methods: kernel ridge regression, random forest, Gradient Boosting Decision Tree (GBDT), Light Gradient Boosting Machine, and AdaBoost. The methods were evaluated using fivefold cross-validation and independent tests. The predictive performance of both traditional and ML methods initially increased with SNP density, peaking at 800-900 k SNPs. ML methods outperformed traditional ones, with improvements of 0.4-4.1%. Integrating GWAS results and the Pig QTL database made the ML methods more robust. ML models also generalized better, with high correlation coefficients (0.935-0.998) between cross-validation and independent test results. GBDT and random forest showed high computational efficiency, making them promising methods for genomic prediction in livestock breeding.
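The fivefold cross-validation named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the genotypes are simulated allele counts (0/1/2), a plain ridge regression stands in for the eleven methods compared in the study, and every function name (`kfold_indices`, `fit_ridge`, `cross_validate`) and parameter value is hypothetical. Accuracy is taken here as the Pearson correlation between predicted and observed phenotypes, a common convention in genomic prediction that the abstract itself does not spell out.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k disjoint, nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge(X, y, lam):
    """Ridge regression on centered data: (X'X + lam*I) beta = X'y."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    xtx = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(p)]
    return means, y_mean, solve(xtx, xty)


def predict(model, X):
    """Predict phenotypes from genotypes using a fitted ridge model."""
    means, y_mean, beta = model
    return [y_mean + sum((row[j] - means[j]) * beta[j] for j in range(len(beta)))
            for row in X]


def cross_validate(X, y, k=5, lam=1.0, seed=0):
    """Return one accuracy (Pearson r) per held-out fold."""
    accuracies = []
    for test in kfold_indices(len(y), k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit_ridge([X[i] for i in train], [y[i] for i in train], lam)
        preds = predict(model, [X[i] for i in test])
        accuracies.append(pearson(preds, [y[i] for i in test]))
    return accuracies


# Simulated data (assumption): 100 animals, 20 SNPs, additive effects plus noise.
rng = random.Random(42)
n_animals, n_snps = 100, 20
X = [[rng.choice((0, 1, 2)) for _ in range(n_snps)] for _ in range(n_animals)]
effects = [rng.gauss(0, 1) for _ in range(n_snps)]
y = [sum(g * e for g, e in zip(row, effects)) + rng.gauss(0, 1) for row in X]

accuracies = cross_validate(X, y, k=5)
mean_accuracy = sum(accuracies) / len(accuracies)
print([round(a, 3) for a in accuracies], round(mean_accuracy, 3))
```

Each animal is held out exactly once, so the mean of the five fold accuracies summarizes out-of-sample performance. Swapping `fit_ridge` and `predict` for another learner would let the same loop compare methods on identical folds.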
Animals (Agricultural and Biological Sciences: Animal Science and Zoology)
CiteScore: 4.90
Self-citation rate: 16.70%
Articles published: 3015
Review time: 20.52 days
About the journal:
Animals (ISSN 2076-2615) is an international and interdisciplinary scholarly open access journal. It publishes original research articles, reviews, communications, and short notes that are relevant to any field of study that involves animals, including zoology, ethnozoology, animal science, animal ethics and animal welfare. However, preference will be given to articles that provide an understanding of animals within a larger context (i.e., the animals' interactions with the outside world, including humans). There is no restriction on the length of the papers. Our aim is to encourage scientists to publish their experimental and theoretical research in as much detail as possible. Full experimental details and/or method of study must be provided for research articles. Articles that involve subjecting animals to unnecessary pain or suffering will not be accepted, and all articles must be submitted with the necessary ethical approval (please refer to the Ethical Guidelines for more information).