Enhancing Genomic Prediction Accuracy of Reproduction Traits in Rongchang Pigs Through Machine Learning

IF 2.7; Zone 2 (Agricultural and Forestry Sciences); JCR Q1 (AGRICULTURE, DAIRY & ANIMAL SCIENCE); Animals; Pub Date: 2025-02-12; DOI: 10.3390/ani15040525
Junge Wang, Jie Chai, Li Chen, Tinghuan Zhang, Xi Long, Shuqi Diao, Dong Chen, Zongyi Guo, Guoqing Tang, Pingxian Wu
Animals, volume 15, issue 4. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11852217/pdf/
Citations: 0

Abstract


The increasing volume of genome sequencing data presents challenges for traditional genome-wide prediction methods in handling large datasets. Machine learning (ML) techniques, which can process high-dimensional data, offer promising solutions. This study aimed to find a genome-wide prediction method for local pig breeds, using 10 datasets with varying SNP densities derived from imputed sequencing data of 515 Rongchang pigs and the Pig QTL database. Three reproduction traits (litter weight, total number of piglets born, and number of piglets born alive) were predicted using six traditional methods and five ML methods: kernel ridge regression, random forest, Gradient Boosting Decision Tree (GBDT), Light Gradient Boosting Machine, and AdaBoost. The methods' efficacy was evaluated using fivefold cross-validation and independent tests. The predictive performance of both traditional and ML methods initially increased with SNP density, peaking at 800-900 k SNPs. ML methods outperformed traditional ones, showing improvements of 0.4-4.1%. The integration of GWAS and the Pig QTL database enhanced ML robustness. ML models exhibited superior generalizability, with high correlation coefficients (0.935-0.998) between cross-validation and independent test results. GBDT and random forest showed high computational efficiency, making them promising methods for genomic prediction in livestock breeding.
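
The evaluation scheme described in the abstract (fivefold cross-validation of an ML predictor on SNP genotypes) can be illustrated with a minimal sketch. This is not the authors' pipeline: the genotypes and phenotype are simulated toy data, the model is a NumPy-only kernel ridge regression with an RBF kernel, and the function names (`simulate_data`, `krr_fit_predict`, `five_fold_accuracy`) and the hyperparameters `lam` and `gamma` are assumptions chosen for illustration. Predictive accuracy is taken here as the Pearson correlation between predicted and observed phenotypes in each held-out fold, which is a common convention but is also assumed rather than stated in the abstract.

```python
import numpy as np


def simulate_data(n_animals=200, n_snps=500, n_qtl=20, h2=0.3, seed=0):
    """Simulate 0/1/2 SNP genotypes and a phenotype with heritability ~h2 (toy data)."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=n_snps)            # allele frequencies
    X = rng.binomial(2, freqs, size=(n_animals, n_snps)).astype(float)
    effects = np.zeros(n_snps)
    qtl = rng.choice(n_snps, size=n_qtl, replace=False)   # causal SNPs
    effects[qtl] = rng.normal(size=n_qtl)
    g = X @ effects                                        # true genetic values
    noise_var = g.var() * (1 - h2) / h2
    y = g + rng.normal(scale=np.sqrt(noise_var), size=n_animals)
    return X, y


def rbf_kernel(A, B, gamma):
    """RBF kernel exp(-gamma * ||a - b||^2) between rows of A and rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def krr_fit_predict(X_train, y_train, X_test, lam=1.0, gamma=None):
    """Kernel ridge regression: solve (K + lam*I) alpha = y, then predict."""
    mu, sd = X_train.mean(0), X_train.std(0)
    sd[sd == 0] = 1.0                                      # guard monomorphic SNPs
    Z_train, Z_test = (X_train - mu) / sd, (X_test - mu) / sd
    if gamma is None:
        gamma = 1.0 / X_train.shape[1]                     # assumed default scale
    y_mean = y_train.mean()
    K = rbf_kernel(Z_train, Z_train, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - y_mean)
    return rbf_kernel(Z_test, Z_train, gamma) @ alpha + y_mean


def five_fold_accuracy(X, y, n_folds=5, seed=0):
    """Return the per-fold Pearson correlation between predictions and phenotypes."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    accs = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        pred = krr_fit_predict(X[train], y[train], X[test])
        accs.append(np.corrcoef(pred, y[test])[0, 1])
    return np.array(accs)


if __name__ == "__main__":
    X, y = simulate_data()
    acc = five_fold_accuracy(X, y)
    print("per-fold accuracy:", np.round(acc, 3))
    print("mean accuracy:", round(float(acc.mean()), 3))
```

Swapping `krr_fit_predict` for another regressor with the same inputs and outputs would let the same loop compare methods on identical folds, which is the kind of comparison the abstract reports.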

Source journal: Animals (Agricultural and Biological Sciences: Animal Science and Zoology)
CiteScore: 4.90
Self-citation rate: 16.70%
Articles published: 3015
Review time: 20.52 days
About the journal: Animals (ISSN 2076-2615) is an international and interdisciplinary scholarly open access journal. It publishes original research articles, reviews, communications, and short notes that are relevant to any field of study that involves animals, including zoology, ethnozoology, animal science, animal ethics and animal welfare. However, preference will be given to those articles that provide an understanding of animals within a larger context (i.e., the animals' interactions with the outside world, including humans). There is no restriction on the length of the papers. Our aim is to encourage scientists to publish their experimental and theoretical research in as much detail as possible. Full experimental details and/or method of study must be provided for research articles. Articles submitted that involve subjecting animals to unnecessary pain or suffering will not be accepted, and all articles must be submitted with the necessary ethical approval (please refer to the Ethical Guidelines for more information).