Benchmarking Parametric and Machine Learning Models for Genomic Prediction of Complex Traits.

Christina B Azodi, Emily Bolger, Andrew McCarren, Mark Roantree, Gustavo de Los Campos, Shin-Han Shiu

Journal: G3: Genes, Genomes, Genetics, pages 3691-3702
Published: 2019-11-05
Publication type: Journal Article
DOI: https://doi.org/10.1534/g3.119.400498
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6829122/pdf/

Abstract

The usefulness of genomic prediction in crop and livestock breeding programs has prompted efforts to develop new and improved genomic prediction algorithms, such as artificial neural networks and gradient tree boosting. However, the performance of these algorithms has not been compared systematically across a wide range of datasets and models. Using data for 18 traits across six plant species with different marker densities and training population sizes, we compared the performance of six linear and six non-linear algorithms. First, we found that hyperparameter selection was necessary for all non-linear algorithms and that feature selection prior to model training was critical for artificial neural networks when the markers greatly outnumbered the training lines. Across all species and trait combinations, no single algorithm performed best; however, predictions based on a combination of results from multiple algorithms (i.e., ensemble predictions) performed consistently well. While linear and non-linear algorithms performed best for a similar number of traits, the performance of non-linear algorithms varied more between traits. Although artificial neural networks did not perform best for any trait, we identified strategies (i.e., feature selection, seeded starting weights) that boosted their performance to near the level of other algorithms. Our results highlight the importance of algorithm selection for the prediction of trait values.
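
The abstract describes ensemble predictions only as "a combination of results from multiple algorithms" and does not say how the results were combined. As a minimal sketch, assuming the simplest possible combination (an unweighted mean of each line's predicted trait value), the idea might look like the following. The function names `ensemble_mean` and `mse`, the model labels, and the simulated values are all hypothetical and are not taken from the study.

```python
import random


def ensemble_mean(predictions):
    """Average predicted trait values across algorithms, line by line.

    predictions maps an algorithm name to its list of predictions;
    every list must cover the same lines in the same order.
    """
    columns = list(predictions.values())
    if not columns:
        raise ValueError("need predictions from at least one algorithm")
    if len({len(col) for col in columns}) != 1:
        raise ValueError("all prediction lists must have the same length")
    return [sum(values) / len(values) for values in zip(*columns)]


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted trait values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Simulated stand-ins (NOT data from the study): true trait values plus
# independent noise play the role of two fitted algorithms' predictions.
rng = random.Random(0)
y_true = [rng.gauss(0.0, 1.0) for _ in range(200)]
predictions = {
    "linear_model": [y + rng.gauss(0.0, 0.8) for y in y_true],
    "nonlinear_model": [y + rng.gauss(0.0, 0.8) for y in y_true],
}
ensemble = ensemble_mean(predictions)

# Report the error of each individual model and of the averaged ensemble.
for name, pred in {**predictions, "ensemble": ensemble}.items():
    print(f"{name}: MSE = {mse(y_true, pred):.3f}")
```

Because squared error is convex, the mean squared error of an averaged prediction can never exceed the average of the individual models' mean squared errors, which is one reason an unweighted mean is a reasonable baseline. Whether the study used this combination or a weighted one is not stated in the abstract.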
