Machine learning for genomic and pedigree prediction in sugarcane.
Authors: Minoru Inamori, Tatsuro Kimura, Masaaki Mori, Yusuke Tarumoto, Taiichiro Hattori, Michiko Hayano, Makoto Umeda, Hiroyoshi Iwata
Journal: Plant Genome, e20486 (Journal Article; impact factor 3.9; JCR Q1, Genetics & Heredity)
Published: 2024-09-01 (Epub 2024-06-26)
DOI: 10.1002/tpg2.20486 (https://doi.org/10.1002/tpg2.20486)
Citations: 0
Abstract
Sugarcane (Saccharum spp.) plays a crucial role in global sugar production; however, its heterozygous polyploid genome has hindered the efficiency of breeding programs. Accounting for non-additive genetic effects is essential in genomic prediction (GP) models for crops with highly heterozygous polyploid genomes. This study uses machine learning methods to incorporate non-additive genetic effects and pedigree information, tracking sugarcane breeding lines and improving prediction by assessing the degree of association between genotypes. We measured the stalk biomass and sugar content of 297 clones from 87 families within a breeding population used in the Japanese sugarcane breeding program, and based our analyses on marker genotypes at 33,149 single-nucleotide polymorphisms. To validate the accuracy of GP in this population, we first evaluated the prediction accuracy of best linear unbiased prediction (BLUP) based on a genomic relationship matrix. Prediction accuracy was assessed using two cross-validation methods: repeated 10-fold cross-validation and leave-one-family-out cross-validation. GP accuracy ranged from 0.36 to 0.74 under repeated 10-fold cross-validation and from 0.15 to 0.63 under leave-one-family-out cross-validation. Next, we compared the prediction accuracy of BLUP with that of two machine learning methods: random forests and simulation annealing ensemble (SAE), a newly developed machine learning method that explicitly models interactions between variables. Both pedigree and genomic information were used as input to these methods. In repeated 10-fold cross-validation, the machine learning methods were more accurate than BLUP in most cases. In leave-one-family-out cross-validation, SAE was the most accurate of the methods. These results underscore the effectiveness of GP in Japanese sugarcane breeding and highlight the significant potential of machine learning methods.
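The two validation schemes named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the toy family labels, and the use of Pearson correlation as the accuracy measure are all hypothetical (the abstract does not say how accuracy was computed), and the prediction model itself (BLUP, random forests, or SAE) is deliberately left out.

```python
import random
from collections import defaultdict
from statistics import mean


def repeated_kfold(n, k=10, repeats=3, seed=0):
    """Yield (train_idx, test_idx) for `repeats` independent random k-fold partitions."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        for f in range(k):
            test = sorted(order[f::k])  # every k-th shuffled index forms one fold
            held = set(test)
            train = [i for i in range(n) if i not in held]
            yield train, test


def leave_one_family_out(families):
    """Yield (family, train_idx, test_idx), holding out one whole family at a time."""
    by_family = defaultdict(list)
    for i, fam in enumerate(families):
        by_family[fam].append(i)
    for fam, test in by_family.items():
        held = set(test)
        train = [i for i in range(len(families)) if i not in held]
        yield fam, train, test


def pearson(x, y):
    """Pearson correlation between observed and predicted values (assumed accuracy metric)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Toy example: 8 clones in 4 hypothetical families.
families = ["F1", "F1", "F2", "F2", "F2", "F3", "F3", "F4"]
lofo_splits = list(leave_one_family_out(families))
kfold_splits = list(repeated_kfold(n=20, k=10, repeats=2))
```

In the leave-one-family-out scheme no relative from the held-out family remains in the training set, so the model must predict from more distantly related clones. That is consistent with the lower accuracy range reported for this scheme (0.15 to 0.63) compared with repeated 10-fold cross-validation (0.36 to 0.74).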
About the journal:
The Plant Genome publishes original research investigating all aspects of plant genomics. Technical breakthroughs reporting improvements in the efficiency and speed of acquiring and interpreting plant genomics data are welcome. The editorial board gives preference to novel reports whose innovative genomic applications advance our understanding of plant biology and may have applications to crop improvement. The journal also publishes invited review articles and perspectives that offer insight and commentary on recent advances in genomics and their potential for agronomic improvement.