Multi-omics assists genomic prediction of maize yield with machine learning approaches.

Molecular Breeding, 44(2): 14. IF 2.6; Zone 3 (Agricultural and Forestry Sciences); JCR Q1 (Agronomy). Pub Date: 2024-02-08; eCollection Date: 2024-02-01. DOI: 10.1007/s11032-024-01454-z

Chengxiu Wu, Jingyun Luo, Yingjie Xiao

Citations: 0

Abstract

With recent improvements in high-throughput technologies, large multi-dimensional plant omics datasets have been produced, and big-data-driven yield prediction has received increasing attention. Machine learning offers promising computational and analytical tools for interpreting the biological meaning of large crop datasets. In this study, we used multi-omics datasets from 156 maize recombinant inbred lines, comprising 2496 single nucleotide polymorphisms (SNPs), 46 image traits (i-traits) from 16 developmental stages obtained through an automatic phenotyping platform, and 133 primary metabolites. In benchmark tests across different types of prediction models, some machine learning methods, such as Partial Least Squares (PLS), Random Forest (RF), and a Gaussian process with a radial basis function kernel (GaussprRadial), predicted maize yield better, albeit with slight differences in method preference among i-trait, genomic, and metabolic data. We found that better yield prediction may result from differing capabilities in ranking and filtering data features, which are linked to biological meaning such as photosynthesis-related or kernel-development-related regulation. Finally, by integrating multiple omics data with the RF machine learning approach, we further improved the prediction accuracy for grain yield from 0.32 to 0.43. Our research provides new ideas for applying plant omics data and artificial intelligence approaches to facilitate crop genetic improvement.

Supplementary information: The online version contains supplementary material available at 10.1007/s11032-024-01454-z.
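
The benchmarking idea in the abstract (predict yield from each omics layer alone, then from the layers combined, and compare cross-validated accuracy) can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes that prediction accuracy means the Pearson correlation between held-out predictions and observed yield, it uses small synthetic data rather than the 156-line dataset, and it uses only a Gaussian-process posterior mean with an RBF kernel (implemented with the Python standard library) in place of the PLS, RF, and GaussprRadial models named above. All function and variable names (`gp_predict`, `cv_accuracy`, `snps`, `itraits`, `metabolites`) are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rbf(u, v, gamma):
    """Radial basis function kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def gp_predict(X_train, y_train, X_test, gamma, noise=0.1):
    """Posterior mean of a GP with RBF kernel; y is centred on its training mean."""
    mu = sum(y_train) / len(y_train)
    K = [[rbf(a, b, gamma) + (noise if i == j else 0.0)
          for j, b in enumerate(X_train)] for i, a in enumerate(X_train)]
    alpha = solve(K, [y - mu for y in y_train])
    return [mu + sum(rbf(x, a, gamma) * w for a, w in zip(X_train, alpha))
            for x in X_test]


def cv_accuracy(X, y, k=5, seed=0):
    """k-fold cross-validation; returns Pearson r of held-out predictions vs. y."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    gamma = 1.0 / len(X[0])  # assumed heuristic: one over the feature count
    preds = [0.0] * len(y)
    for f in range(k):
        test = idx[f::k]
        held = set(test)
        train = [i for i in idx if i not in held]
        out = gp_predict([X[i] for i in train], [y[i] for i in train],
                         [X[i] for i in test], gamma)
        for i, p in zip(test, out):
            preds[i] = p
    return pearson(preds, y)


# Synthetic stand-ins for the three data layers (sizes are arbitrary, not the paper's).
rng = random.Random(42)
n = 60
snps = [[rng.choice((-1.0, 1.0)) for _ in range(8)] for _ in range(n)]
itraits = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(n)]
metabolites = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(n)]
# Toy yield: one informative feature per layer plus noise.
yield_ = [s[0] + t[0] + m[0] + rng.gauss(0, 0.5)
          for s, t, m in zip(snps, itraits, metabolites)]
# "Integration" here is plain feature concatenation per line.
combined = [s + t + m for s, t, m in zip(snps, itraits, metabolites)]

for name, X in [("SNPs only", snps), ("i-traits only", itraits),
                ("metabolites only", metabolites), ("combined", combined)]:
    print(name, round(cv_accuracy(X, yield_), 2))
```

Because the data are simulated, the printed correlations say nothing about the reported improvement from 0.32 to 0.43; the sketch only shows the shape of a single-layer versus integrated comparison under the stated assumptions.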

Source journal

Molecular Breeding (Agricultural and Forestry Sciences: Agronomy)

CiteScore: 5.60
Self-citation rate: 6.50%
Review time: 1.5 months

About the journal: Molecular Breeding is an international journal publishing papers on applications of plant molecular biology, i.e., research most likely to lead to practical applications. These applications may relate to the developing as well as the industrialised world and should have demonstrable benefits for the seed industry, farmers, the processing industry, the environment, and the consumer. All published papers should contribute to the understanding and progress of modern plant breeding, encompassing the scientific disciplines of molecular biology, biochemistry, genetics, physiology, pathology, plant breeding, and ecology, among others. Molecular Breeding welcomes the following categories of papers: full papers, short communications, papers describing novel methods, and review papers. All submissions are subject to peer review to ensure the highest possible scientific quality standards.

Molecular Breeding core areas: Molecular Breeding will consider manuscripts describing contemporary methods of molecular genetics and genomic analysis, structural and functional genomics in crops, proteomics and metabolic profiling, abiotic stress, and field evaluation of transgenic crops containing particular traits. Manuscripts on marker-assisted breeding are also of major interest, in particular novel approaches and new results in marker-assisted breeding, QTL cloning, integration of conventional and marker-assisted breeding, and QTL studies in crop plants.