MFMGP: an integrated machine learning fusion model for genomic prediction

Plant Biotechnology Journal (IF 10.1; Zone 1, Biology; JCR Q1, Biotechnology & Applied Microbiology). Pub Date: 2025-01-11. DOI: 10.1111/pbi.14532
Chaopu Zhang, Qiqi Liang, Yuye Yu, Shaojuan Jin, Jinmei Huang, Zhongping Xu, Erbao Liu, Wensheng Wang, Fan Zhang, Fangzhou Liu, Yingyao Shi, Fenge Li, Zhikang Li, Shuangxia Jin, Min Li

Abstract

Genome-wide selection (GS) is a contemporary methodology that uses molecular markers distributed across the entire genome. However, challenges such as a lack of informative molecular markers and the difficulty of selecting appropriate and efficient GS model(s) have confined most GS-based breeding efforts to laboratory simulations (Wang et al., 2023). Compared with conventional prediction models, machine learning (ML) algorithms offer new ways to address challenges such as big data analysis and high-performance parallel computing. At the current stage, however, GS using ML still has limitations, notably in model selection.

Here, we present MFMGP, a fusion model built on a variety of ML training methods. Its normalization fusion method with exponential decay weights assigns a weight to the prediction results of each model and applies exponential decay to these weights, so that more recent and/or more relevant model predictions receive higher weights. The weights are then normalized, and a weighted average of the models' prediction results gives the final fusion prediction (Figure 1a). The MFMGP software for interactive GS analyses is available at http://www.biohuaxing.com/#/MFMGP. To verify the prediction accuracy of MFMGP, we compared it with seven commonly used GS models: the classical GS model (GBLUP), four ML-based models (LightGBM, SVR, XGBoost and HGBoost) and two deep learning (DL)-based models (DNNGP and DeepCCR).
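The fusion step described above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything specific here is an assumption: models are ranked by a validation error (standing in for "more relevant"), the raw weight of a model is exp(-decay × rank), and `exp_decay_fusion`, `val_errors` and `decay` are hypothetical names rather than part of the MFMGP software.

```python
import numpy as np

def exp_decay_fusion(predictions, val_errors, decay=1.0):
    """Fuse per-model predictions with normalized exponential-decay weights.

    predictions: array of shape (n_models, n_samples)
    val_errors:  array of shape (n_models,), e.g. validation RMSE per model
                 (assumption: a lower error marks a "more relevant" model)
    decay:       hypothetical decay rate; larger values favor the top models
    """
    predictions = np.asarray(predictions, dtype=float)
    val_errors = np.asarray(val_errors, dtype=float)
    # Rank models by validation error: the best model gets rank 0.
    ranks = np.argsort(np.argsort(val_errors))
    # Exponentially decaying raw weights, normalized so that they sum to 1.
    raw = np.exp(-decay * ranks)
    weights = raw / raw.sum()
    # Weighted average across models for every sample.
    return weights @ predictions, weights

# Toy predictions from three hypothetical base models for three samples.
preds = np.array([
    [1.0, 2.0, 3.0],   # model A
    [1.2, 1.8, 3.3],   # model B
    [0.5, 2.5, 2.0],   # model C
])
fused, w = exp_decay_fusion(preds, val_errors=[0.40, 0.35, 0.90], decay=1.0)
print("weights:", np.round(w, 3))
print("fused prediction:", np.round(fused, 3))
```

With these toy inputs, model B has the lowest validation error and therefore receives the largest weight, and model C the smallest. Raising `decay` pushes the fusion toward the single best model, while `decay=0` reduces it to a plain average.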

Figure 1. Prediction accuracy of eight methods based on three crop datasets. (a) The design and algorithmic framework for the Multiple Machine Learning Fusion Model for Genomics Prediction (MFMGP). (b) Phenotypic variation of the agronomic traits in rice. (c) Performance of eight methods in predicting 13 traits using rice 3KRP (n = 2110). The red arrow and text box indicate the proportion by which the MFMGP model can improve accuracy compared to the other seven models. (d) Performance of eight methods in predicting six traits using the wheat dataset (n = 2000). (e) Performance of eight methods in predicting four traits using the cotton dataset (n = 1245). (f, g) Prediction accuracy of eight methods based on the maize (n = 6210) (f) and pig (n = 1490) (g) datasets. (h) The relationship between prediction accuracy and heritability. (i) The relationship between prediction accuracy and sample size.

In rice, we used a natural population of 3024 Asian cultivated rice accessions (3KRG) to construct the training population (Table S1). The GS accuracy of MFMGP was evaluated using the phenotype datasets of 2110 rice accessions for 13 yield-related and morphological traits, with over 1.0 M SNPs (Figure 1b,c; Table S2). The results of 10-fold cross-validation (CV) indicated that MFMGP had the highest prediction accuracy for all 13 tested traits, with an average accuracy of 0.53, significantly (P < 0.01) higher than that of the GBLUP model (average value = 0.36). The prediction accuracy of MFMGP was also significantly higher than the average of the four ML models (average value = 0.45) and of the two DL methods (average value = 0.34) (Tables S2 and S3). On average, MFMGP improved prediction accuracy by 52.9% over GBLUP, 18.4% over the four ML models, 4.2% over the best model among the four integrated ML methods and 73.3% over the DL models. Additionally, MFMGP had the smallest root mean square error (RMSE) for all 13 traits, with an average RMSE reduction of 11.1% relative to GBLUP, 5.8% relative to the ML models and 24.3% relative to the DL models (Tables S2 and S4). With a sample size of 2110 and CPU-based computation (server configuration: Intel® Xeon® CPU E7-8860 v3 @ 2.20 GHz), MFMGP took slightly longer than the four tested ML models but was significantly faster than GBLUP and the DL methods (which used a GPU) (Table S5).
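As an illustration of the evaluation protocol, the following sketch runs a 10-fold CV and reports the two metrics mentioned above. Several points are assumptions: "prediction accuracy" is taken to be the Pearson correlation between observed and predicted phenotypes (a common convention in GS, not stated in the abstract), a closed-form ridge regression stands in for any of the compared models, and the genotype and phenotype data are synthetic.

```python
import numpy as np

def pearson_r(y_true, y_pred):
    # Assumed definition of "prediction accuracy": Pearson correlation.
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    # Stand-in model (not MFMGP): ridge regression solved in closed form.
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    A = Xc.T @ Xc + alpha * np.eye(Xc.shape[1])
    beta = np.linalg.solve(A, Xc.T @ (y_train - y_mean))
    return (X_test - x_mean) @ beta + y_mean

def kfold_cv(X, y, k=10, seed=0):
    # Shuffle once, split into k folds, and average the metrics over folds.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accs, errs = [], []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = ridge_fit_predict(X[train], y[train], X[test])
        accs.append(pearson_r(y[test], pred))
        errs.append(rmse(y[test], pred))
    return float(np.mean(accs)), float(np.mean(errs))

# Synthetic data: 300 individuals, 50 markers coded 0/1/2, additive effects.
rng = np.random.default_rng(42)
n, p = 300, 50
X = rng.integers(0, 3, size=(n, p)).astype(float)
effects = rng.normal(size=p)
y = X @ effects + rng.normal(scale=2.0, size=n)

acc, err = kfold_cv(X, y, k=10)
print(f"mean accuracy (r) = {acc:.2f}, mean RMSE = {err:.2f}")
```

Averaging the per-fold metrics is one of several reasonable choices; pooling all out-of-fold predictions and computing a single correlation is another and can give slightly different values.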

We then used six traits from a dataset of 2000 Iranian bread wheat accessions to compare the prediction accuracy of the eight models using 33 709 SNPs (Figure 1d; Table S2). The average prediction accuracy of MFMGP across all six traits was 0.65, compared with 0.32 for GBLUP, 0.59 for DeepCCR, 0.57 for DNNGP, 0.63 for HGBoost, 0.63 for LightGBM, 0.28 for SVR and 0.62 for XGBoost. On average, MFMGP improved prediction accuracy by 2.9% over the best model among the four integrated ML methods. Using 1 122 352 SNPs and four traits from 1245 cotton accessions, MFMGP showed the highest prediction accuracy and the lowest RMSE among all methods (Figure 1e; Table S2). On average across the four traits, MFMGP improved prediction accuracy by 12.1% and reduced RMSE by 21.9% compared with the other seven methods, and improved prediction accuracy by 3.5% compared with the four integrated ML methods. Using 32 599 markers and four traits of 6210 maize samples, MFMGP showed an average prediction accuracy of 0.85, again the highest among the eight methods, except for DTT, for which its prediction accuracy was similar to that of SVR (Figure 1f; Table S2). To explore the predictive ability of MFMGP in animals, we used the IMF content phenotype and 39 614 markers of 1490 pig samples to compare the predictions of the eight methods (Figure 1g; Table S2). MFMGP performed best among all the methods, with an average improvement in prediction accuracy of 24.5% over GBLUP, 57.6% over the ML models, 16.2% over the best model among the four integrated ML methods and 18.5% over the DL models.
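The percentage improvements quoted in these comparisons are relative gains. The short example below applies the usual formula, 100 × (new − baseline) / baseline, to the average wheat accuracies reported above. It is only a worked illustration: the helper name `relative_improvement` is hypothetical, and ratios of rounded averages need not reproduce the percentages in the text, which may have been computed trait by trait before averaging (an assumption, since the abstract does not say).

```python
def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline

# Average prediction accuracies for the six wheat traits, as reported above.
wheat_avg = {
    "GBLUP": 0.32, "DeepCCR": 0.59, "DNNGP": 0.57, "HGBoost": 0.63,
    "LightGBM": 0.63, "SVR": 0.28, "XGBoost": 0.62,
}
mfmgp_avg = 0.65

gains = {model: relative_improvement(mfmgp_avg, acc)
         for model, acc in wheat_avg.items()}
for model, gain in sorted(gains.items(), key=lambda kv: kv[1]):
    print(f"MFMGP vs {model:9s} {gain:6.1f}%")
```

For example, the gain over HGBoost computed from the rounded averages is (0.65 − 0.63) / 0.63, or about 3.2%, close to but not identical with the 2.9% reported for the best integrated ML model.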

To investigate the impact of trait heritability, we used MFMGP to compare the low-heritability trait RBSSD (H² = 0.38) with the high-heritability traits GL (H² = 0.94) and GW (H² = 0.94). We used the 2017 RBSSD phenotypic data as the training population (n = 1277) to predict phenotypes in two independent environments, obtaining prediction accuracies of 0.36 in 2016 (n = 606) and 0.34 in 2019 (n = 676). In contrast, when we used the 2017 GL and GW data to predict phenotypic values in 2015 and 2016 (n = 760), the prediction accuracies of GL and GW reached very high average values of 0.91 and 0.92, respectively (Figure 1h). All four density plots showed that the angle between the line y = x and the fitted regression line was very small in the repeated experiments across different environments (Figure S1). To assess the influence of subspecific differences on GS accuracy, we randomly selected two subgroups with the same number of accessions (n = 500) from Xian and Geng. We used MFMGP to analyse two representative traits (GW and HD) and found that the prediction accuracy in Geng was higher than that in Xian for GW, whereas the opposite was true for HD. Additionally, we used the Xian subgroup as the training population to predict the Geng subgroup, and the Geng subgroup as the training population to predict the Xian subgroup. The results showed that the prediction accuracy of one subgroup for the other was extremely low (Figure S2A). The same caution applies when GS is used in breeding for disease resistance. As Figure S2B shows, the highly virulent race (V) had a much higher prediction accuracy than the weakly virulent races C4 and C5. To assess the impact of population size on GS, we randomly sampled accessions at nine different population sizes. The GS analysis showed that the prediction accuracy of the trait improved gradually as population size increased (Figure 1i).
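The sample-size experiment can be mimicked on simulated data. The sketch below is not the authors' analysis: the markers, effects and the heritability value of 0.5 are synthetic, ridge regression again stands in for the GS model, accuracy is assumed to be the Pearson correlation on a fixed held-out set, and the nine training sizes (100 to 900) are hypothetical choices.

```python
import numpy as np

def ridge_predict(X_train, y_train, X_test, alpha=100.0):
    # Stand-in GS model (not MFMGP): closed-form ridge regression.
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    A = Xc.T @ Xc + alpha * np.eye(Xc.shape[1])
    beta = np.linalg.solve(A, Xc.T @ (y_train - y_mean))
    return (X_test - x_mean) @ beta + y_mean

rng = np.random.default_rng(7)
n_total, n_markers = 1200, 200
X = rng.integers(0, 3, size=(n_total, n_markers)).astype(float)
g = X @ rng.normal(size=n_markers)           # simulated genetic values

h2 = 0.5                                     # hypothetical heritability
noise_sd = np.sqrt(g.var() * (1 - h2) / h2)  # noise scaled to reach h2
y = g + rng.normal(scale=noise_sd, size=n_total)

train_pool = np.arange(0, 1000)              # candidates for training
test_idx = np.arange(1000, 1200)             # fixed held-out individuals

curve = {}
for n in range(100, 1000, 100):              # nine training-set sizes
    idx = rng.choice(train_pool, size=n, replace=False)
    pred = ridge_predict(X[idx], y[idx], X[test_idx])
    curve[n] = float(np.corrcoef(y[test_idx], pred)[0, 1])

for n, r in curve.items():
    print(f"n_train = {n:4d}  accuracy = {r:.2f}")
```

Keeping the test set fixed while only the training size varies makes the accuracies comparable across sizes. Lowering `h2` in the simulation also lowers the achievable accuracy, in line with the heritability comparison above.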

In summary, we developed an ML fusion model for predicting the phenotypes of breeding populations for complex traits using GS. Compared with the other methods, MFMGP showed the following advantages. (1) Improved prediction accuracy: MFMGP integrates the strengths of several classical models and reduces the biases associated with any single model. (2) Reduced overfitting: MFMGP mitigates the overfitting of training data commonly encountered with single models. (3) Enhanced generalization ability: MFMGP better captures the complex patterns and diversity in the data. (4) Robustness to errors: by synthesizing the predictions of multiple models, MFMGP reduces prediction errors that single models make because of anomalies or specific circumstances. (5) Exploitation of model complementarity.
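The robustness argument in point (4) can be checked numerically in an idealized setting. The simulation below assumes four hypothetical base models whose errors are independent and equally large, which is a much stronger assumption than holds for real GS models trained on the same data, and it uses a plain average, not the MFMGP weighting.

```python
import numpy as np

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

rng = np.random.default_rng(0)
truth = rng.normal(size=5000)   # simulated true phenotypes
n_models = 4

# Each hypothetical model = truth + its own independent error (sd = 1).
preds = truth + rng.normal(scale=1.0, size=(n_models, truth.size))

single = [rmse(truth, p) for p in preds]
fused = rmse(truth, preds.mean(axis=0))

print("single-model RMSE:", np.round(single, 3))
print("fused RMSE:       ", round(fused, 3))
```

Under these assumptions the error of the average shrinks by roughly a factor of √4 = 2 relative to a single model. When model errors are correlated, as they usually are in practice, the gain is smaller, which is why complementarity among the base models matters.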

Currently, most GS experiments focus on predicting the performance of single traits in specific populations and specific environments, neglecting the fact that most plant and animal breeding programmes aim to improve multiple target traits across target environments (particularly in plants). The most significant factors affecting prediction accuracy are heritability and sample size. Heritability is the key parameter of the genotype–phenotype association: the higher a trait's heritability, the more accurate a GS model will be, and low heritability leads to lower prediction accuracy. An insufficient sample size reduces the representativeness of the training population because of increased sampling error, resulting in biased estimates of genetic parameters and reduced prediction accuracy. Thus, it is necessary to collect more phenotypes from training populations of appropriate sizes across multiple target environments, so that trait genetic effects and their interactions with environments can be adequately estimated and integrated into the MFMGP model. As functional and population genomic research in plants and animals progresses rapidly, the greatest challenge is how to integrate accurate functional information on many genes and their allelic effects on target traits into the MFMGP model for GS applications in plant and animal breeding, and eventually to realize breeding by design.
