Yansen Chen , Hadi Atashi , Jiayi Qu , Pauline Delhez , Daniel Runcie , Hélène Soyeurt , Nicolas Gengler
Journal of Dairy Science, Volume 107, Issue 11, Pages 9615-9627. Journal Article, published 2024-11-01.
DOI: 10.3168/jds.2023-24319
URL: https://www.sciencedirect.com/science/article/pii/S0022030224009755
Impact factor: 3.7; JCR Q1 (Agriculture, Dairy & Animal Science)
Exploring a Bayesian sparse factor model-based strategy for the genetic analysis of thousands of mid-infrared spectra traits for animal breeding
Citations: 0
Abstract
With the rapid development of animal phenomics and deep phenotyping, we can obtain thousands of traditional (and also molecular) phenotypes per individual. However, how to handle this volume of data in animal breeding remains largely unexplored, and the challenge is likely to become more common. This study aimed to (1) explore the use of the mega-scale linear mixed model (MegaLMM), a factor model-based approach that simultaneously estimates (co)variance components and genetic parameters for thousands of milk traits, hereafter called thousand-trait (TT) models; (2) compare phenotype and genomic breeding value (u) predictions for focal traits (i.e., traits targeted for prediction, as opposed to secondary traits that support the evaluation) from single-trait (ST) and TT models; and (3) propose a new approximate method of GEBV (U) prediction with TT models and MegaLMM. We used a total of 3,421 milk mid-infrared (MIR) spectra wavepoints (called secondary traits) and 3 focal traits (average fat percentage [AFP], average methane production [ACH4], and average SCS [ASCS]) collected on 3,302 first-parity Holstein cows. The 3,421 milk MIR wavepoint traits comprised 311 wavepoints in each of 11 classes (months in lactation). Genotyping information of 564,439 SNPs was available for all animals and was used to calculate the genomic relationship matrix. The MegaLMM was implemented in the framework of the Bayesian sparse factor model and solved through Gibbs sampling (Markov chain Monte Carlo). Within each block of 311 wavepoints, the heritabilities of the 3,421 milk MIR wavepoints gradually increased and then decreased, and this pattern held throughout the lactation. The genetic and phenotypic correlations between the first 311 wavepoints and the other 3,110 wavepoints were low.
The accuracies of phenotype predictions from the ST model were lower than those from the TT model for AFP (0.51 vs. 0.93), ACH4 (0.30 vs. 0.86), and ASCS (0.14 vs. 0.33). The same trend was observed for the accuracies of u predictions for AFP (0.59 vs. 0.86), ACH4 (0.47 vs. 0.78), and ASCS (0.39 vs. 0.59). The average correlation between U predicted from the TT model and the new approximate method was 0.90. The new approximate method used for estimating U in MegaLMM will enhance the suitability of MegaLMM for applications in animal breeding. This study conducted an initial investigation into the application of thousands of traits in animal breeding and showed that the TT model is beneficial for the prediction of focal traits (phenotype and breeding values), especially for difficult-to-measure traits (e.g., ACH4).
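The abstract states that SNP genotypes were used to calculate a genomic relationship matrix but does not say which estimator was used. As a minimal sketch only, the snippet below builds such a matrix with the widely used VanRaden (method 1) formula on simulated genotypes. The function name `vanraden_grm`, the 0/1/2 genotype coding, the simulated data, and the choice of estimator are all assumptions for illustration, not the authors' pipeline.

```python
import numpy as np


def vanraden_grm(genotypes: np.ndarray) -> np.ndarray:
    """Genomic relationship matrix, VanRaden method 1 (assumed estimator).

    genotypes: (n_animals, n_snps) array of allele counts coded 0, 1, 2.
    """
    # Allele frequency of the counted allele at each SNP, from the sample.
    p = genotypes.mean(axis=0) / 2.0
    # Monomorphic SNPs carry no information and would not change Z Z'.
    keep = (p > 0.0) & (p < 1.0)
    m, p = genotypes[:, keep], p[keep]
    # Center each SNP column by twice its allele frequency.
    z = m - 2.0 * p
    # Scale so that the average diagonal is close to 1.
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))


# Simulated example (hypothetical sizes, far smaller than the study's data).
rng = np.random.default_rng(0)
n_animals, n_snps = 50, 2000
freqs = rng.uniform(0.05, 0.95, size=n_snps)
genotypes = rng.binomial(2, freqs, size=(n_animals, n_snps)).astype(float)

G = vanraden_grm(genotypes)
print(G.shape, round(float(np.mean(np.diag(G))), 2))
```

In a setting like the one described, a matrix of this kind would supply the covariance structure of the random genetic effects; the real data (3,302 cows, 564,439 SNPs) would typically also go through quality-control filters that are not shown here.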
About the journal:
The official journal of the American Dairy Science Association®, Journal of Dairy Science® (JDS) is the leading peer-reviewed general dairy research journal in the world. JDS readers represent education, industry, and government agencies in more than 70 countries with interests in biochemistry, breeding, economics, engineering, environment, food science, genetics, microbiology, nutrition, pathology, physiology, processing, public health, quality assurance, and sanitation.