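Performance Comparison of Genomic Best Linear Unbiased Prediction and Four Machine Learning Models for Estimating Genomic Breeding Values in Working Dogs.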

Animals | IF 3.2 | CAS Zone 2 (Agricultural and Forestry Sciences) | JCR Q1 (Agriculture, Dairy & Animal Science) | Pub Date: 2025-02-02 | DOI: 10.3390/ani15030408

Joseph A Thorsrud, Katy M Evans, Kyle C Quigley, Krishnamoorthy Srikanth, Heather J Huson


Citations: 0

Abstract




This study compares five genomic prediction models, Genomic Best Linear Unbiased Prediction (GBLUP), Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGB), and Multilayer Perceptron (MLP), for predicting genomic breeding values (gEBVs). The phenotypic data comprise three binary health traits (anodontia, distichiasis, oral papillomatosis) and one behavioral trait (distraction) in a population of guide dogs. These traits affect a dog's potential for success as a guide dog and are therefore routinely recorded; they were chosen for their differing heritabilities and case counts, specifically to assess gEBV model performance. Using a dataset from The Seeing Eye organization that includes German Shepherds (n = 482), Golden Retrievers (n = 239), Labrador Retrievers (n = 1188), and Labrador and Golden Retriever crosses (n = 111), we assessed model performance within and across breeds, and across trait heritabilities, case counts, and SNP marker densities. We found no significant differences among the models at any heritability, case count, or SNP density; all models performed similarly. Because it requires no parameter optimization, GBLUP was the most efficient model. Distichiasis showed the highest overall predictive performance, likely due to its higher heritability. Anodontia and distraction showed moderate accuracy, and oral papillomatosis had the lowest accuracy, consistent with its low heritability. These findings indicate that lower-density SNP datasets can be used effectively to construct gEBVs, suggesting that high-cost, high-density genotyping may not always be necessary. The similar performance of all models further indicates that simpler models such as GBLUP, which require less fine-tuning, may be sufficient for genomic prediction in canine breeding programs. The research highlights the importance of standardized phenotypic assessments and carefully constructed reference populations for optimizing the utility of genomic selection in canine breeding programs.
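
To make the GBLUP baseline concrete, the following is a minimal sketch of the general technique, not the authors' pipeline, which the abstract does not describe. It assumes simulated, unrelated animals with independent SNPs, a continuous phenotype rather than the binary traits studied, a known heritability `h2`, and the training mean as the only fixed effect. The function names `vanraden_grm` and `gblup_predict` and all simulation settings are hypothetical choices made for illustration.

```python
import numpy as np


def vanraden_grm(genotypes):
    """Genomic relationship matrix (VanRaden method 1) from 0/1/2 allele counts."""
    p = genotypes.mean(axis=0) / 2.0        # sample allele frequency per SNP
    centered = genotypes - 2.0 * p          # center each SNP column
    scale = 2.0 * np.sum(p * (1.0 - p))     # expected total SNP variance
    return centered @ centered.T / scale


def gblup_predict(grm, y, train_idx, test_idx, h2):
    """Predict breeding values of test animals from training phenotypes."""
    lam = (1.0 - h2) / h2                   # residual-to-genetic variance ratio
    y_train = y[train_idx]
    mu = y_train.mean()                     # simplification: mean as the only fixed effect
    g_tt = grm[np.ix_(train_idx, train_idx)]
    g_vt = grm[np.ix_(test_idx, train_idx)]
    # BLUP of test animals: G_vt (G_tt + lam * I)^-1 (y_train - mu)
    weights = np.linalg.solve(g_tt + lam * np.eye(len(train_idx)), y_train - mu)
    return g_vt @ weights


# Simulated data (assumption: no real dog genotypes are used here).
rng = np.random.default_rng(0)
n_animals, n_snps, h2 = 400, 200, 0.5
freqs = rng.uniform(0.1, 0.9, n_snps)
genotypes = rng.binomial(2, freqs, size=(n_animals, n_snps)).astype(float)
true_bv = genotypes @ rng.normal(0.0, 1.0, n_snps)
true_bv = (true_bv - true_bv.mean()) / true_bv.std()   # genetic variance of 1
y = true_bv + rng.normal(0.0, np.sqrt((1.0 - h2) / h2), n_animals)

# Hold out 20% of animals and score predictions against the simulated true values.
order = rng.permutation(n_animals)
train_idx, test_idx = order[:320], order[320:]
grm = vanraden_grm(genotypes)
predicted = gblup_predict(grm, y, train_idx, test_idx, h2)
accuracy = np.corrcoef(predicted, true_bv[test_idx])[0, 1]
print(f"Correlation between predicted and true breeding values: {accuracy:.2f}")
```

Apart from the variance ratio derived from `h2`, nothing in this sketch needs tuning, which is the property the abstract credits for GBLUP's efficiency. Because the simulated SNPs are independent and the animals unrelated, the sketch says nothing about how accuracy changes with marker density in a real pedigreed population.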


Source journal

Animals | Agricultural and Biological Sciences: Animal Science and Zoology

CiteScore: 4.90
Self-citation rate: 16.70%
Articles published: 3015
Review time: 20.52 days

About the journal: Animals (ISSN 2076-2615) is an international and interdisciplinary scholarly open access journal. It publishes original research articles, reviews, communications, and short notes that are relevant to any field of study that involves animals, including zoology, ethnozoology, animal science, animal ethics and animal welfare. However, preference will be given to those articles that provide an understanding of animals within a larger context (i.e., the animals' interactions with the outside world, including humans). There is no restriction on the length of the papers. Our aim is to encourage scientists to publish their experimental and theoretical research in as much detail as possible. Full experimental details and/or method of study must be provided for research articles. Articles submitted that involve subjecting animals to unnecessary pain or suffering will not be accepted, and all articles must be submitted with the necessary ethical approval (please refer to the Ethical Guidelines for more information).