Joseph A Thorsrud, Katy M Evans, Kyle C Quigley, Krishnamoorthy Srikanth, Heather J Huson
Animals, volume 15, issue 3. Published 2025-02-02. DOI: 10.3390/ani15030408 (https://doi.org/10.3390/ani15030408)
Impact factor: 3.2. JCR: Q1 (Agriculture, Dairy & Animal Science).
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11816165/pdf/
Performance Comparison of Genomic Best Linear Unbiased Prediction and Four Machine Learning Models for Estimating Genomic Breeding Values in Working Dogs.
This study investigates the efficacy of five genomic prediction models in predicting genomic breeding values (gEBVs): Genomic Best Linear Unbiased Prediction (GBLUP), Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGB), and Multilayer Perceptron (MLP). The phenotypic data include three binary health traits (anodontia, distichiasis, oral papillomatosis) and one behavioral trait (distraction) in a population of guide dogs. These traits affect a guide dog's potential for success and are therefore routinely characterized; they were chosen for their differences in heritability and case counts, specifically to assess gEBV model performance. Using a dataset from The Seeing Eye organization that includes German Shepherds (n = 482), Golden Retrievers (n = 239), Labrador Retrievers (n = 1188), and Labrador and Golden Retriever crosses (n = 111), we assessed model performance within and across breeds, trait heritabilities, case counts, and SNP marker densities. We found no significant differences in model performance across varying heritabilities, case counts, or SNP densities; all models performed similarly. Because it requires no parameter optimization, GBLUP was the most efficient model. Distichiasis showed the highest overall predictive performance, likely due to its higher heritability, while anodontia and distraction exhibited moderate accuracy, and oral papillomatosis had the lowest accuracy, consistent with its low heritability. These findings indicate that lower-density SNP datasets can be used effectively to construct gEBVs, suggesting that high-cost, high-density genotyping may not always be necessary. Additionally, the similar performance of all models indicates that simpler models like GBLUP, which requires less fine-tuning, may be sufficient for genomic prediction in canine breeding programs. The research highlights the importance of standardized phenotypic assessments and carefully constructed reference populations to optimize the utility of genomic selection in canine breeding programs.
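For readers unfamiliar with GBLUP, the following is a minimal sketch of the general technique, not the authors' pipeline. It assumes a VanRaden-style genomic relationship matrix, a heritability supplied by the user, the training mean as the only fixed effect, and a continuous phenotype; the study's traits are mostly binary, and the abstract does not say how they were modeled. All function names, parameter values, and data here are hypothetical and simulated.

```python
import numpy as np


def vanraden_g(genotypes):
    """Genomic relationship matrix from 0/1/2 allele counts (animals x SNPs)."""
    p = genotypes.mean(axis=0) / 2.0           # observed allele frequency per SNP
    z = genotypes - 2.0 * p                    # center each SNP column
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup(g, y, train_idx, h2):
    """Predict breeding values for all animals from the training phenotypes.

    h2 is an assumed heritability; lam is the residual-to-genetic variance ratio.
    The training mean stands in for a properly estimated fixed effect.
    """
    lam = (1.0 - h2) / h2
    g_tt = g[np.ix_(train_idx, train_idx)]
    mu = y[train_idx].mean()
    alpha = np.linalg.solve(g_tt + lam * np.eye(len(train_idx)), y[train_idx] - mu)
    return g[:, train_idx] @ alpha             # one predicted value per animal


# Simulated data only: no values below come from the study.
rng = np.random.default_rng(0)
n_animals, n_snps, h2 = 400, 50, 0.5
freqs = rng.uniform(0.1, 0.9, n_snps)
genotypes = rng.binomial(2, freqs, size=(n_animals, n_snps)).astype(float)
true_bv = (genotypes - 2.0 * freqs) @ rng.normal(size=n_snps)
noise_sd = np.sqrt(true_bv.var() * (1.0 - h2) / h2)
y = true_bv + rng.normal(scale=noise_sd, size=n_animals)

train_idx = np.arange(300)                     # animals with phenotypes
test_idx = np.arange(300, n_animals)           # animals to be predicted

g = vanraden_g(genotypes)
gebv = gblup(g, y, train_idx, h2)
accuracy = np.corrcoef(gebv[test_idx], true_bv[test_idx])[0, 1]
print(f"held-out correlation with simulated breeding values: {accuracy:.2f}")
```

The only quantity to set is the variance ratio derived from the assumed heritability, which illustrates why GBLUP needs little tuning compared with the hyperparameter searches typical of RF, SVM, XGB, and MLP models.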
Animals
Subject area: Agricultural and Biological Sciences, Animal Science and Zoology
CiteScore: 4.90
Self-citation rate: 16.70%
Articles published: 3015
Review time: 20.52 days
About the journal:
Animals (ISSN 2076-2615) is an international and interdisciplinary scholarly open access journal. It publishes original research articles, reviews, communications, and short notes that are relevant to any field of study that involves animals, including zoology, ethnozoology, animal science, animal ethics and animal welfare. However, preference will be given to those articles that provide an understanding of animals within a larger context (i.e., the animals' interactions with the outside world, including humans). There is no restriction on the length of the papers. Our aim is to encourage scientists to publish their experimental and theoretical research in as much detail as possible. Full experimental details and/or the method of study must be provided for research articles. Articles submitted that involve subjecting animals to unnecessary pain or suffering will not be accepted, and all articles must be submitted with the necessary ethical approval (please refer to the Ethical Guidelines for more information).