Theodore J Morley, Drew Willimitis, Michael Ripperger, Hyunjoon Lee, Yu Zhou, Lide Han, Jooeun Kang, William U Meyerson, Jordan W Smoller, Karmel W Choi, Colin G Walsh, Douglas M Ruderfer
Title: Evaluating the impact of modeling choices on the performance of integrated genetic and clinical models.
Journal: Genetics in Medicine, article 101353
Publication date: 2024-12-26
DOI: 10.1016/j.gim.2024.101353 (https://doi.org/10.1016/j.gim.2024.101353)
Abstract
Purpose: Studies of the value of genetic information for improving the performance of clinical risk prediction models have reached variable conclusions. Many methodological decisions may contribute to these differing results. We performed multiple modeling experiments integrating clinical and demographic data from electronic health records (EHR) with genetic data to understand which decisions may affect performance.
Methods: Clinical data in the form of structured diagnostic codes, medications, procedural codes, and demographics were extracted from two large independent health systems, and polygenic risk scores (PRS) were generated for all patients of European ancestry with genetic data in the corresponding biobanks. Crohn's disease was studied because of its substantial genetic component, established EHR-based definition, and sufficient prevalence for training and testing. We investigated the impact of choices regarding PRS integration method, training sample, model complexity, and performance metrics.
Results: Overall, including PRS resulted in higher performance, but this gain was robust only in situations with limited clinical information. We found consistent performance increases from more compute-intensive models such as random forest, but the impact of other decisions varied by site.
Conclusion: This work highlights the importance of considering methodological decision points in interpreting the impact of PRS on prediction performance in clinical models.
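As a rough illustration of two quantities the abstract refers to, the sketch below computes a PRS under the standard definition (a weighted sum of risk-allele dosages) and scores a prediction with AUROC, one commonly used performance metric. The abstract does not state which integration methods or metrics the authors used, so the additive combination here is only an assumption. All function names, weights, genotypes, and scores are hypothetical toy values, not data or results from the study.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random case outscores a random control (ties count 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Toy, made-up inputs for illustration only (not data from the study).
weights = [0.5, -0.25, 1.0]  # hypothetical per-variant effect sizes
genotypes = [[2, 0, 1], [0, 2, 0], [1, 1, 1], [0, 0, 0]]  # one row per patient
labels = [1, 0, 1, 0]  # 1 = case, 0 = control
clinical_score = [0.5, 0.5, 0.5, 0.5]  # hypothetical output of an EHR-only model

prs = [polygenic_risk_score(g, weights) for g in genotypes]

# One simple (assumed) way to integrate: add the PRS to the clinical score.
combined = [c + p for c, p in zip(clinical_score, prs)]

print("AUROC, clinical only:", auroc(labels, clinical_score))
print("AUROC, clinical + PRS:", auroc(labels, combined))
```

The toy clinical score is deliberately uninformative, so any separation in the combined score comes from the PRS alone. In practice the PRS could instead enter a model as an additional feature or be modeled separately and combined afterward, and the choice among such options is the kind of methodological decision the abstract describes.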
About the journal:
Genetics in Medicine (GIM) is the official journal of the American College of Medical Genetics and Genomics. The journal's mission is to enhance the knowledge, understanding, and practice of medical genetics and genomics through publications in clinical and laboratory genetics and genomics, including ethical, legal, and social issues as well as public health.
GIM encourages research that combats racism, includes diverse populations and is written by authors from diverse and underrepresented backgrounds.