Karl Keat, Rasika Venkatesh, Yidi Huang, Rachit Kumar, Sony Tuteja, Katrin Sangkuhl, Binglan Li, Li Gong, Michelle Whirl-Carrillo, Teri E Klein, Marylyn D Ritchie, Dokyoon Kim
Pharmacogenetics represents one of the most promising areas of precision medicine, with several guidelines for genetics-guided treatment ready for clinical use. Despite this, implementation has been slow, with few health systems incorporating the technology into their standard of care. One major barrier to uptake is the lack of education and awareness of pharmacogenetics among clinicians and patients. The introduction of large language models (LLMs) like GPT-4 has raised the possibility of medical chatbots that deliver timely information to clinicians, patients, and researchers with a simple interface. Although state-of-the-art LLMs have shown impressive performance at advanced tasks like medical licensing exams, in practice they still often provide false information, which is particularly hazardous in a clinical context. To quantify the extent of this issue, we developed a series of automated and expert-scored tests to evaluate the performance of chatbots in answering pharmacogenetics questions from the perspective of clinicians, patients, and researchers. We applied this benchmark to state-of-the-art LLMs and found that newer models like GPT-4o greatly outperform their predecessors, but still fall short of the standards required for clinical use. Our benchmark will be a valuable public resource for subsequent developments in this space as we work towards better clinical AI for pharmacogenetics.
PGxQA: A Resource for Evaluating LLM Performance for Pharmacogenomic QA Tasks. Pacific Symposium on Biocomputing, vol. 30, pp. 229-246, published 2025-01-01. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11734741/pdf/
Jessica L G Winters, Jacqueline A Piekos, Jacklyn N Hellwege, Ozan Dikilitas, Iftikhar J Kullo, Daniel J Schaid, Todd L Edwards, Digna R Velez Edwards
Uterine leiomyomata, or fibroids, are common gynecological tumors causing pelvic and menstrual symptoms that can negatively affect quality of life and child-bearing desires. As fibroids grow, symptoms can intensify and lead to invasive treatments that are less likely to preserve fertility. Identifying individuals at highest risk for fibroids can aid in access to earlier diagnoses. Polygenic risk scores (PRS) quantify genetic risk to identify those at highest risk for disease. Using the PRS software PRS-CSx and publicly available genome-wide association study (GWAS) summary statistics from FinnGen and Biobank Japan, we constructed a multi-ancestry (META) PRS for fibroids. We validated the META PRS in two cross-ancestry cohorts. In the cross-ancestry Electronic Medical Records and Genomics (eMERGE) Network cohort, the META PRS was significantly associated with fibroid status, with 1.11-fold greater odds of fibroids per standard deviation increase in PRS (95% confidence interval [CI]: 1.05 - 1.17, p = 5.21 × 10^-5). The META PRS was validated in two BioVU cohorts: one using ICD9/ICD10 codes and one requiring imaging confirmation of fibroid status. In the ICD cohort, a standard deviation increase in the META PRS was associated with 1.23-fold greater odds of fibroids (95% CI: 1.15 - 1.32, p = 9.68 × 10^-9), while in the imaging cohort the odds were 1.26-fold greater (95% CI: 1.18 - 1.35, p = 2.40 × 10^-11). We subsequently constructed single-ancestry PRS for FinnGen (European ancestry [EUR]) and Biobank Japan (East Asian ancestry [EAS]) using PRS-CS and found a nominally significant association in the eMERGE cohort between fibroids and the EAS PRS, but not the EUR PRS (95% CI: 1.09 - 1.20, p = 1.64 × 10^-7). These findings highlight the stronger predictive power of multi-ancestry PRS compared with single-ancestry PRS. This study underscores the necessity of diverse population inclusion in genetic research to ensure precision medicine benefits all individuals equitably.
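To make the "odds per standard deviation" figures above concrete, the sketch below rescales a per-SD odds ratio to a larger PRS difference. It assumes the usual logistic-regression reading, in which the log-odds of fibroids are linear in the standardized PRS; the abstract does not state the model, and the function name `odds_ratio_for_shift` is hypothetical. Only the three per-SD odds ratios are taken from the text.

```python
import math


def odds_ratio_for_shift(or_per_sd: float, n_sd: float) -> float:
    """Odds ratio for a PRS difference of n_sd standard deviations.

    Assumption (not stated in the abstract): log-odds are linear in the
    standardized PRS, as in a standard logistic regression.
    """
    beta = math.log(or_per_sd)    # log-odds change per 1 SD
    return math.exp(beta * n_sd)  # equivalent to or_per_sd ** n_sd


# Per-SD odds ratios quoted in the abstract.
reported = {"eMERGE": 1.11, "BioVU ICD": 1.23, "BioVU imaging": 1.26}

# Implied odds ratio for two individuals whose PRS differ by 2 SD.
two_sd = {cohort: odds_ratio_for_shift(or_sd, 2.0) for cohort, or_sd in reported.items()}

for cohort, value in two_sd.items():
    print(f"{cohort}: OR for +2 SD = {value:.2f}")
```

Under this assumption a modest per-SD odds ratio compounds multiplicatively, which is why such scores are often reported by comparing the tails of the PRS distribution rather than adjacent individuals.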
Constructing a multi-ancestry polygenic risk score for uterine fibroids using publicly available data highlights need for inclusive genetic research. Pacific Symposium on Biocomputing, vol. 30, pp. 268-280, published 2025-01-01. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11731894/pdf/
We present a comparative study on the performance of two popular open-source large language models for early prediction of sepsis: Llama-3 8B and Mixtral 8x7B. The primary goal was to determine whether a smaller model could achieve comparable predictive accuracy to a significantly larger model in the context of sepsis prediction using clinical data. Our proposed LLM-based sepsis prediction system, COMPOSER-LLM, enhances the previously published COMPOSER model, which utilizes structured EHR data to generate hourly sepsis risk scores. The new system incorporates an LLM-based approach to extract sepsis-related clinical signs and symptoms from unstructured clinical notes. For scores falling within high-uncertainty prediction regions, particularly those near the decision threshold, the system uses the LLM to draw additional clinical context from patient notes, thereby enhancing the model's predictive accuracy in challenging diagnostic scenarios. A total of 2,074 patient encounters admitted to the Emergency Department at two hospitals within the University of California San Diego Health system were used for model evaluation in this study. Our findings reveal that the Llama-3 8B model based system (COMPOSER-LLMLlama) achieved a sensitivity of 70.3%, positive predictive value (PPV) of 32.5%, F-1 score of 44.4% and false alarms per patient hour (FAPH) of 0.0194, closely matching the performance of the larger Mixtral 8x7B model based system (COMPOSER-LLMmixtral), which achieved a sensitivity of 72.1%, PPV of 31.9%, F-1 score of 44.2% and FAPH of 0.020. When prospectively evaluated, COMPOSER-LLMLlama demonstrated similar performance to the COMPOSER-LLMmixtral pipeline, with a sensitivity of 68.7%, PPV of 36.6%, F-1 score of 47.7% and FAPH of 0.019 vs. sensitivity of 70.5%, PPV of 36.3%, F-1 score of 47.9% and FAPH of 0.020. This result indicates that, for extraction of clinical signs and symptoms from unstructured clinical notes to enable early prediction of sepsis, the Llama-3 generation of smaller language models can perform as effectively as, and more efficiently than, larger models. This finding has significant implications for healthcare settings with limited resources.
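As a quick consistency check on the metrics quoted above, the F-1 score can be recomputed as the harmonic mean of sensitivity (recall) and PPV (precision). The sketch below is illustrative only: the helper name `f1_score` is hypothetical, and because the inputs are the rounded percentages from the abstract, the recomputed values can differ from the reported ones in the last digit.

```python
def f1_score(sensitivity: float, ppv: float) -> float:
    """Harmonic mean of sensitivity (recall) and PPV (precision)."""
    if sensitivity + ppv == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * sensitivity * ppv / (sensitivity + ppv)


# Retrospective figures quoted in the abstract, expressed as fractions.
llama_f1 = f1_score(0.703, 0.325)    # reported F-1: 44.4%
mixtral_f1 = f1_score(0.721, 0.319)  # reported F-1: 44.2%

print(f"Llama-3 8B:   F-1 = {llama_f1:.4f}")
print(f"Mixtral 8x7B: F-1 = {mixtral_f1:.4f}")
```

Both recomputed values fall within rounding distance of the reported scores, which is what one would expect if the abstract's F-1 figures were derived from unrounded sensitivity and PPV.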
Supreeth P Shashikumar, Shamim Nemati. A Prospective Comparison of Large Language Models for Early Prediction of Sepsis. Pacific Symposium on Biocomputing, vol. 30, pp. 109-120, published 2025-01-01. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11649013/pdf/
Large Language Models (LLMs) have shown significant promise across a wide array of fields, including biomedical research, but face notable limitations in their current applications. While they offer a new paradigm for data analysis and hypothesis generation, their efficacy in computational biology trails other applications such as natural language processing. This workshop addresses the state of the art in LLMs, discussing their challenges and the potential for future development tailored to computational biology. Key issues include difficulties in validating LLM outputs, proprietary model limitations, and the need for expertise in critical evaluation of model failure modes.
Brett Beaulieu-Jones, Steven Brenner. Leveraging Foundational Models in Computational Biology: Validation, Understanding, and Innovation. Pacific Symposium on Biocomputing, vol. 30, pp. 702-705, published 2025-01-01.
Andre Luis Garao Rico, Nicole Palmiero, Marylyn D Ritchie, Molly A Hall
Gene-environment interaction (GxE) studies provide insights into the interplay between genetics and the environment but often overlook the synergistic effects of multiple environmental factors. This study uses environment-by-environment interaction (ExE) analyses to explore interactions among environmental factors affecting lipid phenotypes (e.g., HDL, LDL, total cholesterol, and triglycerides), which are crucial for disease risk assessment. We developed a novel curated knowledge base, GE.db, integrating genomic and exposomic interactions. In this study, we filtered NHANES exposure variables (available 1999-2018) to identify significant ExE using GE.db. From 101,316 participants and 77 exposures, we identified 263 statistically significant interactions (FDR p < 0.1) in discovery and replication datasets, with 21 interactions significant for HDL-C (Bonferroni p < 0.05). Notable interactions associated with HDL-C levels included docosapentaenoic acid (22:5n-3) (DPA) - arachidic acid (20:0), stearic acid (18:0) - arachidic acid (20:0), and blood 2,5-dimethylfuran - blood benzene. These findings underscore GE.db's role in enhancing -omics research efficiency and highlight the complex impact of environmental exposures on lipid metabolism, informing future health strategies.
Integrated exposomic analysis of lipid phenotypes: Leveraging GE.db in environment by environment interaction studies. Pacific Symposium on Biocomputing, vol. 30, pp. 535-550, published 2025-01-01. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11694901/pdf/
The rapid advancement of artificial intelligence and machine learning (AI/ML) technologies in healthcare presents significant opportunities for enhancing patient care through innovative diagnostic tools, monitoring systems, and personalized treatment plans. However, these innovative advancements might result in regulatory challenges given recent Supreme Court decisions that impact the authority of regulatory agencies like the Food and Drug Administration (FDA). This paper explores the implications of regulatory uncertainty for the healthcare industry related to balancing innovation in biotechnology and biocomputing with ensuring regulatory uniformity and patient safety. We examine key Supreme Court cases, including Loper Bright Enterprises v. Raimondo, Relentless, Inc. v. Department of Commerce, and Corner Post, Inc. v. Board of Governors of the Federal Reserve System, and their impact on the Chevron doctrine. We also discuss other relevant cases to highlight shifts in judicial approaches to agency deference and regulatory authority that might affect how science is handled in regulatory spaces, including how biocomputing and other health sciences are governed, how scientific facts are applied in policymaking, and how scientific expertise guides decision making. Through a detailed analysis, we assess the potential impact of regulatory uncertainty in healthcare. Additionally, we provide recommendations for the medical community on navigating these challenges.
Nicole Rincon, Sara Gerke, Jennifer K Wagner. Implications of An Evolving Regulatory Landscape on the Development of AI and ML in Medicine. Pacific Symposium on Biocomputing, vol. 30, pp. 154-166, published 2025-01-01. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11649012/pdf/
Jakob Woerner, Thomas Westbrook, Seokho Jeong, Manu Shivakumar, Allison R Greenplate, Sokratis A Apostolidis, Seunggeun Lee, Yonghyun Nam, Dokyoon Kim
Inflammatory bowel disease (IBD), encompassing Crohn's disease (CD) and ulcerative colitis (UC), has a significant genetic component and is increasingly prevalent due to environmental factors. Current polygenic risk scores (PRS) have limited predictive power and cannot inform time of symptom onset. Circulating proteomics profiling offers a novel, non-invasive approach for understanding the inflammatory state of complex diseases, enabling the creation of proteomic risk scores (ProRS). This study utilizes data from 51,772 individuals in the UK Biobank to evaluate the unique and combined contributions of PRS and ProRS to IBD risk prediction. We developed ProRS models for CD and UC, assessed their predictive performance over time, and examined the benefits of integrating PRS and ProRS for enhanced risk stratification. Our findings are the first to demonstrate that combining genetic and proteomic data improves IBD incidence prediction, with ProRS providing time-sensitive predictions and PRS offering additional long-term predictive value. We also show that the ProRS achieves better predictive performance among individuals with high PRS. This integrated approach highlights the potential for multi-omic data in precision medicine for IBD.
Plasma protein-based and polygenic risk scores serve complementary roles in predicting inflammatory bowel disease. Pacific Symposium on Biocomputing, vol. 30, pp. 522-534, published 2025-01-01. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11649021/pdf/
Sutanu Nandi, Yuehua Zhu, Lucas A Gillenwater, Marc Subirana-Granés, Haoyu Zhang, Negar Janani, Casey Greene, Milton Pividori, Maria Chikina, James C Costello
Down syndrome (DS), caused by the triplication of chromosome 21 (T21), is a prevalent genetic disorder with a higher incidence of obesity. Traditional approaches have struggled to differentiate T21-specific molecular dysregulation from general obesity-related processes. This study introduces the omni-PLIER framework, combining the Pathway-Level Information ExtractoR (PLIER) with the omnigenic model, to uncover molecular mechanisms underlying obesity in DS. The PLIER framework aligns gene expression data with biological pathways, facilitating the identification of relevant molecular patterns. Using RNA sequencing data from the Human Trisome Project, omni-PLIER identified latent variables (LVs) significantly associated with both T21 and body mass index (BMI). Elastic net regression and causal mediation analysis revealed LVs mediating the effect of karyotype on BMI. Notably, LVs involving glutathione peroxidase-1 (GPX1) and MCL1 (MCL1 apoptosis regulator, BCL2 family member) emerged as crucial mediators. These findings provide insights into the molecular interplay between DS and obesity. The omni-PLIER model offers a robust methodological advancement for dissecting complex genetic disorders, with implications for understanding obesity-related processes in both DS and the general population.
A Pathway-Level Information ExtractoR (PLIER) framework to gain mechanistic insights into obesity in Down syndrome. Pacific Symposium on Biocomputing, vol. 30, pp. 412-425, published 2025-01-01. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11649010/pdf/
Olivia Wen, Samuel C Wolff, Wayne Stallaert, Didong Li, Jeremy E Purvis, Tarek M Zikry
CDK4/6 inhibitors such as palbociclib block cell cycle progression and improve outcomes for many ER+/HER2- breast cancer patients. Unfortunately, many patients are initially resistant to the drug or develop resistance over time in part due to heterogeneity among individual tumor cells. To better understand these mechanisms of resistance, we used multiplex, single-cell imaging to profile cell cycle proteins in ER+ breast tumor cells under increasing palbociclib concentrations. We then applied spherical principal component analysis (SPCA), a dimensionality reduction method that leverages the inherently cyclical nature of the high-dimensional imaging data, to look for changes in cell cycle behavior in resistant cells. SPCA characterizes data as a hypersphere and provides a framework for visualizing and quantifying differences in cell cycles across treatment-induced perturbations. The hypersphere representations revealed shifts in the mean cell state and population heterogeneity. SPCA validated expected trends of CDK4/6 inhibitor response such as decreased expression of proliferation markers (Ki67, pRB), but also revealed potential mechanisms of resistance including increased expression of cyclin D1 and CDK2. Understanding the molecular mechanisms that allow treated tumor cells to evade arrest is critical for identifying targets of future therapies. Ultimately, we seek to further SPCA as a tool of precision medicine, targeting treatments by individual tumors, and extending this computational framework to interpret other cyclical biological processes represented by high-dimensional data.
{"title":"Spherical Manifolds Capture Drug-Induced Changes in Tumor Cell Cycle Behavior.","authors":"Olivia Wen, Samuel C Wolff, Wayne Stallaert, Didong Li, Jeremy E Purvis, Tarek M Zikry","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>CDK4/6 inhibitors such as palbociclib block cell cycle progression and improve outcomes for many ER+/HER2- breast cancer patients. Unfortunately, many patients are initially resistant to the drug or develop resistance over time in part due to heterogeneity among individual tumor cells. To better understand these mechanisms of resistance, we used multiplex, single-cell imaging to profile cell cycle proteins in ER+ breast tumor cells under increasing palbociclib concentrations. We then applied spherical principal component analysis (SPCA), a dimensionality reduction method that leverages the inherently cyclical nature of the high-dimensional imaging data, to look for changes in cell cycle behavior in resistant cells. SPCA characterizes data as a hypersphere and provides a framework for visualizing and quantifying differences in cell cycles across treatment-induced perturbations. The hypersphere representations revealed shifts in the mean cell state and population heterogeneity. SPCA validated expected trends of CDK4/6 inhibitor response such as decreased expression of proliferation markers (Ki67, pRB), but also revealed potential mechanisms of resistance including increased expression of cyclin D1 and CDK2. Understanding the molecular mechanisms that allow treated tumor cells to evade arrest is critical for identifying targets of future therapies. Ultimately, we seek to further SPCA as a tool of precision medicine, targeting treatments by individual tumors, and extending this computational framework to interpret other cyclical biological processes represented by high-dimensional data.</p>","PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. 
Pacific Symposium on Biocomputing","volume":"30 ","pages":"473-487"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11687821/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142819378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jared M Phillips, Julie A Schneider, David A Bennett, Paul K Crane, Shannon L Risacher, Andrew J Saykin, Logan C Dumitrescu, Timothy J Hohman
Alzheimer's disease (AD) is a polygenic disorder with a prolonged prodromal phase, complicating early diagnosis. Recent research indicates that increased astrocyte reactivity is associated with a higher risk of pathogenic tau accumulation, particularly in amyloid-positive individuals. However, few clinical tools are available to predict which individuals are likely to exhibit elevated astrocyte activation and, consequently, be susceptible to hyperphosphorylated tau-induced neurodegeneration. Polygenic risk scores (PRS) aggregate the effects of multiple genetic loci to provide a single, continuous metric representing an individual's genetic risk for a specific phenotype. We hypothesized that an astrocyte activation PRS could aid in the early detection of faster clinical decline. Therefore, we constructed an astrocyte activation PRS and assessed its predictive value for cognitive decline and AD biomarkers (i.e., cerebrospinal fluid [CSF] levels of Aβ1-42, total tau, and p-tau181) in a cohort of 791 elderly individuals. The astrocyte activation PRS showed significant main effects on cross-sectional memory (β = -0.07, p = 0.03) and longitudinal executive function (β = -0.01, p = 0.03). Additionally, the PRS interacted with amyloid positivity (p.intx = 0.02), indicating that amyloid burden modifies the association between the PRS and annual rate of language decline. Furthermore, the PRS was negatively associated with CSF Aβ1-42 levels (β = -3.4, p = 0.07) and interacted with amyloid status, such that amyloid burden modifies the association between the PRS and CSF phosphorylated tau levels (p.intx = 0.08). These findings suggest that an astrocyte activation PRS could be a valuable tool for early disease risk prediction, potentially enabling intervention during the interval between pathogenic amyloid and tau accumulation.
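The aggregation step described in the abstract, in which a PRS combines the effects of multiple loci into one continuous value, is commonly computed as a weighted sum of effect-allele dosages. The following is a minimal sketch of that general idea only, not the authors' pipeline: the function name, variant identifiers, and weights are hypothetical placeholders, and the abstract does not state how the weights were derived or whether the score was standardized.

```python
def polygenic_risk_score(dosages, weights):
    """Return the weighted sum of effect-allele dosages for one individual.

    dosages: dict mapping variant ID -> number of effect alleles (0, 1, or 2)
    weights: dict mapping variant ID -> per-allele effect size
             (hypothetical values here; in practice taken from a prior
             association study)
    """
    missing = [variant for variant in weights if variant not in dosages]
    if missing:
        # Fail loudly rather than silently treating missing genotypes as 0.
        raise KeyError(f"missing genotype for variants: {missing}")
    return sum(weights[variant] * dosages[variant] for variant in weights)


# Hypothetical three-variant example (IDs and weights are made up).
example_weights = {"variant_a": 0.12, "variant_b": -0.05, "variant_c": 0.30}
example_dosages = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = polygenic_risk_score(example_dosages, example_weights)
print(f"PRS = {score:.2f}")
```

With these placeholder values the score is approximately 0.19 (0.12 × 2 − 0.05 × 1 + 0.30 × 0). A score of this kind would then enter a regression model as a predictor, alone or in an interaction term with amyloid status, as the abstract describes.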
{"title":"Astrocyte Reactivity Polygenic Risk Score May Predict Cognitive Decline in Alzheimer's Disease.","authors":"Jared M Phillips, Julie A Schneider, David A Bennett, Paul K Crane, Shannon L Risacher, Andrew J Saykin, Logan C Dumitrescu, Timothy J Hohman","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Alzheimer's disease (AD) is a polygenic disorder with a prolonged prodromal phase, complicating early diagnosis. Recent research indicates that increased astrocyte reactivity is associated with a higher risk of pathogenic tau accumulation, particularly in amyloid-positive individuals. However, few clinical tools are available to predict which individuals are likely to exhibit elevated astrocyte activation and, consequently, be susceptible to hyperphosphorylated tau-induced neurodegeneration. Polygenic risk scores (PRS) aggregate the effects of multiple genetic loci to provide a single, continuous metric representing an individual's genetic risk for a specific phenotype. We hypothesized that an astrocyte activation PRS could aid in the early detection of faster clinical decline. Therefore, we constructed an astrocyte activation PRS and assessed its predictive value for cognitive decline and AD biomarkers (i.e., cerebrospinal fluid [CSF] levels of Aβ1-42, total tau, and p-tau181) in a cohort of 791 elderly individuals. The astrocyte activation PRS showed significant main effects on cross-sectional memory (β = -0.07, p = 0.03) and longitudinal executive function (β = -0.01, p = 0.03). Additionally, the PRS interacted with amyloid positivity (p.intx = 0.02), indicating that amyloid burden modifies the association between the PRS and annual rate of language decline. Furthermore, the PRS was negatively associated with CSF Aβ1-42 levels (β = -3.4, p = 0.07) and interacted with amyloid status, such that amyloid burden modifies the association between the PRS and CSF phosphorylated tau levels (p.intx = 0.08). 
These findings suggest that an astrocyte activation PRS could be a valuable tool for early disease risk prediction, potentially enabling intervention during the interval between pathogenic amyloid and tau accumulation.</p>","PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","volume":"30 ","pages":"488-503"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11752824/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142819425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}