Pacific Symposium on Biocomputing: Latest Articles (Page 3)

Social risk factors and cardiovascular risk in obstructive sleep apnea: a systematic assessment of clinical predictors in community health centers.

Q2 Computer Science

Pacific Symposium on Biocomputing

Pub Date: 2025-01-01; DOI: 10.1142/9789819807024_0023

Diego R Mazzotti, Ryan Urbanowicz, Marta Jankowska

We leveraged electronic health record (EHR) data from the Accelerating Data Value Across a National Community Health Center Network (ADVANCE) Clinical Research Network (CRN) to identify social risk factor clusters, assess their association with obstructive sleep apnea (OSA), and determine relevant clinical predictors of cardiovascular (CV) outcomes among individuals with OSA. Geographically informed social indicators were used to define social risk factor clusters via latent class analysis. EHR-wide diagnoses were used as predictors of the 5-year incidence of major adverse CV events (MACE) using STREAMLINE, a rigorous, interpretable, end-to-end automated machine learning pipeline. Analyses of over 1.4 million individuals revealed three major social risk factor clusters: lowest (35.7%), average (43.6%), and highest (22.7%) social burden. In adjusted analyses, individuals with the highest social burden were less likely to have received an OSA diagnosis than those with the lowest social burden (OR [95% CI] = 0.85 [0.82-0.88]). Among individuals with OSA and no prior CV disease (N=4,405), prediction of incident MACE reached an overall ROC-AUC of 0.70 [0.03], but performance varied when assessed within each social risk factor cluster. Feature importance analyses also indicated that different clinical factors may explain predictions within each cluster. These results suggest meaningful health disparities across social risk factor clusters, both in the diagnosis of OSA and in the clinical predictors of CV disease among those with OSA, indicating that tailored interventions aimed at minimizing these disparities are warranted.
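
The ROC-AUC reported above can be read as the probability that a randomly chosen individual who develops MACE receives a higher predicted risk than one who does not. The following minimal Python sketch computes it that way, first overall and then within each cluster, mirroring the abstract's within-cluster evaluation. The scores, labels, and cluster names are hypothetical toy values, not study data, and STREAMLINE itself is not used.

```python
from itertools import product


def roc_auc(scores, labels):
    """ROC-AUC as the probability that a random positive case outscores a
    random negative case; tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_group(scores, labels, groups):
    """Compute ROC-AUC separately within each group (e.g., social risk cluster)."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        result[g] = roc_auc([scores[i] for i in idx], [labels[i] for i in idx])
    return result


# Hypothetical predicted risks, 5-year MACE labels (1 = event), and clusters.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
clusters = ["lowest", "lowest", "lowest", "lowest",
            "highest", "highest", "highest", "highest"]

print(roc_auc(scores, labels))  # 0.75
print(auc_by_group(scores, labels, clusters))
```

In this toy example, the pooled AUC of 0.75 is higher than the within-cluster values of 2/3 and 1/3. This shows how an overall figure can hide weaker discrimination inside individual clusters.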

Pacific Symposium on Biocomputing, vol. 30, pp. 314-329.

Citations: 0

Identifying DNA methylation sites affecting drug response using electronic health record-derived GWAS summary statistics.

Q2 Computer Science

Pacific Symposium on Biocomputing

Pub Date: 2025-01-01; DOI: 10.1142/9789819807024_0033

Delaney A Smith, Stephanie A Arteaga, Marie C Sadler, Russ B Altman

Adverse drug responses (ADRs) result in over 7,000 deaths annually. Pharmacogenomic studies have shown that many ADRs are partially attributable to genetics. However, emerging data suggest that epigenetic mechanisms, such as DNA methylation (DNAm), also contribute to this variance. Understanding the impact of DNA methylation on drug response may help minimize ADRs and improve the personalization of drug regimens. In this work, we identify DNA methylation sites that likely affect drug response phenotypes for anticoagulant and cardiometabolic drugs. We use instrumental variable analysis to integrate genome-wide association study (GWAS) summary statistics derived from electronic health records (EHRs) in the UK Biobank (UKBB) with methylation quantitative trait loci (mQTL) data from the Genetics of DNA Methylation Consortium (GoDMC). This approach allows us to achieve a robust sample size by using the largest publicly available pharmacogenomic GWAS. For warfarin, we find 71 DNAm sites. Of these, 8 are near the VKORC1 gene and 48 are on chromosome 6 near the human leukocyte antigen (HLA) gene family. We also find 2 warfarin-associated DNAm sites near the CYP2C9 and CYP2C19 genes. For statins, we identify 17 DNAm sites, 8 of which are near the APOB gene, which encodes a carrier protein for low-density lipoprotein cholesterol (LDL-C). We find no novel significant epigenetic results for metformin.
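
The abstract does not name the instrumental variable estimator. A common choice when combining GWAS and mQTL summary statistics is to compute a Wald ratio for each instrument and pool the ratios by inverse-variance weighting (IVW). The sketch below assumes that approach: mQTL SNPs act as instruments, the DNAm level at one CpG site is the exposure, and a drug-response GWAS is the outcome. All numbers are invented for illustration.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Causal estimate from one instrument, with its first-order standard error."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted pooling of per-instrument Wald ratios.

    Equivalent to weighting each ratio by beta_exposure**2 / se_outcome**2.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("inputs must have one entry per instrument")
    num = sum(bx * by / sy**2
              for bx, by, sy in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx**2 / sy**2 for bx, sy in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


# Hypothetical summary statistics for two mQTL SNPs instrumenting one CpG site:
# bx = SNP effect on DNAm (mQTL study); by, sy = SNP effect on a drug-response
# phenotype and its standard error (GWAS).
bx = [0.5, 0.4]
by = [0.10, 0.08]
sy = [0.02, 0.02]

print(wald_ratio(bx[0], by[0], sy[0]))
est, se = ivw(bx, by, sy)
print(f"IVW estimate = {est:.3f} (SE {se:.3f})")
```

Fixed-effect IVW assumes that every instrument is valid. Sensitivity estimators such as MR-Egger or the weighted median are often reported alongside it. The abstract does not say whether the study used them.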

Pacific Symposium on Biocomputing, vol. 30, pp. 457-472.

Citations: 0

Opportunities and Pitfalls with Large Language Models for Biomedical Annotation.

Q2 Computer Science

Pacific Symposium on Biocomputing

Pub Date: 2025-01-01; DOI: 10.1142/9789819807024_0052

Cecilia Arighi, Jin-Dong Kim, Zhiyong Lu, Fabio Rinaldi

Large language models (LLMs) and biomedical annotations have a symbiotic relationship. LLMs rely on high-quality annotations for training and/or fine-tuning on specific biomedical tasks. These annotations are traditionally generated through expensive and time-consuming human curation. At the same time, LLMs can be used to accelerate and simplify curation, potentially creating a virtuous feedback loop. However, their use also introduces new limitations and risks, which are as important to consider as the opportunities they offer. In this workshop, we will review the developments that have led to the current rise of LLMs in several fields, particularly biomedicine, and specifically discuss the opportunities and pitfalls of applying them to biomedical annotation and curation.

Citations: 0

Polygenic risk scores for cardiometabolic traits demonstrate importance of ancestry for predictive precision medicine.

Q2 Computer Science

Pacific Symposium on Biocomputing

Pub Date: 2025-01-01; DOI: 10.1142/9789819807024_0056

Rachel L Kember, Shefali S Verma, Anurag Verma, Brenda Xiao, Anastasia Lucas, Colleen M Kripke, Renae Judy, Jinbo Chen, Scott M Damrauer, Daniel J Rader, Marylyn D Ritchie

Polygenic risk scores (PRS) have predominantly been derived from genome-wide association studies (GWAS) conducted in individuals of European ancestry (EUR). In this study, we present an in-depth evaluation of PRS based on multi-ancestry GWAS for five cardiometabolic phenotypes in the Penn Medicine BioBank (PMBB), followed by a phenome-wide association study (PheWAS). We examine PRS performance across all individuals and separately in the African ancestry (AFR) and EUR groups. For AFR individuals, PRS derived using the multi-ancestry linkage disequilibrium (LD) panel showed higher effect sizes than those derived from the AFR LD panel for four of the five PRSs (diastolic blood pressure [DBP], systolic blood pressure [SBP], type 2 diabetes [T2D], and body mass index [BMI]). In contrast, for EUR individuals, PRS derived using the multi-ancestry LD panel showed higher effect sizes than those derived from the EUR LD panel for two of the five PRSs (SBP and T2D). These findings underscore the potential benefits of using a multi-ancestry LD panel for PRS derivation across diverse genetic backgrounds and demonstrate overall robustness across all individuals. Our results also revealed significant associations between PRS and various phenotypic categories. For instance, the coronary artery disease (CAD) PRS was associated with 18 phenotypes in AFR and 82 in EUR individuals, while the T2D PRS was associated with 84 phenotypes in AFR and 78 in EUR individuals. Notably, associations with hyperlipidemia, renal failure, atrial fibrillation, coronary atherosclerosis, obesity, and hypertension were observed across different PRSs in both the AFR and EUR groups, with varying effect sizes and significance levels. However, in AFR individuals, the strength and number of PRS associations with other phenotypes were generally lower than in EUR individuals. Our study underscores the need for future research to prioritize 1) conducting GWAS in diverse ancestry groups and 2) developing a cosmopolitan PRS methodology that is universally applicable across all genetic backgrounds. Such advances will foster a more equitable and personalized approach to precision medicine.
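
At its core, a PRS is a weighted sum of an individual's effect-allele dosages. This minimal sketch assumes that per-variant weights are already available. In practice, those weights would come from GWAS summary statistics after an LD-aware adjustment step, which is where the choice between a multi-ancestry and an ancestry-specific LD panel enters. That upstream step is not shown. The variant IDs and weights are hypothetical. The scores are also standardized, an assumed convention that lets PRS effects be expressed per standard deviation.

```python
import statistics


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2; imputed dosages may be fractional)."""
    score = 0.0
    for variant, beta in weights.items():
        if variant not in dosages:
            raise KeyError(f"no dosage for variant {variant}")
        score += beta * dosages[variant]
    return score


def standardize(values):
    """Z-score values so that PRS effects can be reported per standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical per-allele weights (e.g., log-odds or trait units per allele).
weights = {"rsA": 0.12, "rsB": -0.05, "rsC": 0.30}

# Hypothetical genotypes for three individuals.
people = [
    {"rsA": 2, "rsB": 1, "rsC": 0},
    {"rsA": 0, "rsB": 2, "rsC": 1},
    {"rsA": 1, "rsB": 0, "rsC": 2},
]

raw = [polygenic_score(p, weights) for p in people]
z = standardize(raw)
print([round(r, 2) for r in raw])  # [0.19, 0.2, 0.72]
print([round(v, 2) for v in z])
```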

Pacific Symposium on Biocomputing, vol. 30, pp. 748-765.

Citations: 0

Charting the Evolution and Transformative Impact of the Pacific Symposium on Biocomputing Through a 30-Year Retrospective Analysis of Collaborative Networks and Themes Using Modern Computational Tools.

Q2 Computer Science

Pacific Symposium on Biocomputing

Pub Date: 2025-01-01; DOI: 10.1142/9789819807024_0002

Leah Zhang, Sameeksha Garg, Edward Zhang, Sean McOsker, Carly Bobak, Kristine Giffin, Brock Christensen, Joshua Levy

Founded nearly 30 years ago, the Pacific Symposium on Biocomputing (PSB) has continually promoted collaborative research in computational biology, each year highlighting emergent themes that reflect the field's expanding interdisciplinary nature. This study aimed to explore the collaborative and thematic dynamics of PSB using topic modeling and network analysis. We identified 14 central topics that have characterized the discourse at PSB over the past three decades. Our findings show significant trends in topic relevance, with a growing emphasis on machine learning and integrative analyses. We observed not only an expanding network of collaboration but also PSB's crucial role in fostering interdisciplinary collaboration. It remains unclear, however, whether the shift toward interdisciplinarity was driven by the conference itself, by external academic trends, or by broader societal shifts toward integrated research approaches. Future applications of next-generation analytical methods may offer deeper insights into these dynamics. Additionally, we developed a web application that uses retrieval-augmented generation and large language models to let users efficiently explore past PSB proceedings.
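
The abstract does not describe how the collaboration network was built. One common construction, assumed here, is a co-authorship graph: authors are nodes, and each edge's weight counts the papers two authors share. As a tiny example, the sketch uses the author lists of two entries on this page, which have Joshua Levy in common.

```python
from collections import defaultdict
from itertools import combinations


def coauthorship_graph(papers):
    """Build an undirected weighted graph as a dict of dicts:
    graph[a][b] = number of papers co-authored by a and b."""
    graph = defaultdict(lambda: defaultdict(int))
    for authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def degree(graph):
    """Number of distinct co-authors per author."""
    return {author: len(neighbors) for author, neighbors in graph.items()}


# Author lists of two entries on this page (the PSB retrospective and the
# veteran suicide-risk study), used only as a small example.
papers = [
    ["Leah Zhang", "Sameeksha Garg", "Edward Zhang", "Sean McOsker",
     "Carly Bobak", "Kristine Giffin", "Brock Christensen", "Joshua Levy"],
    ["Joshua Levy", "Monica Dimambro", "Alos Diallo", "Jiang Gui",
     "Brian Shiner", "Maxwell Levis"],
]

g = coauthorship_graph(papers)
deg = degree(g)
print(max(deg, key=deg.get), max(deg.values()))  # Joshua Levy 12
```

An author like the shared one here, who links two otherwise separate groups, is the kind of bridging node that a centrality measure such as betweenness would highlight in a full-scale analysis.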

Pacific Symposium on Biocomputing, vol. 30, pp. 16-32. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11747933/pdf/

Citations: 0

Investigating the Differential Impact of Psychosocial Factors by Patient Characteristics and Demographics on Veteran Suicide Risk Through Machine Learning Extraction of Cross-Modal Interactions.

Q2 Computer Science

Pacific Symposium on Biocomputing

Pub Date: 2025-01-01; DOI: 10.1142/9789819807024_0013

Joshua Levy, Monica Dimambro, Alos Diallo, Jiang Gui, Brian Shiner, Maxwell Levis

Accurate prediction of suicide risk is crucial for identifying patients with an elevated risk burden and helping ensure that these patients receive targeted care. The U.S. Department of Veterans Affairs' suicide prediction model primarily leverages structured electronic health record (EHR) data. This approach largely overlooks unstructured EHR data, a format that could be used to enhance predictive accuracy. This study aims to improve the predictive accuracy of suicide risk models by developing a model that incorporates both structured EHR predictors and semantic variables derived through natural language processing (NLP) from unstructured EHR data. XGBoost models were fit to predict suicide risk. The interactions identified by these models were extracted using SHAP, validated using logistic regression models, and added to a ridge regression model, which was then compared with a ridge regression approach that did not use the interactions. By introducing a selection parameter, α, to balance the influence of structured (α=1) and unstructured (α=0) data, we found that intermediate α values achieved optimal performance across various risk strata, improved the performance of the ridge regression approach, and uncovered significant cross-modal interactions between psychosocial constructs and patient characteristics. These interactions highlight how psychosocial risk factors are influenced by individual patient contexts, potentially informing improved risk prediction methods and personalized interventions. Our findings underscore the importance of incorporating nuanced narrative data into predictive models and set the stage for future research that will expand the use of advanced machine learning techniques, including deep learning, to further refine suicide risk prediction methods.
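
The abstract does not specify how α enters the model. One simple reading, assumed here and not necessarily the authors' formulation, is a convex blend of a structured-EHR risk score and an NLP-derived risk score. In the sketch, α is scanned over a grid and each blend is scored by ROC-AUC. The toy scores are hypothetical and were chosen so that each modality ranks a different positive case poorly.

```python
def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def blend(structured, unstructured, alpha):
    """Hypothetical convex blend: alpha=1 uses only structured-EHR scores,
    alpha=0 uses only NLP-derived scores from unstructured notes."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [alpha * s + (1.0 - alpha) * u for s, u in zip(structured, unstructured)]


def scan_alpha(structured, unstructured, labels, steps=10):
    """Evaluate ROC-AUC of the blended score on an evenly spaced alpha grid."""
    return [(k / steps, roc_auc(blend(structured, unstructured, k / steps), labels))
            for k in range(steps + 1)]


# Toy risk scores: each modality misranks a different positive case.
structured = [0.9, 0.2, 0.6, 0.5, 0.1, 0.3]
unstructured = [0.2, 0.9, 0.6, 0.1, 0.5, 0.3]
labels = [1, 1, 1, 0, 0, 0]

for alpha, auc in scan_alpha(structured, unstructured, labels):
    print(f"alpha={alpha:.1f}  AUC={auc:.3f}")
```

In this toy example, every intermediate α from 0.3 to 0.7 reaches an AUC of 1.0, while either modality alone reaches about 0.78. This only illustrates the mechanism and says nothing about the effect sizes in the actual study.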

Pacific Symposium on Biocomputing, vol. 30, pp. 167-184. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11747942/pdf/

Citations: 0

Multi-modal Imaging-based Pseudotime Analysis of Alzheimer progression.

Q2 Computer Science

Pacific Symposium on Biocomputing

Pub Date: 2025-01-01; DOI: 10.1142/9789819807024_0047

Bing He, Shu Zhang, Shannon L Risacher, Andrew J Saykin, Jingwen Yan

Alzheimer's disease (AD) is a neurodegenerative disorder that causes progressive cognitive decline and, so far, has no clinically validated cure. Understanding the progression of AD is critical for early detection and risk assessment in aging individuals, enabling timely intervention and improving the chance of success in AD trials. Recent pseudotime approaches turn cross-sectional data into "faux" longitudinal data to understand how a complex process evolves over time. This is critical for AD, which unfolds over decades, whereas the collected data offer only a snapshot. In this study, we tested several state-of-the-art pseudotime approaches to model the full spectrum of AD progression. We then evaluated and compared pseudotime progression scores derived from individual imaging modalities and from multiple modalities in the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. Our results showed that most existing pseudotime analysis tools do not generalize well to imaging data, yielding either flipped progression scores or poor separation of diagnosis groups. This is likely because their underlying assumptions hold only for single-cell data. With the only tool that produced promising results, all pseudotime estimates, whether derived from single imaging modalities or from multiple modalities, captured the progression across diagnosis groups. Pseudotime from multiple modalities, but not from single modalities, confirmed the hypothesized temporal order of imaging phenotypes. In addition, we found that multi-modal pseudotime is driven mostly by amyloid and tau imaging, suggesting that these measures change continuously across the full spectrum of AD progression.
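
The abstract rejects tools that produce flipped progression scores or poorly separated diagnosis groups. Below is a minimal sketch of both checks. It assumes diagnosis groups coded CN < MCI < AD and uses Spearman correlation as the orientation criterion; the abstract states neither choice. The pseudotime values are hypothetical output from an unspecified tool.

```python
from statistics import median

# Assumed ordinal coding of diagnosis groups (not specified in the abstract).
STAGE = {"CN": 0, "MCI": 1, "AD": 2}


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def orient_and_check(pseudotime, diagnoses):
    """Flip pseudotime if it runs against diagnostic stage, then check whether
    group medians follow CN < MCI < AD."""
    stages = [STAGE[d] for d in diagnoses]
    flipped = spearman(pseudotime, stages) < 0
    pt = [-t for t in pseudotime] if flipped else list(pseudotime)
    medians = {g: median(t for t, d in zip(pt, diagnoses) if d == g) for g in STAGE}
    ordered = medians["CN"] < medians["MCI"] < medians["AD"]
    return pt, flipped, medians, ordered


# Hypothetical pseudotime output that runs backwards relative to diagnosis.
pseudotime = [0.9, 0.8, 0.5, 0.45, 0.1, 0.2]
diagnoses = ["CN", "CN", "MCI", "MCI", "AD", "AD"]

pt, flipped, medians, ordered = orient_and_check(pseudotime, diagnoses)
print(flipped, ordered)  # True True
```

Flipping the sign only fixes orientation. If the group medians are still out of order afterward, the tool shows the second failure mode, poor separation of diagnosis groups.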

阿尔茨海默病（AD）是一种神经退行性疾病，会导致认知能力逐渐下降，但迄今为止还没有任何经临床验证的治疗方法。了解阿兹海默病的进展对于早期发现和评估老年阿兹海默病的风险至关重要，这样才能及时采取干预措施，提高阿兹海默病试验的成功几率。最近的伪时间方法将横截面数据转化为 "假 "纵向数据，以了解复杂过程如何随时间演变。这对阿尔茨海默病至关重要，因为阿尔茨海默病的病程长达数十年，但收集到的数据只能提供一个快照。在这项研究中，我们测试了几种最先进的伪时间方法，以模拟阿兹海默症的整个发展过程。随后，我们评估并比较了 ADNI 队列中由单个成像模式和多模式得出的伪时间进展评分。我们的结果表明，大多数现有的假时分析工具都不能很好地概括成像数据，要么是进展评分翻转，要么是诊断组分离不佳。这可能是由于其基本假设只适用于单细胞数据。从唯一有希望的工具中可以观察到，无论是从单一成像模式还是从多模式得出的所有伪时间，都能捕捉到诊断组的进展情况。来自多模态而非单一模态的伪时间证实了成像表型的假定时间顺序。此外，我们还发现，多模态伪时间主要由淀粉样蛋白和 tau 成像驱动，这表明它们在 AD 进展的整个过程中会发生持续变化。

{"title":"Multi-modal Imaging-based Pseudotime Analysis of Alzheimer progression.","authors":"Bing He, Shu Zhang, Shannon L Risacher, Andrew J Saykin, Jingwen Yan","doi":"10.1142/9789819807024_0047","DOIUrl":"10.1142/9789819807024_0047","url":null,"abstract":"Alzheimer's disease (AD) is a neurodegenerative disorder that results in progressive cognitive decline but without any clinically validated cures so far. Understanding the progression of AD is critical for early detection and risk assessment for AD in aging individuals, thereby enabling initiation of timely intervention and improved chance of success in AD trials. Recent pseudotime approach turns cross-sectional data into \"faux\" longitudinal data to understand how a complex process evolves over time. This is critical for Alzheimer, which unfolds over the course of decades, but the collected data offers only a snapshot. In this study, we tested several state-of-the-art pseudotime approaches to model the full spectrum of AD progression. Subsequently, we evaluated and compared the pseudotime progression score derived from individual imaging modalities and multi-modalities in the ADNI cohort. Our results showed that most existing pseudotime analysis tools do not generalize well to the imaging data, with either flipped progression score or poor separation of diagnosis groups. This is likely due to the underlying assumptions that only stand for single cell data. From the only tool with promising results, it was observed that all pseudotime, derived from either single imaging modalities or multi-modalities, captures the progressiveness of diagnosis groups. Pseudotime from multi-modality, but not the single modalities, confirmed the hypothetical temporal order of imaging phenotypes. 
In addition, we found that multi-modal pseudotime is mostly driven by amyloid and tau imaging, suggesting their continuous changes along the full spectrum of AD progression.","PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","volume":"30 ","pages":"664-674"},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12044618/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142819621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

Session Introduction: Overcoming health disparities in precision medicine: Intersectional approaches in precision medicine.

Q2 Computer Science

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0018

Francisco M De La Vega, Kathleen C Barnes, Harris Bland, Todd Edwards, Keolu Fox, Alexander Ioannidis, Eimear Kenny, Rasika A Mathias, Bogdan Pasaniuc, Jada Benn Torres, Digna R Velez Edwards

The following sections are included: Overview, Advancing multi-ancestry genetic research, Integrating social determinants of health to enhance genetic risk models, Methods to detect and mitigate disparities, Addressing Disparities in Adverse Drug Reactions, Conclusion, Acknowledgments, References.

Citations: 0

Session Introduction: AI and Machine Learning in Clinical Medicine: Generative and Interactive Systems at the Human-Machine Interface.

Q2 Computer Science

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0003

Fateme Nateghi Haredasht, Dokyoon Kim, Joseph D Romano, Geoff Tison, Roxana Daneshjou, Jonathan H Chen

Artificial Intelligence (AI) technologies are increasingly capable of processing complex and multilayered datasets. Innovations in generative AI and deep learning have notably enhanced the extraction of insights from unstructured text and images as well as from structured data. These breakthroughs have spurred a wave of research in the medical field, leading to a variety of tools aimed at improving clinical decision-making, patient monitoring, image analysis, and emergency response systems. However, thorough research is essential to fully understand the broader impact and potential consequences of deploying AI within the healthcare sector.

Citations: 0

PGxQA: A Resource for Evaluating LLM Performance for Pharmacogenomic QA Tasks.

Q2 Computer Science

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0017

Karl Keat, Rasika Venkatesh, Yidi Huang, Rachit Kumar, Sony Tuteja, Katrin Sangkuhl, Binglan Li, Li Gong, Michelle Whirl-Carrillo, Teri E Klein, Marylyn D Ritchie, Dokyoon Kim

Pharmacogenetics represents one of the most promising areas of precision medicine, with several guidelines for genetics-guided treatment ready for clinical use. Despite this, implementation has been slow, with few health systems incorporating the technology into their standard of care. One major barrier to uptake is the lack of education and awareness of pharmacogenetics among clinicians and patients. The introduction of large language models (LLMs) like GPT-4 has raised the possibility of medical chatbots that deliver timely information to clinicians, patients, and researchers with a simple interface. Although state-of-the-art LLMs have shown impressive performance at advanced tasks like medical licensing exams, in practice they still often provide false information, which is particularly hazardous in a clinical context. To quantify the extent of this issue, we developed a series of automated and expert-scored tests to evaluate the performance of chatbots in answering pharmacogenetics questions from the perspective of clinicians, patients, and researchers. We applied this benchmark to state-of-the-art LLMs and found that newer models like GPT-4o greatly outperform their predecessors, but still fall short of the standards required for clinical use. Our benchmark will be a valuable public resource for subsequent developments in this space as we work towards better clinical AI for pharmacogenetics.

Citations: 0