Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing: latest articles, page 4

ClinValAI: A framework for developing Cloud-based infrastructures for the External Clinical Validation of AI in Medical Imaging.

Q2 Computer Science

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0016

Ojas A Ramwala, Kathryn P Lowry, Daniel S Hippe, Matthew P N Unrath, Matthew J Nyflot, Sean D Mooney, Christoph I Lee

Artificial Intelligence (AI) algorithms showcase the potential to steer a paradigm shift in clinical medicine, especially medical imaging. Concerns associated with model generalizability and biases necessitate rigorous external validation of AI algorithms prior to their adoption into clinical workflows. To address the barriers associated with patient privacy, intellectual property, and diverse model requirements, we introduce ClinValAI, a framework for establishing robust cloud-based infrastructures to clinically validate AI algorithms in medical imaging. By featuring dedicated workflows for data ingestion, algorithm scoring, and output processing, we propose an easily customizable method to assess AI models and investigate biases. Our novel orchestration mechanism facilitates utilizing the complete potential of the cloud computing environment. ClinValAI's input auditing and standardization mechanisms ensure that inputs consistent with model prerequisites are provided to the algorithm for a streamlined validation. The scoring workflow comprises multiple steps to facilitate consistent inferencing and systematic troubleshooting. The output processing workflow helps identify and analyze samples with missing results and aggregates final outputs for downstream analysis. We demonstrate the usability of our work by evaluating a state-of-the-art breast cancer risk prediction algorithm on a large and diverse dataset of 2D screening mammograms. We perform comprehensive statistical analysis to study model calibration and evaluate performance on important factors, including breast density, age, and race, to identify latent biases. ClinValAI provides a holistic framework to validate medical imaging models and has the potential to advance the development of generalizable AI models in clinical medicine and promote health equity.

Volume 30, pages 215-228. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12240695/pdf/

Citations: 0

Leveraging Foundational Models in Computational Biology: Validation, Understanding, and Innovation.

Q2 Computer Science

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0051

Brett Beaulieu-Jones, Steven Brenner

Large Language Models (LLMs) have shown significant promise across a wide array of fields, including biomedical research, but face notable limitations in their current applications. While they offer a new paradigm for data analysis and hypothesis generation, their efficacy in computational biology trails other applications such as natural language processing. This workshop addresses the state of the art in LLMs, discussing their challenges and the potential for future development tailored to computational biology. Key issues include difficulties in validating LLM outputs, proprietary model limitations, and the need for expertise in critical evaluation of model failure modes.

Citations: 0

Integrated exposomic analysis of lipid phenotypes: Leveraging GE.db in environment by environment interaction studies.

Q2 Computer Science

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0038

Andre Luis Garao Rico, Nicole Palmiero, Marylyn D Ritchie, Molly A Hall

Gene-environment interaction (GxE) studies provide insights into the interplay between genetics and the environment but often overlook the synergistic effects of multiple environmental factors. This study uses environment by environment interaction (ExE) studies to explore interactions among environmental factors affecting lipid phenotypes (e.g., HDL, LDL, total cholesterol, and triglycerides), which are crucial for disease risk assessment. We developed a novel curated knowledge base, GE.db, integrating genomic and exposomic interactions. In this study, we filtered NHANES exposure variables (available 1999-2018) to identify significant ExE using GE.db. From 101,316 participants and 77 exposures, we identified 263 statistically significant interactions (FDR p < 0.1) in discovery and replication datasets, with 21 interactions significant for HDL-C (Bonferroni p < 0.05). Notable interactions associated with HDL-C levels included docosapentaenoic acid (22:5n-3) (DPA) - arachidic acid (20:0), stearic acid (18:0) - arachidic acid (20:0), and blood 2,5-dimethylfuran - blood benzene. These findings underscore GE.db's role in enhancing -omics research efficiency and highlight the complex impact of environmental exposures on lipid metabolism, informing future health strategies.
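
The two significance thresholds quoted in the abstract (FDR p < 0.1 and Bonferroni p < 0.05) can be illustrated with a minimal sketch. The abstract does not say which FDR procedure was used; the Benjamini-Hochberg step-up rule below is an assumption, and the p-values are made-up toy numbers, not results from the study.

```python
def benjamini_hochberg(p_values, alpha=0.1):
    """Flag tests that pass Benjamini-Hochberg FDR control at level alpha.

    Assumption: the abstract only says "FDR"; BH is used here for illustration.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


def bonferroni(p_values, alpha=0.05):
    """Flag tests whose p-value is below alpha divided by the number of tests."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]


# Hypothetical p-values for six exposure-exposure interaction tests.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
fdr_hits = benjamini_hochberg(toy_p, alpha=0.1)
bonf_hits = bonferroni(toy_p, alpha=0.05)
print(fdr_hits)
print(bonf_hits)
```

On these toy values the FDR rule keeps more tests than Bonferroni, which matches the pattern in the abstract, where the Bonferroni-significant interactions are a small subset of the FDR-significant ones.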

Volume 30, pages 535-550. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11694901/pdf/

Citations: 0

A Prospective Comparison of Large Language Models for Early Prediction of Sepsis.

Q2 Computer Science

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0009

Supreeth P Shashikumar, Shamim Nemati

We present a comparative study on the performance of two popular open-source large language models for early prediction of sepsis: Llama-3 8B and Mixtral 8x7B. The primary goal was to determine whether a smaller model could achieve comparable predictive accuracy to a significantly larger model in the context of sepsis prediction using clinical data. Our proposed LLM-based sepsis prediction system, COMPOSER-LLM, enhances the previously published COMPOSER model, which utilizes structured EHR data to generate hourly sepsis risk scores. The new system incorporates an LLM-based approach to extract sepsis-related clinical signs and symptoms from unstructured clinical notes. For scores falling within high-uncertainty prediction regions, particularly those near the decision threshold, the system uses the LLM to draw additional clinical context from patient notes, thereby enhancing the model's predictive accuracy in challenging diagnostic scenarios. A total of 2,074 patient encounters admitted to the Emergency Department at two hospitals within the University of California San Diego Health system were used for model evaluation in this study. Our findings reveal that the Llama-3 8B-based system (COMPOSER-LLM_Llama) achieved a sensitivity of 70.3%, positive predictive value (PPV) of 32.5%, F-1 score of 44.4% and false alarms per patient hour (FAPH) of 0.0194, closely matching the performance of the larger Mixtral 8x7B-based system (COMPOSER-LLM_mixtral), which achieved a sensitivity of 72.1%, PPV of 31.9%, F-1 score of 44.2% and FAPH of 0.020. When prospectively evaluated, COMPOSER-LLM_Llama demonstrated similar performance to the COMPOSER-LLM_mixtral pipeline, with a sensitivity of 68.7%, PPV of 36.6%, F-1 score of 47.7% and FAPH of 0.019 vs. sensitivity of 70.5%, PPV of 36.3%, F-1 score of 47.9% and FAPH of 0.020. This result indicates that, for extraction of clinical signs and symptoms from unstructured clinical notes to enable early prediction of sepsis, the Llama-3 generation of smaller language models can perform as effectively as, and more efficiently than, larger models. This finding has significant implications for healthcare settings with limited resources.
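
As a quick consistency check on the reported metrics, the F-1 score is the harmonic mean of sensitivity (recall) and PPV (precision). The sketch below recomputes it from the rounded retrospective figures quoted in the abstract; the function and variable names are illustrative and not taken from the paper.

```python
def f1_score(sensitivity, ppv):
    """Harmonic mean of sensitivity (recall) and PPV (precision)."""
    return 2 * sensitivity * ppv / (sensitivity + ppv)


# Retrospective sensitivity and PPV as quoted in the abstract (rounded values).
llama_f1 = f1_score(0.703, 0.325)
mixtral_f1 = f1_score(0.721, 0.319)

print(f"Llama-3 8B system F-1:   {llama_f1:.4f}")
print(f"Mixtral 8x7B system F-1: {mixtral_f1:.4f}")
```

Because the inputs are already rounded to one decimal place, the recomputed values can differ from the reported 44.4% and 44.2% in the last digit, but they agree to within rounding.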

Volume 30, pages 109-120. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11649013/pdf/

Citations: 0

Constructing a multi-ancestry polygenic risk score for uterine fibroids using publicly available data highlights need for inclusive genetic research.

Q2 Computer Science

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0020

Jessica L G Winters, Jacqueline A Piekos, Jacklyn N Hellwege, Ozan Dikilitas, Iftikhar J Kullo, Daniel J Schaid, Todd L Edwards, Digna R Velez Edwards

Uterine leiomyomata, or fibroids, are common gynecological tumors causing pelvic and menstrual symptoms that can negatively affect quality of life and child-bearing desires. As fibroids grow, symptoms can intensify and lead to invasive treatments that are less likely to preserve fertility. Identifying individuals at highest risk for fibroids can aid in access to earlier diagnoses. Polygenic risk scores (PRS) quantify genetic risk to identify those at highest risk for disease. Utilizing the PRS software PRS-CSx and publicly available genome-wide association study (GWAS) summary statistics from FinnGen and Biobank Japan, we constructed a multi-ancestry (META) PRS for fibroids. We validated the META PRS in two cross-ancestry cohorts. In the cross-ancestry Electronic Medical Record and Genomics (eMERGE) Network cohort, the META PRS was significantly associated with fibroid status, with 1.11-fold greater odds of fibroids per standard deviation increase in PRS (95% confidence interval [CI]: 1.05-1.17, p = 5.21 × 10^-5). The META PRS was validated in two BioVU cohorts: one using ICD9/ICD10 codes and one requiring imaging confirmation of fibroid status. In the ICD cohort, a standard deviation increase in the META PRS increased the odds of fibroids by a factor of 1.23 (95% CI: 1.15-1.32, p = 9.68 × 10^-9), while in the imaging cohort, the odds increased by a factor of 1.26 (95% CI: 1.18-1.35, p = 2.40 × 10^-11). We subsequently constructed single-ancestry PRS for FinnGen (European ancestry [EUR]) and Biobank Japan (East Asian ancestry [EAS]) using PRS-CS and discovered a nominally significant association in the eMERGE cohort between fibroids and the EAS PRS but not the EUR PRS (95% CI: 1.09-1.20, p = 1.64 × 10^-7). These findings highlight the strong predictive power of multi-ancestry PRS over single-ancestry PRS. This study underscores the necessity of diverse population inclusion in genetic research to ensure precision medicine benefits all individuals equitably.
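
To make the "per standard deviation increase in PRS" wording concrete, here is a minimal sketch of how a PRS is typically computed (a weighted sum of risk-allele dosages) and standardized within a cohort. The variant names, weights, and genotypes are hypothetical toy values; the sketch does not reproduce PRS-CSx or PRS-CS, which estimate the weights from GWAS summary statistics. The last step assumes a logistic model in which the odds ratio per standard deviation is the 1.11 reported for the eMERGE cohort.

```python
from statistics import mean, pstdev


def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per variant)."""
    return sum(weights[snp] * dosage for snp, dosage in dosages.items())


# Hypothetical per-variant effect weights (e.g., log odds ratios).
weights = {"snp_a": 0.12, "snp_b": -0.05, "snp_c": 0.30}

# Hypothetical genotype dosages for a three-person toy cohort.
cohort = {
    "person_1": {"snp_a": 0, "snp_b": 1, "snp_c": 2},
    "person_2": {"snp_a": 2, "snp_b": 0, "snp_c": 0},
    "person_3": {"snp_a": 1, "snp_b": 2, "snp_c": 1},
}

# Raw scores, then z-scores so that one unit equals one standard deviation.
raw = {pid: polygenic_score(d, weights) for pid, d in cohort.items()}
mu = mean(raw.values())
sd = pstdev(raw.values())
standardized = {pid: (score - mu) / sd for pid, score in raw.items()}

# Under a logistic model with an odds ratio of 1.11 per SD, a person's odds
# relative to someone at the cohort mean are 1.11 raised to their z-score.
OR_PER_SD = 1.11
relative_odds = {pid: OR_PER_SD ** z for pid, z in standardized.items()}

for pid in cohort:
    print(pid, round(raw[pid], 3), round(standardized[pid], 3), round(relative_odds[pid], 3))
```

Standardizing within the cohort is what lets effect sizes from differently scaled scores (META, EUR, EAS) be compared on the same per-SD footing.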

Volume 30, pages 268-280. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11731894/pdf/

Citations: 0

Implications of An Evolving Regulatory Landscape on the Development of AI and ML in Medicine.

Q2 Computer Science

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0012

Nicole Rincon, Sara Gerke, Jennifer K Wagner

The rapid advancement of artificial intelligence and machine learning (AI/ML) technologies in healthcare presents significant opportunities for enhancing patient care through innovative diagnostic tools, monitoring systems, and personalized treatment plans. However, these innovative advancements might result in regulatory challenges given recent Supreme Court decisions that impact the authority of regulatory agencies like the Food and Drug Administration (FDA). This paper explores the implications of regulatory uncertainty for the healthcare industry related to balancing innovation in biotechnology and biocomputing with ensuring regulatory uniformity and patient safety. We examine key Supreme Court cases, including Loper Bright Enterprises v. Raimondo, Relentless, Inc. v. Department of Commerce, and Corner Post, Inc. v. Board of Governors of the Federal Reserve System, and their impact on the Chevron doctrine. We also discuss other relevant cases to highlight shifts in judicial approaches to agency deference and regulatory authority that might affect how science is handled in regulatory spaces, including how biocomputing and other health sciences are governed, how scientific facts are applied in policymaking, and how scientific expertise guides decision making. Through a detailed analysis, we assess the potential impact of regulatory uncertainty in healthcare. Additionally, we provide recommendations for the medical community on navigating these challenges.

Volume 30, pages 154-166. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11649012/pdf/

Citations: 0

A Visual Analytics Framework for Assessing Interactive AI for Clinical Decision Support.

Q2 Computer Science

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0004

Eric W Prince, Todd C Hankinson, Carsten Görg

Human involvement remains critical in most instances of clinical decision-making. Recent advances in AI and machine learning opened the door for designing, implementing, and translating interactive AI systems to support clinicians in decision-making. Assessing the impact and implications of such systems on patient care and clinical workflows requires in-depth studies. Conducting evaluation studies of AI-supported interactive systems to support decision-making in clinical settings is challenging and time-consuming. These studies involve carefully collecting, analyzing, and interpreting quantitative and qualitative data to assess the performance of the underlying AI-supported system, its impact on the human decision-making process, and the implications for patient care. We have previously developed a toolkit for designing and implementing clinical AI software so that it can be subjected to an application-based evaluation. Here, we present a visual analytics framework for analyzing and interpreting the data collected during such an evaluation process. Our framework supports identifying subgroups of users and patients based on their characteristics, detecting outliers among them, and providing evidence to ensure adherence to regulatory guidelines. We used early-stage clinical AI regulatory guidelines to drive the system design, implemented multiple-factor analysis and hierarchical clustering as exemplary analysis tools, and provided interactive visualizations to explore and interpret results. We demonstrate the effectiveness of our framework through a case study to evaluate a prototype AI-based clinical decision-support system for diagnosing pediatric brain tumors.

Volume 30, pages 40-53. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12452040/pdf/

Citations: 0

Spatial Clustering for Carolina Breast Cancer Study.

Q2 Computer Science

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0025

Hongqian Niu, Melissa Troester, Didong Li

In the Carolina Breast Cancer Study (CBCS), clustering census tracts based on spatial location, demographic variables, and socioeconomic status is crucial for understanding how these factors influence health outcomes and cancer risk. This task, known as spatial clustering, involves identifying clusters of similar locations by considering both geographic and characteristic patterns. While standard clustering methods such as K-means, spectral clustering, and hierarchical clustering are well-studied, spatial clustering is less explored due to the inherent differences between spatial domains and their corresponding covariates. In this paper, we introduce a spatial clustering algorithm called Gaussian Process Spatial Clustering (GPSC). GPSC leverages the flexibility of Gaussian Processes to cluster unobserved functions between different domains, extending traditional clustering techniques to effectively handle geospatial data. We provide theoretical guarantees for GPSC's performance and demonstrate its capability to recover true clusters through several empirical studies. Specifically, we identify clusters of census tracts in North Carolina based on socioeconomic and environmental indicators associated with health and cancer risk.
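As a point of reference for the task described above, the following is a minimal sketch of a naive baseline: K-means on standardized features that combine location with covariates. It is not GPSC and makes no use of Gaussian Processes. The tract coordinates, the two covariates (median income, percent uninsured), and all function names are hypothetical and chosen only for illustration.

```python
import math

def standardize(rows):
    """Scale each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def farthest_first(points, k):
    """Deterministic initialization: start at the first point, then
    repeatedly add the point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    return [list(c) for c in centers]

def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; stops when assignments no longer change."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical tracts: (longitude, latitude, median income in $1000s, % uninsured).
tracts = [
    (-78.9, 35.9, 72.0, 8.0),
    (-78.8, 36.0, 70.0, 9.0),
    (-79.0, 35.8, 75.0, 7.5),
    (-77.0, 34.5, 38.0, 19.0),
    (-77.1, 34.6, 41.0, 18.0),
    (-76.9, 34.4, 36.0, 20.5),
]

labels = kmeans(standardize(tracts), k=2)
print(labels)
```

Concatenating coordinates and covariates treats geography as just another feature, which is the kind of simplification the abstract says standard methods make; it is shown here only as a baseline for comparison.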

Volume 30, pp. 346-359. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12764386/pdf/

Citations: 0

A Pathway-Level Information ExtractoR (PLIER) framework to gain mechanistic insights into obesity in Down syndrome.

Q2 Computer Science

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0030

Sutanu Nandi, Yuehua Zhu, Lucas A Gillenwater, Marc Subirana-Granés, Haoyu Zhang, Negar Janani, Casey Greene, Milton Pividori, Maria Chikina, James C Costello

Down syndrome (DS), caused by the triplication of chromosome 21 (T21), is a prevalent genetic disorder with a higher incidence of obesity. Traditional approaches have struggled to differentiate T21-specific molecular dysregulation from general obesity-related processes. This study introduces the omni-PLIER framework, combining the Pathway-Level Information ExtractoR (PLIER) with the omnigenic model, to uncover molecular mechanisms underlying obesity in DS. The PLIER framework aligns gene expression data with biological pathways, facilitating the identification of relevant molecular patterns. Using RNA sequencing data from the Human Trisome Project, omni-PLIER identified latent variables (LVs) significantly associated with both T21 and body mass index (BMI). Elastic net regression and causal mediation analysis revealed LVs mediating the effect of karyotype on BMI. Notably, LVs involving glutathione peroxidase-1 (GPX1) and MCL1 apoptosis regulator, BCL2 family members emerged as crucial mediators. These findings provide insights into the molecular interplay between DS and obesity. The omni-PLIER model offers a robust methodological advancement for dissecting complex genetic disorders, with implications for understanding obesity-related processes in both DS and the general population.
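The mediation idea mentioned above can be illustrated with the classic product-of-coefficients decomposition for linear models. This is a minimal sketch only: it uses ordinary least squares rather than elastic net, it is not the authors' omni-PLIER pipeline, and the karyotype indicator, latent-variable scores, and BMI values below are invented for illustration.

```python
def ols(X, y):
    """Least-squares coefficients for y ~ 1 + X via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented system [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical data: x = karyotype (0 = disomic, 1 = T21),
# m = latent variable score (mediator), y = BMI (outcome).
x = [0, 0, 0, 0, 1, 1, 1, 1]
m = [0.1, 0.4, 0.2, 0.5, 0.9, 1.2, 1.0, 1.4]
y = [22.0, 24.1, 23.0, 24.5, 27.2, 29.0, 27.9, 30.1]

# Path a: effect of karyotype on the mediator.
a = ols([[xi] for xi in x], m)[1]
# Direct effect of karyotype and path b (mediator on outcome), fit jointly.
_, direct, b = ols([[xi, mi] for xi, mi in zip(x, m)], y)
# Total effect of karyotype on the outcome, ignoring the mediator.
total = ols([[xi] for xi in x], y)[1]

indirect = a * b  # portion of the total effect carried through the mediator
print(f"total={total:.3f} direct={direct:.3f} indirect={indirect:.3f}")
```

For ordinary least squares fit on the same sample, the total effect equals the direct effect plus the indirect effect, so the printed values should satisfy that identity up to floating-point error.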

Volume 30, pp. 412-425. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11649010/pdf/

Citations: 0

Indigenous Data Sovereignty, Circular Systems, and Solarpunk Solutions for a Sustainable Future.

Q2 Computer Science

Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Pub Date : 2025-01-01 DOI: 10.1142/9789819807024_0054

Ka'ulawena Alipio, Javier García-Colón, Nima Boscarino, Keolu Fox

Recent advancements in Artificial Intelligence (AI) and data center infrastructure have brought the global cloud computing market to the forefront of conversations about sustainability and energy use. Current policy and infrastructure for data centers prioritize economic gain and resource extraction, inherently unsustainable models which generate massive amounts of energy and heat waste. Our team proposes the formation of policy around earth-friendly computation practices rooted in Indigenous models of circular systems of sustainability. By looking to alternative systems of sustainability rooted in Indigenous values of aloha 'āina, or love for the land, we find examples of traditional ecological knowledge (TEK) that can be imagined alongside Solarpunk visions for a more sustainable future, one in which technology works with the environment, reusing electronic waste (e-waste) and improving data life cycles.

Citations: 0