
Journal of the American Medical Informatics Association: Latest Articles

Large language model uncertainty proxies: discrimination and calibration for medical diagnosis and treatment.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-01-01 DOI: 10.1093/jamia/ocae254
Thomas Savage, John Wang, Robert Gallo, Abdessalem Boukil, Vishwesh Patel, Seyed Amir Ahmad Safavi-Naini, Ali Soroush, Jonathan H Chen

Introduction: The inability of large language models (LLMs) to communicate uncertainty is a significant barrier to their use in medicine. Before LLMs can be integrated into patient care, the field must assess methods to estimate uncertainty in ways that are useful to physician-users.

Objective: Evaluate the ability for uncertainty proxies to quantify LLM confidence when performing diagnosis and treatment selection tasks by assessing the properties of discrimination and calibration.

Methods: We examined confidence elicitation (CE), token-level probability (TLP), and sample consistency (SC) proxies across GPT3.5, GPT4, Llama2, and Llama3. Uncertainty proxies were evaluated against 3 datasets of open-ended patient scenarios.

Results: SC discrimination outperformed TLP and CE methods. SC by sentence embedding achieved the highest discriminative performance (ROC AUC 0.68-0.79), yet with poor calibration. SC by GPT annotation achieved the second-best discrimination (ROC AUC 0.66-0.74) with accurate calibration. Verbalized confidence (CE) was found to consistently overestimate model confidence.

Discussion and conclusions: SC is the most effective method for estimating LLM uncertainty of the proxies evaluated. SC by sentence embedding can effectively estimate uncertainty if the user has a set of reference cases with which to re-calibrate their results, while SC by GPT annotation is the more effective method if the user does not have reference cases and requires accurate raw calibration. Our results confirm LLMs are consistently over-confident when verbalizing their confidence (CE).
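The two properties named in this abstract, discrimination and calibration, can be illustrated with a minimal sketch. It is not the authors' pipeline: agreement between sampled answers is approximated here by exact string match (the study uses sentence embeddings or GPT annotation), and the function names, the five-bin expected calibration error, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def sample_consistency(samples):
    """Fraction of sampled answers that agree with the most common answer.

    Assumption: agreement is exact match after lowercasing and stripping.
    """
    normalized = [s.strip().lower() for s in samples]
    top_count = Counter(normalized).most_common(1)[0][1]
    return top_count / len(normalized)


def roc_auc(scores, labels):
    """Discrimination: chance that a correct case outscores an incorrect one.

    Ties between a correct and an incorrect case count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def expected_calibration_error(scores, labels, n_bins=5):
    """Calibration: weighted mean gap between confidence and accuracy per bin.

    Assumes every score lies in [0, 1].
    """
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the top bin
        bins[idx].append((s, y))
    ece = 0.0
    for b in bins:
        if b:
            conf = sum(s for s, _ in b) / len(b)
            acc = sum(y for _, y in b) / len(b)
            ece += len(b) / len(scores) * abs(conf - acc)
    return ece


# Toy data (hypothetical): four sampled diagnoses per case, plus whether
# the model's final answer was judged correct (1) or incorrect (0).
sampled = [
    ["pneumonia", "pneumonia", "Pneumonia", "pneumonia"],
    ["gout", "gout", "sepsis", "gout"],
    ["asthma", "copd", "asthma", "pe"],
]
correct = [1, 1, 0]
scores = [sample_consistency(s) for s in sampled]
print(roc_auc(scores, correct), expected_calibration_error(scores, correct))
```

A proxy can rank cases well (high ROC AUC) while its raw scores still sit far from observed accuracy (high calibration error), which is the distinction the abstract draws between the two SC variants.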

Pages: 139-149 PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648734/pdf/
Citations: 0
Application of a digital quality measure for cancer diagnosis in Epic Cosmos.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-01-01 DOI: 10.1093/jamia/ocae253
Andrew J Zimolzak, Sundas P Khan, Hardeep Singh, Jessica A Davila

Objectives: Missed and delayed cancer diagnoses are common, harmful, and often preventable. We previously validated a digital quality measure (dQM) of emergency presentation (EP) of lung cancer in 2 US health systems. This study aimed to apply the dQM to a new national electronic health record (EHR) database and examine demographic associations.

Materials and methods: We applied the dQM (emergency encounter followed by new lung cancer diagnosis within 30 days) to Epic Cosmos, a deidentified database covering 184 million US patients. We examined dQM associations with sociodemographic factors.

Results: The overall EP rate was 19.6%. EP rate was higher in Black vs White patients (24% vs 19%, P < .001) and patients with younger age, higher social vulnerability, lower-income ZIP code, and self-reported transport difficulties.

Discussion: We successfully applied a dQM based on cancer EP to the largest US EHR database.

Conclusion: This dQM could be a marker for sociodemographic vulnerabilities in cancer diagnosis.
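The dQM definition given in the methods (an emergency encounter followed by a new lung cancer diagnosis within 30 days) can be sketched as a simple date rule. This is an illustrative reading only: the record layout, the function names, and the choice to treat both window ends as inclusive are assumptions, not details taken from the study.

```python
from datetime import date, timedelta


def is_emergency_presentation(emergency_dates, diagnosis_date, window_days=30):
    """True if any emergency encounter occurred 0..window_days days before diagnosis.

    Assumption: same-day encounters and day-30 encounters both count.
    """
    window = timedelta(days=window_days)
    return any(
        timedelta(0) <= diagnosis_date - encounter <= window
        for encounter in emergency_dates
    )


def ep_rate(patients, window_days=30):
    """Share of newly diagnosed patients flagged as emergency presentations."""
    if not patients:
        raise ValueError("no patients supplied")
    flagged = sum(
        is_emergency_presentation(p["emergency_dates"], p["diagnosis_date"], window_days)
        for p in patients
    )
    return flagged / len(patients)


# Hypothetical records: one flagged patient out of four.
patients = [
    {"emergency_dates": [date(2024, 1, 1)], "diagnosis_date": date(2024, 1, 20)},
    {"emergency_dates": [date(2024, 1, 1)], "diagnosis_date": date(2024, 3, 1)},
    {"emergency_dates": [date(2024, 5, 1)], "diagnosis_date": date(2024, 4, 1)},
    {"emergency_dates": [], "diagnosis_date": date(2024, 6, 1)},
]
print(ep_rate(patients))
```

Computing the same rate within demographic subgroups is what allows the kind of comparison reported in the results.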

Pages: 227-229 PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648705/pdf/
Citations: 0
Learning health system linchpins: information exchange and a common data model.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-01-01 DOI: 10.1093/jamia/ocae277
Aaron S Eisman, Elizabeth S Chen, Wen-Chih Wu, Karen M Crowley, Dilum P Aluthge, Katherine Brown, Indra Neil Sarkar

Objective: To demonstrate the potential for a centrally managed health information exchange standardized to a common data model (HIE-CDM) to facilitate semantic data flow needed to support a learning health system (LHS).

Materials and methods: The Rhode Island Quality Institute operates the Rhode Island (RI) statewide HIE, which aggregates RI health data for more than half of the state's population from 47 data partners. We standardized HIE data to the Observational Medical Outcomes Partnership (OMOP) CDM. Atherosclerotic cardiovascular disease (ASCVD) risk and primary prevention practices were selected to demonstrate LHS semantic data flow from 2013 to 2023.

Results: We calculated longitudinal 10-year ASCVD risk on 62,999 individuals. Nearly two-thirds had ASCVD risk factors from more than one data partner. This enabled granular tracking of individual ASCVD risk, primary prevention (ie, statin therapy), and incident disease. The population was on statins for fewer than half of the guideline-recommended days. We also found that individuals receiving care at Federally Qualified Health Centers were more likely to have unfavorable ASCVD risk profiles and more likely to be on statins. CDM transformation reduced data heterogeneity through a unified health record that adheres to defined terminologies per OMOP domain.

Discussion: We demonstrated the potential for an HIE-CDM to enable observational population health research. We also showed how to leverage existing health information technology infrastructure and health data best practices to break down LHS barriers.

Conclusion: HIE-CDM facilitates knowledge curation and health system intervention development at the individual, health system, and population levels.

Pages: 9-19 PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648737/pdf/
Citations: 0
Quantitatively assessing the impact of the quality of SNOMED CT subtype hierarchy on cohort queries.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-01-01 DOI: 10.1093/jamia/ocae272
Xubing Hao, Xiaojin Li, Yan Huang, Jay Shi, Rashmie Abeysinghe, Cui Tao, Kirk Roberts, Guo-Qiang Zhang, Licong Cui

Objective: SNOMED CT provides a standardized terminology for clinical concepts, allowing cohort queries over heterogeneous clinical data including Electronic Health Records (EHRs). While it is intuitive that missing and inaccurate subtype (or is-a) relations in SNOMED CT reduce the recall and precision of cohort queries, the extent of these impacts has not been formally assessed. This study fills this gap by developing quantitative metrics to measure these impacts and performing statistical analysis on their significance.

Material and methods: We used the Optum de-identified COVID-19 Electronic Health Record dataset. We defined micro-averaged and macro-averaged recall and precision metrics to assess the impact of missing and inaccurate is-a relations on cohort queries. Both practical and simulated analyses were performed. Practical analyses involved 407 missing and 48 inaccurate is-a relations confirmed by domain experts, with statistical testing using Wilcoxon signed-rank tests. Simulated analyses used two random sets of 400 is-a relations to simulate missing and inaccurate is-a relations.

Results: Wilcoxon signed-rank tests from both practical and simulated analyses (P-values < .001) showed that missing is-a relations significantly reduced the micro- and macro-averaged recall, and inaccurate is-a relations significantly reduced the micro- and macro-averaged precision.

Discussion: The introduced impact metrics can assist SNOMED CT maintainers in prioritizing critical hierarchical defects for quality enhancement. These metrics are generally applicable for assessing the quality impact of a terminology's subtype hierarchy on its cohort query applications.

Conclusion: Our results indicate a significant impact of missing and inaccurate is-a relations in SNOMED CT on the recall and precision of cohort queries. Our work highlights the importance of high-quality terminology hierarchy for cohort queries over EHR data and provides valuable insights for prioritizing quality improvements of SNOMED CT's hierarchy.
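The difference between micro- and macro-averaged recall and precision across a set of cohort queries can be sketched as follows. The definitions used here are the standard ones (micro pools counts over all queries, macro averages per-query ratios); the paper's exact formulation may differ, and the function name and toy patient IDs are assumptions.

```python
def micro_macro(queries):
    """Compute micro/macro recall and precision over cohort queries.

    queries: list of (retrieved_ids, relevant_ids) pairs of sets, one pair
    per cohort query. Assumes every set is non-empty.
    """
    tp_total = retrieved_total = relevant_total = 0
    recalls, precisions = [], []
    for retrieved, relevant in queries:
        if not retrieved or not relevant:
            raise ValueError("each query needs non-empty retrieved and relevant sets")
        tp = len(retrieved & relevant)  # patients correctly returned
        tp_total += tp
        retrieved_total += len(retrieved)
        relevant_total += len(relevant)
        recalls.append(tp / len(relevant))
        precisions.append(tp / len(retrieved))
    return {
        "micro_recall": tp_total / relevant_total,
        "macro_recall": sum(recalls) / len(recalls),
        "micro_precision": tp_total / retrieved_total,
        "macro_precision": sum(precisions) / len(precisions),
    }


# Hypothetical example: a missing is-a relation drops patient 4 from the
# first query; an inaccurate one pulls patient 11 into the second.
queries = [
    ({1, 2, 3}, {1, 2, 3, 4}),
    ({5, 11}, {5, 6}),
]
print(micro_macro(queries))
```

Micro-averaging lets large cohorts dominate, while macro-averaging weights every query equally, so reporting both shows whether a hierarchy defect hurts a few big queries or many small ones.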

Pages: 89-96 PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648736/pdf/
Citations: 0
Advancing a learning health system through biomedical and health informatics.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-01-01 DOI: 10.1093/jamia/ocae307
Suzanne Bakken
Volume 32, Issue 1 Pages: 1-2 PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648707/pdf/
Citations: 0
Implementation and impact of an electronic patient reported outcomes system in a phase II multi-site adaptive platform clinical trial for early-stage breast cancer.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-01-01 DOI: 10.1093/jamia/ocae190
Anna Northrop, Anika Christofferson, Saumya Umashankar, Michelle Melisko, Paolo Castillo, Thelma Brown, Diane Heditsian, Susie Brain, Carol Simmons, Tina Hieken, Kathryn J Ruddy, Candace Mainor, Anosheh Afghahi, Sarah Tevis, Anne Blaes, Irene Kang, Adam Asare, Laura Esserman, Dawn L Hershman, Amrita Basu

Objectives: We describe the development and implementation of a system for monitoring patient-reported adverse events and quality of life using electronic Patient Reported Outcome (ePRO) instruments in the I-SPY2 Trial, a phase II clinical trial for locally advanced breast cancer. We describe the administration of technological, workflow, and behavior change interventions and their associated impact on questionnaire completion.

Materials and methods: Using the OpenClinica electronic data capture system, we developed rules-based logic to build automated ePRO surveys, customized to the I-SPY2 treatment schedule. We piloted ePROs at the University of California, San Francisco (UCSF) to optimize workflow in the context of trial treatment scenarios and staggered rollout of the ePRO system to 26 sites to ensure effective implementation of the technology.

Results: Increasing ePRO completion requires workflow solutions and research staff engagement. Over two years, we increased baseline survey completion from 25% to 80%. The majority of patients completed between 30% and 75% of the questionnaires they received, with no statistically significant variation in survey completion by age, race or ethnicity. Patients who completed the screening timepoint questionnaire were significantly more likely to complete more of the surveys they received at later timepoints (mean completion of 74.1% vs 35.5%, P < .0001). Baseline PROMIS social functioning and grade 2 or more PRO-CTCAE interference of Abdominal Pain, Decreased Appetite, Dizziness and Shortness of Breath was associated with lower survey completion rates.

Discussion and conclusion: By implementing ePROs, we have the potential to increase efficiency and accuracy of patient-reported clinical trial data collection, while improving quality of care, patient safety, and health outcomes. Our method is accessible across demographics and facilitates an ease of data collection and sharing across nationwide sites. We identify predictors of decreased completion that can optimize resource allocation by better targeting efforts such as in-person outreach, staff engagement, a robust technical workflow, and increased monitoring to improve overall completion rates.

Trial registration: https://clinicaltrials.gov/study/NCT01042379.

Pages: 172-180 PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648710/pdf/
Citations: 0
Increasing adherence and collecting symptom-specific biometric signals in remote monitoring of heart failure patients: a randomized controlled trial.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-01-01 DOI: 10.1093/jamia/ocae221
Sukanya Mohapatra, Mirna Issa, Vedrana Ivezic, Rose Doherty, Stephanie Marks, Esther Lan, Shawn Chen, Keith Rozett, Lauren Cullen, Wren Reynolds, Rose Rocchio, Gregg C Fonarow, Michael K Ong, William F Speier, Corey W Arnold

Objectives: Mobile health (mHealth) regimens can improve health through the continuous monitoring of biometric parameters paired with appropriate interventions. However, adherence to monitoring tends to decay over time. Our randomized controlled trial sought to determine: (1) if a mobile app with gamification and financial incentives significantly increases adherence to mHealth monitoring in a population of heart failure patients; and (2) if activity data correlate with disease-specific symptoms.

Materials and methods: We recruited individuals with heart failure into a prospective 180-day monitoring study with 3 arms. All 3 arms included monitoring with a connected weight scale and an activity tracker. The second arm included an additional mobile app with gamification, and the third arm included the mobile app and a financial incentive awarded based on adherence to mobile monitoring.

Results: We recruited 111 heart failure patients into the study. We found that the arm including the financial incentive led to significantly higher adherence to activity tracker (95% vs 72.2%, P = .01) and weight (87.5% vs 69.4%, P = .002) monitoring compared to the arm that included the monitoring devices alone. Furthermore, we found a significant correlation between daily steps and daily symptom severity.

Discussion and conclusion: Our findings indicate that mobile apps with added engagement features can be useful tools for improving adherence over time and may thus increase the impact of mHealth-driven interventions. Additionally, activity tracker data can provide passive monitoring of disease burden that may be used to predict future events.

Pages: 181-192. PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648719/pdf/
Citations: 0
Real-world federated learning in radiology: hurdles to overcome and benefits to gain.
IF 4.7; Zone 2 (Medicine); Q1 Computer Science, Information Systems; Pub Date: 2025-01-01; DOI: 10.1093/jamia/ocae259
Markus Ralf Bujotzek, Ünal Akünal, Stefan Denner, Peter Neher, Maximilian Zenk, Eric Frodl, Astha Jaiswal, Moon Kim, Nicolai R Krekiehn, Manuel Nickel, Richard Ruppel, Marcus Both, Felix Döllinger, Marcel Opitz, Thorsten Persigehl, Jens Kleesiek, Tobias Penzkofer, Klaus Maier-Hein, Andreas Bucher, Rickmer Braren

Objective: Federated Learning (FL) enables collaborative model training while keeping data local. Currently, most FL studies in radiology are conducted in simulated environments because numerous hurdles impede its translation into practice. The few existing real-world FL initiatives rarely communicate the specific measures taken to overcome these hurdles. To bridge this significant knowledge gap, we propose a comprehensive guide for real-world FL in radiology. Given the effort required to implement real-world FL, comprehensive assessments comparing FL with less complex alternatives in challenging real-world settings are lacking; we address this gap through extensive benchmarking.

Materials and methods: We developed our own FL infrastructure within the German Radiological Cooperative Network (RACOON) and demonstrated its functionality by training FL models on lung pathology segmentation tasks across six university hospitals. Insights gained while establishing our FL initiative and running the extensive benchmark experiments were compiled and categorized into the guide.

Results: The proposed guide outlines essential steps, identified hurdles, and implemented solutions for establishing successful FL initiatives conducting real-world experiments. Our experimental results prove the practical relevance of our guide and show that FL outperforms less complex alternatives in all evaluation scenarios.

Discussion and conclusion: Our findings justify the efforts required to translate FL into real-world applications by demonstrating advantageous performance over alternative approaches. Additionally, they emphasize the importance of strategic organization, robust management of distributed data and infrastructure in real-world settings. With the proposed guide, we are aiming to aid future FL researchers in circumventing pitfalls and accelerating translation of FL into radiological applications.
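
The core idea in the abstract above, collaborative training while each hospital keeps its data local, can be illustrated with a minimal sketch. The abstract does not say which aggregation scheme the RACOON infrastructure uses; the sketch below assumes federated averaging (FedAvg), one common choice, and swaps the lung segmentation networks for a toy one-dimensional linear model. All function names, the learning rate, and the data are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Weights = List[float]                     # [slope, intercept] of a toy linear model
Dataset = Sequence[Tuple[float, float]]   # (x, y) pairs held privately by one site


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.1, epochs: int = 5) -> Weights:
    """Train on one site's private data, starting from the shared global weights."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error of y ~ w*x + b on this site only.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(site_weights: Sequence[Weights],
                      site_sizes: Sequence[int]) -> Weights:
    """Average the sites' weights, weighting each site by its number of samples."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(dim)
    ]


def run_fedavg(sites: Sequence[Dataset], rounds: int = 100) -> Weights:
    """Run FedAvg: only model weights leave a site, never the raw (x, y) data."""
    global_weights: Weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        global_weights = federated_average(updates, [len(d) for d in sites])
    return global_weights


if __name__ == "__main__":
    # Hypothetical noise-free data from y = 2x + 1, split unevenly across 3 sites.
    demo_sites = [
        [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)],
        [(1.5, 4.0), (2.0, 5.0)],
        [(-1.0, -1.0), (-0.5, 0.0), (0.25, 1.5), (0.75, 2.5)],
    ]
    print(run_fedavg(demo_sites))
```

Weighting by sample count means larger sites influence the global model more; a real deployment would also need secure transport, site orchestration, and handling of heterogeneous data, which the guide described in the abstract addresses and this sketch omits.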

Pages: 193-205. PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648732/pdf/
Citations: 0
Machine learning-based prediction models in medical decision-making in kidney disease: patient, caregiver, and clinician perspectives on trust and appropriate use.
IF 4.7; Zone 2 (Medicine); Q1 Computer Science, Information Systems; Pub Date: 2025-01-01; DOI: 10.1093/jamia/ocae255
Jessica Sperling, Whitney Welsh, Erin Haseley, Stella Quenstedt, Perusi B Muhigaba, Adrian Brown, Patti Ephraim, Tariq Shafi, Michael Waitzkin, David Casarett, Benjamin A Goldstein

Objectives: This study aims to improve the ethical use of machine learning (ML)-based clinical prediction models (CPMs) in shared decision-making for patients with kidney failure on dialysis. We explore factors that inform acceptability, interpretability, and implementation of ML-based CPMs among multiple constituent groups.

Materials and methods: We collected and analyzed qualitative data from focus groups with varied end users, including dialysis support providers (clinical providers and additional dialysis support providers such as dialysis clinic staff and social workers), patients, and patients' caregivers (n = 52).

Results: Participants were broadly accepting of ML-based CPMs, but had concerns about data sources, the factors included in the model, and accuracy. Use was desired in conjunction with providers' views and explanations. Differences among respondent types were minimal overall but most prevalent in discussions of CPM presentation and model use.

Discussion and conclusion: Evidence of acceptability of ML-based CPM usage provides support for ethical use, but numerous specific considerations in acceptability, model construction, and model use for shared clinical decision-making must be considered. There are specific steps that could be taken by data scientists and health systems to engender use that is accepted by end users and facilitates trust, but there are also ongoing barriers or challenges in addressing desires for use. This study contributes to emerging literature on interpretability, mechanisms for sharing complexities, including uncertainty regarding the model results, and implications for decision-making. It examines numerous stakeholder groups including providers, patients, and caregivers to provide specific considerations that can influence health system use and provide a basis for future research.

Pages: 51-62. PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648714/pdf/
Citations: 0
Oncointerpreter.ai enables interactive, personalized summarization of cancer diagnostics data.
IF 4.7; Zone 2 (Medicine); Q1 Computer Science, Information Systems; Pub Date: 2025-01-01; DOI: 10.1093/jamia/ocae284
Arihant Tripathi, Brett Ecker, Patrick Boland, Saum Ghodoussipour, Gregory R Riedlinger, Subhajyoti De

Objectives: Cancer diagnosis comes as a shock to many patients, and many of them feel unprepared to handle the complexity of the life-changing event, understand technicalities of the diagnostic reports, and fully engage with the clinical team regarding the personalized clinical decision-making.

Materials and methods: We develop Oncointerpreter.ai, an interactive resource that offers personalized summarization of clinical cancer genomic and pathological data and frames questions or addresses queries about therapeutic opportunities in near real time via a graphical interface. It is built on the Mistral-7B and Llama-2 7B large language models trained on a local database trained using a large, curated corpus.

Results: We showcase its utility with case studies, where Oncointerpreter.ai extracted key clinical and molecular attributes from deidentified pathology and clinical genomics reports, summarized their contextual significance, and answered queries on pertinent treatment options. Oncointerpreter also provided a personalized summary of currently active clinical trials that match the patients' disease status, their selection criteria, and geographic locations. Benchmarking and comparative assessment indicated that the model responses were generally consistent and that hallucination (ie, a factually incorrect or nonsensical response) was rare; treatment- and outcome-related queries led to context-aware responses, and response time correlated with verbosity.

Discussion: The choice of model and domain-specific training also affected the response quality.

Conclusion: Oncointerpreter.ai can aid the existing clinical care with interactive, individualized summarization of diagnostics data to promote informed dialogs with the patients with new cancer diagnoses.

Availability: https://github.com/Siris2314/Oncointerpreter.

Pages: 129-138. PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11648722/pdf/
Citations: 0