
Journal of the American Medical Informatics Association: Latest Articles

Multi-modality risk prediction of cardiovascular diseases for breast cancer cohort in the All of Us Research Program.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-07-26 DOI: 10.1093/jamia/ocae199
Han Yang, Sicheng Zhou, Zexi Rao, Chen Zhao, Erjia Cui, Chetan Shenoy, Anne H Blaes, Nishitha Paidimukkala, Jinhua Wang, Jue Hou, Rui Zhang

Objective: This study leverages the rich diversity of the All of Us Research Program (All of Us)'s dataset to devise a predictive model for cardiovascular disease (CVD) in breast cancer (BC) survivors. Central to this endeavor is the creation of a robust data integration pipeline that synthesizes electronic health records (EHRs), patient surveys, and genomic data, while upholding fairness across demographic variables.

Materials and methods: We have developed a universal data wrangling pipeline to process and merge heterogeneous data sources of the All of Us dataset, address missingness and variance in data, and align disparate data modalities into a coherent framework for analysis. Utilizing a composite feature set including EHR, lifestyle, and social determinants of health (SDoH) data, we then employed Adaptive Lasso and Random Forest regression models to predict 6 CVD outcomes. The models were evaluated using the c-index and time-dependent Area Under the Receiver Operating Characteristic Curve over a 10-year period.
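The c-index named above can be illustrated with a short sketch. This is not the authors' code: it is a minimal pure-Python version of Harrell's concordance index for right-censored data, with the function name `concordance_index` and the toy inputs chosen here for illustration. Pairs with tied follow-up times are simply skipped, which is a simplifying assumption.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's c-index for right-censored survival data (illustrative sketch).

    times:       observed follow-up time for each subject
    events:      1 if the event was observed, 0 if the subject was censored
    risk_scores: model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let a be the subject with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times (simplification) and pairs where the shorter
        # follow-up ended in censoring, since their ordering is unknown.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to uninformative ranking and 1.0 to perfect ranking of event times by predicted risk.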

Results: The Adaptive Lasso model showed consistent performance across most CVD outcomes, while the Random Forest model excelled particularly in predicting outcomes like transient ischemic attack when incorporating the full multi-modal feature set. Feature importance analysis revealed age and previous coronary events as dominant predictors across CVD outcomes, with SDoH clustering labels highlighting the nuanced impact of social factors.

Discussion: The development of both a Cox-based predictive model and a Random Forest regression model illustrates the broad applicability of All of Us data in integrating EHR and patient surveys to enhance precision medicine. The inclusion of SDoH clustering labels revealed the significant impact of sociobehavioral factors on patient outcomes, emphasizing the importance of comprehensive health determinants in predictive models. Despite these advancements, limitations include the exclusion of genetic data, broad categorization of CVD conditions, and the need for fairness analyses to ensure equitable model performance across diverse populations. Future work should refine clinical and social variable measurements, incorporate advanced imputation techniques, and explore additional predictive algorithms to enhance model precision and fairness.

Conclusion: This study demonstrates the viability of the diverse All of Us dataset for developing a multi-modality predictive model for CVD risk stratification in BC survivors. The data integration pipeline and subsequent predictive models establish a methodological foundation for future research into personalized healthcare.

Citations: 0
An evaluation of the All of Us Research Program database to examine cumulative stress.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-07-26 DOI: 10.1093/jamia/ocae201
Shawna Beese, Demetrius A Abshire, Trey L DeJong, Jason T Carbone

Objectives: To evaluate the NIH All of Us Research Program database as a potential data source for studying allostatic load and stress among adults in the United States (US).

Materials and methods: We evaluated the All of Us database to determine sample size significance for original-10 allostatic load biomarkers, Allostatic Load Index-5 (ALI-5), Allostatic Load Five, and Cohen's Perceived Stress Scale (PSS). We conducted a priori, post hoc, and sensitivity power analyses to determine sample sizes for conducting null hypothesis significance tests.
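The three kinds of power analysis mentioned above can be sketched as follows. The abstract does not say which test or software the authors used, so this example assumes a two-sided, two-sample comparison of means with a standardized effect size (Cohen's d) and uses the normal approximation rather than the exact t distribution. The function names and default values are illustrative choices.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """A priori analysis: sample size per group needed to reach the target power."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    z_beta = _Z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def achieved_power(effect_size, n, alpha=0.05):
    """Post hoc analysis: approximate power with n subjects per group."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    return _Z.cdf(effect_size * math.sqrt(n / 2) - z_alpha)


def min_detectable_effect(n, alpha=0.05, power=0.80):
    """Sensitivity analysis: smallest effect size detectable with n per group."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    z_beta = _Z.inv_cdf(power)
    return (z_alpha + z_beta) * math.sqrt(2 / n)
```

Under these assumptions, `n_per_group(0.5)` gives the per-group size for a medium effect, and `min_detectable_effect(n)` shows how quickly the detectable effect grows when only a small number of participants have a given measure.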

Results: The maximum number of participants with responses available for each measure is 21 for the original-10 allostatic load biomarkers, 150 for the ALI-5, 22 476 for Allostatic Load Five, and 90 583 for the PSS.

Discussion: The NIH All of Us Research Program is well-suited for studying allostatic load using the Allostatic Load Five and psychological stress using PSS.

Conclusion: Improving biomarker data collection in All of Us will facilitate more nuanced examinations of allostatic load among US adults.

Citations: 0
Engagement with health research summaries via digital communication to All of Us participants.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-07-25 DOI: 10.1093/jamia/ocae185
Janna Ter Meer, Royan Kamyar, Christina Orlovsky, Ting-Yang Hung, Tamara Benrey, Ethan Dinh-Luong, Giorgio Quer, Julia Moore Vogel

Objective: Summaries of health research can be a complementary way to return value to participants. We assess how research participants engage with summaries via email communication and how this can be improved.

Materials and methods: We look at correlations between demographic subgroups and engagement in a longitudinal dataset of 305 626 participants (77% are classified as underrepresented in biomedical research) from the All of Us Research Program. We compare this against engagement with other program communications and use impact evaluations (N = 421 510) to measure the effect of tailoring communication by (1) eliciting content preferences, (2) Spanish focused content, (3) informational videos, and (4) article content in the email subject line.
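The abstract does not state how the impact evaluations were analyzed. As one plausible illustration only, the sketch below compares the click rate of a tailored email arm against a control arm with a pooled two-proportion z-test. The function name and any counts passed to it are hypothetical and are not taken from the study.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(clicks_a, n_a, clicks_b, n_b):
    """Compare click rates of arm A (tailored) and arm B (control).

    Returns (rate difference, z statistic, two-sided p-value),
    using the pooled-variance normal approximation.
    """
    p_a = clicks_a / n_a
    p_b = clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("click rates are all 0 or all 1; the test is undefined")
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a - p_b, z, p_value
```

A small positive difference with a large p-value would match what the abstract calls a "small directional increase": the estimate points upward but is not clearly distinguishable from no effect.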

Results: Between March 2020 and October 2021, research summaries reached 67% of enrolled participants, outperforming other program communication (60%) and return of results (31%), which have a high uptake rate but have been extended to a subset of eligible participants. While all demographic subgroups engage with research summaries, participants who have higher income or educational attainment, are White, or are older than 45 years open and click content most often. Surfacing article content in the email subject line and Spanish focused content had negative effects on engagement. Video and social media content and eliciting preferences led to a small directional increase in clicks.

Discussion: Further individualization of tailoring efforts may be needed to drive larger engagement effects (eg, delivering multiple articles in line with stated preferences, expanding preference options). Our findings are likely a conservative representation of engagement effects, given the coarseness of our click rate measure.

Conclusions: Health research summaries show promise as a way to return value to research participants, especially if individual-level results cannot be returned. Personalization of communication requires testing to determine whether efforts are having the expected effect.

Citations: 0
Privacy preserving record linkage for public health action: opportunities and challenges.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-07-24 DOI: 10.1093/jamia/ocae196
Aditi Pathak, Laina Serrer, Daniela Zapata, Raymond King, Lisa B Mirel, Thomas Sukalac, Arunkumar Srinivasan, Patrick Baier, Meera Bhalla, Corinne David-Ferdon, Steven Luxenberg, Adi V Gundlapalli

Objectives: To understand the landscape of privacy preserving record linkage (PPRL) applications in public health, assess estimates of PPRL accuracy and privacy, and evaluate factors for PPRL adoption.

Materials and methods: A literature scan examined the accuracy, data privacy, and scalability of PPRL in public health. Twelve interviews with subject matter experts were conducted and coded using an inductive approach to identify factors related to PPRL adoption.

Results: PPRL has a high level of linkage quality and accuracy. PPRL linkage quality was comparable to that of clear text linkage methods (requiring direct personally identifiable information [PII]) for linkage across various settings and research questions. Accuracy of PPRL depended on several components, such as PPRL technique, and the proportion of missingness and errors in underlying data. Strategies to increase adoption include increasing understanding of PPRL, improving data owner buy-in, establishing governance structure and oversight, and developing a public health implementation strategy for PPRL.

Discussion: PPRL protects privacy by eliminating the need to share PII for linkage, but the accuracy and linkage quality depend on factors including the choice of PPRL technique and specific PII used to create encrypted identifiers. Large-scale implementations of PPRL linking millions of observations, including PCORnet, the National Institutes of Health N3C, and the Centers for Disease Control and Prevention COVID-19 project, have demonstrated the scalability of PPRL for public health applications.
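To make the idea of encrypted identifiers concrete, here is a minimal sketch of one commonly described PPRL approach: Bloom filter encoding of character bigrams with a keyed hash, compared with the Dice coefficient. The abstract does not say which techniques the reviewed implementations use, so the technique, the function names, and the parameters (`num_bits`, `num_hashes`, the shared `secret_key`) are all assumptions made for illustration. A real deployment would need hardening beyond this sketch.

```python
import hashlib
import hmac


def bigrams(value):
    """Split a normalized string into padded character bigrams."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def encode(value, secret_key, num_bits=1024, num_hashes=4):
    """Return the set of Bloom filter bit positions for a PII field.

    Each bigram is hashed num_hashes times with HMAC-SHA256 under a key
    shared by the data owners, so the raw value never leaves the site.
    """
    positions = set()
    for gram in bigrams(value):
        for k in range(num_hashes):
            digest = hmac.new(secret_key, f"{k}:{gram}".encode(), hashlib.sha256).digest()
            positions.add(int.from_bytes(digest[:8], "big") % num_bits)
    return positions


def dice_similarity(bits_a, bits_b):
    """Dice coefficient between two encodings (1.0 means identical bit sets)."""
    if not bits_a and not bits_b:
        return 1.0
    return 2 * len(bits_a & bits_b) / (len(bits_a) + len(bits_b))
```

Because similar strings share most bigrams, their encodings share most bit positions, which is why this kind of encoding tolerates typos in the underlying data. That tolerance is also why missingness and errors in the source fields affect linkage accuracy.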

Conclusions: Applications of PPRL in public health have demonstrated their value for the public health community. Although gaps must be addressed before wide implementation, PPRL is a promising solution to data linkage challenges faced by the public health ecosystem.

Citations: 0
Representation of Social Determinants of Health terminology in medical subject headings: impact of added terms.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-07-24 DOI: 10.1093/jamia/ocae191
Chikako Suda-King, Lucas Winch, James M Tucker, Abbey D Zuehlke, Christine Hunter, Janine M Simmons

Objectives: To enhance and evaluate the quality of PubMed search results for Social Determinants of Health (SDoH) through the addition of new SDoH terms to Medical Subject Headings (MeSH).

Materials and methods: High-priority SDoH terms and definitions were collated from authoritative sources, curated based on publication frequencies, and refined by subject matter experts. Descriptive analyses were used to investigate how PubMed search details and best match results were affected by the addition of SDoH concepts to MeSH. Three information retrieval metrics (Precision, Recall, and F measure) were used to quantitatively assess the accuracy of PubMed search results. Pre- and post-update documents were clustered into topic areas using a Natural Language Processing pipeline, and SDoH relevancy was assessed.
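The three retrieval metrics named above can be computed from a set of retrieved documents and a set of documents judged relevant. The sketch below is a generic illustration, not the authors' evaluation code. The abstract does not say which F measure was used, so the balanced F1 (`beta=1.0`) is assumed as the default, and the function name is a placeholder.

```python
def retrieval_metrics(retrieved, relevant, beta=1.0):
    """Return (precision, recall, F-beta) for one search.

    retrieved: identifiers returned by the search (e.g. PMIDs)
    relevant:  identifiers judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Precision: share of retrieved documents that are relevant.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant documents that were retrieved.
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta ** 2
    f_score = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_score
```

Running the same function on the pre-update and post-update result sets for a query gives directly comparable numbers for each metric.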

Results: Addition of 35 SDoH terms to MeSH resulted in more accurate algorithmic translations of search terms and more reliable best match results. The Precision, Recall, and F measures of post-update results were significantly higher than those of pre-update results. The percentage of retrieved publications belonging to SDoH clusters was significantly greater in the post- than pre-update searches.

Discussion: This evaluation confirms that inclusion of new SDoH terms in MeSH can lead to qualitative and quantitative enhancements in PubMed search retrievals. It demonstrates the methodology for and impact of suggesting new terms for MeSH indexing. It provides a foundation for future efforts across behavioral and social science research (BSSR) domains.

Conclusion: Improving the representation of BSSR terminology in MeSH can improve PubMed search results, thereby enhancing the ability of investigators and clinicians to build and utilize a cumulative BSSR knowledge base.

Citations: 0
allofus: an R package to facilitate use of the All of Us Researcher Workbench.
IF 4.7 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-07-24 DOI: 10.1093/jamia/ocae198
Louisa H Smith, Robert Cavanaugh

Objectives: Despite easy-to-use tools like the Cohort Builder, using All of Us Research Program data for complex research questions requires a relatively high level of technical expertise. We aimed to increase research and training capacity and reduce barriers to entry for the All of Us community through an R package, allofus. In this article, we describe functions that address common challenges we encountered while working with All of Us Research Program data, and we demonstrate this functionality with an example of creating a cohort of All of Us participants by synthesizing electronic health record and survey data with time dependencies.

Target audience: All of Us Research Program data are widely available to health researchers. The allofus R package is aimed at a wide range of researchers who wish to conduct complex analyses using best practices for reproducibility and transparency, and who have a range of experience using R. Because the All of Us data are transformed into the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM), researchers familiar with existing OMOP CDM tools or who wish to conduct network studies in conjunction with other OMOP CDM data will also find value in the package.

Scope: We developed an initial set of functions that solve problems we experienced across survey and electronic health record data in our own research and in mentoring student projects. The package will continue to grow and develop with the All of Us Research Program. The allofus R package can help build community research capacity by increasing access to the All of Us Research Program data, the efficiency of its use, and the rigor and reproducibility of the resulting research.

Citations: 0
Pregnancy episodes in All of Us: harnessing multi-source data for pregnancy-related research.
IF 4.7, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2024-07-24. DOI: 10.1093/jamia/ocae195
Louisa H Smith, Wanjiang Wang, Brianna Keefe-Oates

Objectives: The National Institutes of Health's All of Us Research Program addresses gaps in biomedical research by collecting health data from diverse populations. Pregnant individuals have historically been underrepresented in biomedical research, and pregnancy-related research is often limited by data availability, sample size, and inadequate representation of the diversity of pregnant people. All of Us integrates a wealth of health-related data, providing a unique opportunity to conduct comprehensive pregnancy-related research. We aimed to identify pregnancy episodes with high-quality electronic health record (EHR) data in All of Us Research Program data and evaluate the program's utility for pregnancy-related research.

Materials and methods: We used a previously published algorithm to identify pregnancy episodes in All of Us EHR data. We described these pregnancies, validated them with All of Us survey data, and compared them to national statistics.

Results: Our study identified 18 970 pregnancy episodes from 14 234 participants; other possible pregnancy episodes had low-quality or insufficient data. Validation against people who reported a current pregnancy on an All of Us survey found low false-positive and false-negative rates. Demographics were similar in some respects to national data; however, Asian-Americans were underrepresented, and older, highly educated pregnant people were overrepresented.

Discussion: Our approach demonstrates the capacity of All of Us to support pregnancy research and reveals the diversity of the pregnancy cohort. However, some demographic groups were underrepresented. Other limitations include measurement error in gestational age and limited data on non-live births.

Conclusion: The wide variety of data in the All of Us program, encompassing EHR, survey, genomic, and fitness tracker data, offers a valuable resource for studying pregnancy, yet care must be taken to avoid biases.
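
The validation step in this abstract (comparing algorithm-identified pregnancy episodes with survey self-report) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the boolean encoding, and the toy data are all hypothetical, and the sketch assumes the survey response is treated as the reference standard.

```python
from collections import Counter


def tally(algorithm_flags, survey_flags):
    """Count agreement between algorithm output and survey self-report.

    Both arguments are sequences of booleans, one entry per participant.
    algorithm_flags[i] is True if the EHR algorithm places participant i
    in a pregnancy episode on the survey date (hypothetical encoding);
    survey_flags[i] is True if that participant reported a current
    pregnancy on the survey.
    """
    if len(algorithm_flags) != len(survey_flags):
        raise ValueError("inputs must have the same length")
    return Counter(zip(algorithm_flags, survey_flags))


def error_rates(counts):
    """Return (false_positive_rate, false_negative_rate).

    The survey answer is treated as the reference standard, which is an
    assumption made for this sketch.
    """
    tp = counts[(True, True)]
    fp = counts[(True, False)]
    fn = counts[(False, True)]
    tn = counts[(False, False)]
    if fp + tn == 0 or fn + tp == 0:
        raise ValueError("need both survey-positive and survey-negative cases")
    return fp / (fp + tn), fn / (fn + tp)


# Toy data, invented purely for illustration.
algorithm = [True, True, False, False, False, True, False, False]
survey = [True, True, False, False, True, False, False, False]

fpr, fnr = error_rates(tally(algorithm, survey))
print(f"false positive rate: {fpr:.2f}, false negative rate: {fnr:.2f}")
```

Reporting both rates matters here because the two kinds of error have different causes: a false negative suggests an episode missing from the EHR data, while a false positive suggests an episode with misplaced dates.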

Citations: 0
Electronic documentation burden among outpatient rehabilitation therapists: a qualitative descriptive study and quality improvement initiative.
IF 4.7, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2024-07-22. DOI: 10.1093/jamia/ocae192
Jessica Schwartz-Dillard, Travis Ng, Joann Villegas, Derrick Johnson, Mary Murray-Weir

Objectives: Outpatient rehabilitation (rehab) physical, occupational, and speech therapists use electronic health records (EHR), yet their documentation experiences, including any documentation burden, are not well researched. Therapists are a growing portion of the U.S. healthcare workforce and are critical to the health of an aging population. We aimed to describe outpatient rehab therapists' documentation experiences and identify strategies for mitigating any documentation burden.

Materials and methods: We used qualitative descriptive methodology to conduct 4 focus groups with outpatient rehab therapists at Hospital for Special Surgery, a multi-site orthopedic institution. Transcripts were inductively coded to identify themes and actionable strategies for improving the therapists' documentation experiences. Therapists provided feedback and prioritization of proposed strategies.

Results: A total of 13 therapists were interviewed. Five themes and 10 subthemes characterized the therapists' documentation experience: a feeling that documentation inhibits clinical care and work/life balance, a perceived lack of support and efficiencies, the desire to document to communicate clinical care, and a design vision for improving the EHR. Top prioritized strategies for improvement included use of timesaving templates, expanding dictation, decluttering the EHR interface, and support for free-text entry over discrete data capture.

Discussion: Outpatient rehab therapists experience documentation burden similar to that documented among physicians and nurses. Manual data entry imposes a burden on therapists' time and clinical care.

Conclusion: A multi-faceted approach is needed to improve therapists' experiences, including EHR redesign, technology supporting dictation and narrative-to-discrete data capture, and support from leadership and regulators.

Citations: 0
Prompt engineering on leveraging large language models in generating response to InBasket messages.
IF 4.7, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2024-07-19. DOI: 10.1093/jamia/ocae172
Sherry Yan, Wendi Knapp, Andrew Leong, Sarira Kadkhodazadeh, Souvik Das, Veena G Jones, Robert Clark, David Grattendick, Kevin Chen, Lisa Hladik, Lawrence Fagan, Albert Chan

Objectives: Large Language Models (LLMs) have been proposed as a solution to address high volumes of Patient Medical Advice Requests (PMARs). This study addresses whether LLMs, with prompt engineering, can generate high-quality draft responses to PMARs that satisfy both patients and clinicians.

Materials and methods: We designed a novel human-involved iterative process to train and validate prompts for an LLM in creating appropriate responses to PMARs. GPT-4 was used to generate responses to the messages. We updated the prompts, evaluated both clinician and patient acceptance of LLM-generated draft responses at each iteration, and tested the optimized prompt on independent validation data sets. The optimized prompt was implemented in the electronic health record production environment and tested by 69 primary care clinicians.

Results: After 3 iterations of prompt engineering, physician acceptance of draft suitability increased from 62% to 84% (P < .001) in the validation dataset (N = 200), and 74% of drafts in the test dataset were rated as "helpful." Patients also noted significantly increased favorability of message tone (78%) and overall quality (80%) for the optimized prompt compared to the original prompt in the training dataset. Patients were unable to differentiate human and LLM-generated draft PMAR responses for 76% of the messages, in contrast to the earlier preference for human-generated responses. A majority (72%) of clinicians believed it could reduce cognitive load in dealing with InBasket messages.

Discussion and conclusion: Informed jointly by clinician and patient feedback, tuning the LLM prompt alone can be effective in creating clinically relevant and useful draft responses to PMARs.

Citations: 0
The incremental design of a machine learning framework for medical records processing.
IF 4.7, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS. Pub Date: 2024-07-17. DOI: 10.1093/jamia/ocae194
Christopher Streiffer, Divya Saini, Gideon Whitehead, Jency Daniel, Carolina Garzon-Mrad, Laura Kavanaugh, Emeka Anyanwu

Objectives: This work presents the development and evaluation of coordn8, a web-based application that streamlines fax processing in outpatient clinics using a "human-in-the-loop" machine learning framework. We demonstrate the effectiveness of the platform at reducing fax processing time and producing accurate machine learning inferences across the tasks of patient identification, document classification, spam classification, and duplicate document detection.

Methods: We deployed coordn8 in 11 outpatient clinics and conducted a time savings analysis by observing users and measuring fax processing event logs. We used statistical methods to evaluate the machine learning components across different datasets to show generalizability. We conducted a time series analysis to show variations in model performance as new clinics were onboarded and to demonstrate our approach to mitigating model drift.

Results: Our observation analysis showed a mean reduction in individual fax processing time of 147.5 s, and our event log analysis of over 7000 faxes reinforced this finding. Document classification produced an accuracy of 81.6%, patient identification produced an accuracy of 83.7%, spam classification produced an accuracy of 98.4%, and duplicate document detection produced a precision of 81.0%. Retraining document classification increased accuracy by 10.2%.

Discussion: coordn8 significantly decreased fax-processing time and produced accurate machine learning inferences. Our human-in-the-loop framework facilitated the collection of high-quality data necessary for model training. Expanding to new clinics correlated with performance decline, which was mitigated through model retraining.

Conclusion: Our framework for automating clinical tasks with machine learning offers a template for health systems looking to implement similar technologies.
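
The drift-monitoring idea in this abstract (tracking model performance over time as clinics are onboarded, and retraining when it declines) could be sketched roughly as follows. This is a hypothetical Python illustration, not coordn8's implementation: the record layout, the function names, the monthly periods, and the tolerance threshold are all assumptions. The "confirmed" label stands in for whatever label a human reviewer accepts or corrects in a human-in-the-loop workflow.

```python
from collections import defaultdict


def accuracy_by_period(records):
    """Compute classification accuracy per time period.

    records is an iterable of (period, predicted_label, confirmed_label)
    tuples, where confirmed_label is the label a human reviewer accepted
    or corrected (hypothetical layout for this sketch).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for period, predicted, confirmed in records:
        total[period] += 1
        correct[period] += int(predicted == confirmed)
    return {period: correct[period] / total[period] for period in sorted(total)}


def periods_needing_retraining(accuracy, baseline, tolerance=0.05):
    """List periods whose accuracy fell more than `tolerance` below baseline."""
    return [period for period, acc in accuracy.items() if acc < baseline - tolerance]


# Toy event log, invented for illustration only.
records = [
    ("2024-01", "referral", "referral"),
    ("2024-01", "lab", "lab"),
    ("2024-01", "spam", "spam"),
    ("2024-01", "lab", "lab"),
    ("2024-02", "referral", "lab"),
    ("2024-02", "lab", "lab"),
    ("2024-02", "spam", "referral"),
    ("2024-02", "lab", "lab"),
]

accuracy = accuracy_by_period(records)
flagged = periods_needing_retraining(accuracy, baseline=accuracy["2024-01"])
print(accuracy)
print("retrain after:", flagged)
```

Because reviewers confirm or correct each prediction in such a workflow, the same records that reveal the drift can also serve as fresh training data for the retrained model.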

Citations: 0