
Journal of the American Medical Informatics Association: Latest Articles

Effect of digital tools to promote hospital quality and safety on adverse events after discharge.
IF 4.7; Zone 2 (Medicine); Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-10-01; DOI: 10.1093/jamia/ocae176
Anant Vasudevan, Savanna Plombon, Nicholas Piniella, Alison Garber, Maria Malik, Erin O'Fallon, Abhishek Goyal, Esteban Gershanik, Vivek Kumar, Julie Fiskio, Cathy Yoon, Stuart R Lipsitz, Jeffrey L Schnipper, Anuj K Dalal

Objectives: Post-discharge adverse events (AEs) are common and heralded by new and worsening symptoms (NWS). We evaluated the effect of electronic health record (EHR)-integrated digital tools designed to promote quality and safety in hospitalized patients on NWS and AEs after discharge.

Materials and methods: Adult general medicine patients at a community hospital were enrolled. We implemented a dashboard that clinicians used to assess safety risks during interdisciplinary rounds. Post-implementation patients were randomized to complete a discharge checklist whose responses were incorporated into the dashboard. Outcomes were assessed using EHR review and 30-day call data, adjudicated by 2 clinicians, and analyzed using Poisson regression. We compared the effect of each exposure on post-discharge outcomes and used multivariable logistic regression to model post-discharge AEs, with selected variables and NWS as independent predictors.

Results: A total of 260 patients were enrolled (122 pre-implementation, 71 post-implementation [dashboard], and 67 post-implementation [dashboard plus discharge checklist]). The adjusted incidence rate ratios (aIRR) for NWS and AEs did not change in the post- compared with the pre-implementation period. For patient-reported NWS, the aIRR was higher, though not significantly, for dashboard plus discharge checklist participants than for dashboard participants (1.23 [0.97,1.56], P = .08). Among post-implementation patients with an AE, the aIRR for duration of injury (>1 week) was significantly lower for dashboard plus discharge checklist participants than for dashboard participants (0 [0,0.53], P < .01). In multivariable models, certain patient-reported NWS were associated with AEs (3.76 [1.89,7.82], P < .01).
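The aIRRs above come from adjusted Poisson regression. As a minimal, unadjusted sketch of what an incidence rate ratio measures, the snippet below divides two event rates and builds a Wald 95% CI on the log scale. The function name and the event counts are hypothetical illustrations, not the study's data or code. The log-scale Wald interval is undefined when one arm has zero events, so an estimate such as 0 [0,0.53] presumably comes from a different method (for example, an exact interval); the sketch raises an error in that case rather than guessing.

```python
import math


def incidence_rate_ratio(events_a, exposure_a, events_b, exposure_b, z=1.96):
    """Unadjusted IRR of group A vs group B with a Wald CI on the log scale.

    Assumes Poisson-distributed event counts; exposure is person-time
    (e.g. patient-months). Hypothetical helper for illustration only.
    """
    if events_a == 0 or events_b == 0:
        raise ValueError("log-scale Wald CI is undefined with zero events")
    irr = (events_a / exposure_a) / (events_b / exposure_b)
    se_log = math.sqrt(1 / events_a + 1 / events_b)  # standard error of log(IRR)
    lower = math.exp(math.log(irr) - z * se_log)
    upper = math.exp(math.log(irr) + z * se_log)
    return irr, lower, upper


# Hypothetical counts, NOT the study's data: 30 events over 670 patient-months
# in one arm versus 25 events over 710 patient-months in the other.
irr, lo, hi = incidence_rate_ratio(30, 670, 25, 710)
print(f"IRR = {irr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Adjusting for covariates, as the study did, requires fitting a Poisson regression model rather than this two-group ratio.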

Discussion: While significant reductions in post-discharge AEs were not observed, checklist participants experiencing a post-discharge AE were more likely to report NWS and had a shorter duration of injury.

Conclusion: Interventions designed to prompt patients to report NWS may facilitate earlier detection of AEs after discharge.

Clinicaltrials.gov: NCT05232656.

Pages: 2304-2314. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11413445/pdf/
Citations: 0
A review of reinforcement learning for natural language processing and applications in healthcare.
IF 4.7; Zone 2 (Medicine); Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-10-01; DOI: 10.1093/jamia/ocae215
Ying Liu, Haozhu Wang, Huixue Zhou, Mingchen Li, Yu Hou, Sicheng Zhou, Fang Wang, Rama Hoetzlein, Rui Zhang

Importance: Reinforcement learning (RL) is an important approach within natural language processing (NLP), providing a mechanism for learning optimal strategies for completing tasks. This literature review examines NLP applications in which RL has demonstrated efficacy, with particular attention to healthcare settings.

Objectives: To systematically explore the applications of RL in NLP, focusing on its effectiveness in acquiring optimal strategies, particularly in healthcare settings, and to provide a comprehensive understanding of RL's potential in NLP tasks.

Materials and methods: Adhering to the PRISMA guidelines, an exhaustive literature review was conducted to identify instances where RL has exhibited success in NLP applications, encompassing dialogue systems, machine translation, question-answering, text summarization, and information extraction. Our methodological approach involves closely examining the technical aspects of RL methodologies employed in these applications, analyzing algorithms, states, rewards, actions, datasets, and encoder-decoder architectures.
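The review frames RL-for-NLP systems in terms of states, actions, and rewards. As a toy, hedged illustration of that vocabulary (not drawn from any reviewed paper), the sketch below treats one input text as the only state, three hypothetical candidate outputs as the actions, and a made-up scalar score as the reward, then trains a softmax policy with REINFORCE, one common policy-gradient algorithm. Real RL-NLP systems typically generate token sequences with an encoder-decoder policy and use task rewards such as ROUGE or BLEU; every number here is an assumption chosen only to show the update rule.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(mean_rewards, steps=2000, lr=0.1, seed=0):
    """REINFORCE with a running-average baseline on a one-state problem.

    Each action stands for a hypothetical candidate output (e.g. a summary);
    mean_rewards[i] is its hypothetical expected reward, and each sampled
    reward adds Gaussian noise.
    """
    rng = random.Random(seed)
    n = len(mean_rewards)
    logits = [0.0] * n
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(n), weights=probs)[0]  # sample from the policy
        reward = mean_rewards[action] + rng.gauss(0.0, 0.1)
        advantage = reward - baseline  # subtracting a baseline reduces variance
        baseline += 0.05 * (reward - baseline)
        for i in range(n):
            # Gradient of log softmax(action) w.r.t. logit i: 1[i == action] - probs[i]
            grad_log_prob = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad_log_prob
    return softmax(logits)


policy = reinforce_bandit([0.2, 0.5, 0.9])
print([round(p, 3) for p in policy])
```

After training, most of the probability mass should sit on the highest-reward action, which is the sense in which RL "acquires an optimal strategy" in this toy setting.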

Results: The review of 93 papers yields insights into RL algorithms, prevalent techniques, emergent trends, and the fusion of RL methods in NLP healthcare applications. It describes the strategies employed, the datasets used, and the evolving landscape of RL-NLP systems, thereby offering a roadmap for research and development in RL and machine learning techniques in healthcare. The review also addresses ethical concerns to ensure equity, transparency, and accountability in the evolution and application of RL-based NLP technologies, particularly within sensitive domains such as healthcare.

Discussion: The findings underscore the promising role of RL in advancing NLP applications, particularly in healthcare, where its potential to optimize decision-making and enhance patient outcomes is significant. However, the ethical challenges and technical complexities associated with RL demand careful consideration and ongoing research to ensure responsible and effective implementation.

Conclusions: By systematically exploring RL's applications in NLP and providing insights into technical analysis, ethical implications, and potential advancements, this review contributes to a deeper understanding of RL's role in language processing.

Pages: 2379-2393. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11413430/pdf/
Citations: 0
Diagnostic accuracy of deep learning using speech samples in depression: a systematic review and meta-analysis.
IF 4.7; Zone 2 (Medicine); Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-10-01; DOI: 10.1093/jamia/ocae189
Lidan Liu, Lu Liu, Hatem A Wafa, Florence Tydeman, Wanqing Xie, Yanzhong Wang

Objective: This study presents a systematic review and meta-analysis of the diagnostic accuracy of deep learning (DL) applied to speech samples for detecting depression.

Materials and methods: This review included studies reporting diagnostic results of DL algorithms for depression using speech data, published from database inception to January 31, 2024, and indexed in the PubMed, Medline, Embase, PsycINFO, Scopus, IEEE, and Web of Science databases. Pooled accuracy, sensitivity, and specificity were estimated using random-effects models. The Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool was used to assess the risk of bias.
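The abstract states that pooled estimates came from random-effects models but does not name the estimator. A common choice is the DerSimonian-Laird method, sketched below under that assumption: inverse-variance weights give a fixed-effect mean, Cochran's Q measures heterogeneity, the between-study variance tau^2 is estimated from Q, and the study weights are then widened by tau^2. Pooling raw proportions with binomial variances p(1 - p)/n is a simplification; diagnostic meta-analyses often work on the logit scale or fit bivariate models for sensitivity and specificity. The function name and study values are hypothetical.

```python
import math


def dersimonian_laird(estimates, variances, z=1.96):
    """Pool study-level estimates with the DerSimonian-Laird random-effects model.

    Requires at least 2 studies. Returns (pooled, lower, upper, tau2).
    """
    k = len(estimates)
    w = [1 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, truncated at 0
    w_star = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, pooled - z * se, pooled + z * se, tau2


# Hypothetical per-study accuracies and sample sizes (NOT the included studies).
accuracies = [0.80, 0.92, 0.85, 0.88]
sizes = [120, 60, 200, 90]
variances = [p * (1 - p) / n for p, n in zip(accuracies, sizes)]
pooled, lo, hi, tau2 = dersimonian_laird(accuracies, variances)
print(f"pooled accuracy {pooled:.2f} (95% CI {lo:.2f}-{hi:.2f}), tau^2 = {tau2:.4f}")
```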

Results: A total of 25 studies met the inclusion criteria, 8 of which were included in the meta-analysis. The pooled estimates of accuracy, specificity, and sensitivity for depression detection models were 0.87 (95% CI, 0.81-0.93), 0.85 (95% CI, 0.78-0.91), and 0.82 (95% CI, 0.71-0.94), respectively. When stratified by model structure, the highest pooled diagnostic accuracy, 0.89 (95% CI, 0.81-0.97), was observed in the handcrafted-feature group.

Discussion: To our knowledge, our study is the first meta-analysis of the diagnostic performance of DL for depression detection from speech samples. All studies included in the meta-analysis used convolutional neural network (CNN) models, which makes it difficult to assess the performance of other DL algorithms. Models using handcrafted features performed better than end-to-end models in speech-based depression detection.
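The abstract does not say which handcrafted features the included studies used; commonly used speech features include MFCCs, pitch, and energy. As a hedged, dependency-free illustration of what "handcrafted" means, the sketch below computes two simple frame-level features, RMS energy and zero-crossing rate, from a raw waveform. The function name, frame sizes, and synthetic tone are assumptions for illustration; an end-to-end model would instead learn its own representation directly from the waveform or spectrogram.

```python
import math


def frame_features(signal, frame_len=400, hop=160):
    """Per-frame (RMS energy, zero-crossing rate) for a list of samples.

    frame_len=400 and hop=160 correspond to 25 ms frames with a 10 ms hop
    at a 16 kHz sampling rate (an assumed, conventional choice).
    """
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Count adjacent sample pairs whose sign differs.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        feats.append((rms, crossings / (frame_len - 1)))
    return feats


# Synthetic 1-second 200 Hz tone at 16 kHz, standing in for a speech recording.
sr = 16000
tone = [0.5 * math.sin(2 * math.pi * 200 * t / sr) for t in range(sr)]
features = frame_features(tone)
print(len(features), features[0])
```

A CNN with handcrafted inputs would consume a matrix of such per-frame features rather than raw samples.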

Conclusions: The application of DL to speech provides a useful tool for depression detection. CNN models with handcrafted acoustic features could help improve diagnostic performance.

Protocol registration: The study protocol was registered on PROSPERO (CRD42023423603).

Pages: 2394-2404. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11413444/pdf/
Citations: 0
Correction to: Are medical history data fit for risk stratification of patients with chest pain in emergency care? Comparing data collected from patients using computerized history taking with data documented by physicians in the electronic health record in the CLEOS-CPDS prospective cohort study.
IF 4.7; Zone 2 (Medicine); Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-09-28; DOI: 10.1093/jamia/ocae252
Citations: 0
Secure messaging telehealth billing in the digital age: moving beyond time-based metrics.
IF 4.7; Zone 2 (Medicine); Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-09-26; DOI: 10.1093/jamia/ocae250
Dong-Gil Ko, Umberto Tachinardi, Eric J Warm

Objective: We propose adopting billing models for secure messaging (SM) telehealth services that move beyond time-based metrics and instead reflect the complexity and clinical expertise involved in patient care.

Materials and methods: We trained 8 classification machine learning (ML) models on providers' electronic health record (EHR) audit log data for patient-initiated non-urgent messages. Mixed-effects modeling (MEM) was used to assess statistical significance.

Results: Accuracy and area under the receiver operating characteristic curve (AUROC) scores generally exceeded 0.85, indicating robust performance. MEM showed that knowledge domains significantly influenced SM billing, explaining nearly 40% of the variance.
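The abstract reports area under the ROC curve without describing how it was computed. One standard formulation is the Mann-Whitney statistic: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. The sketch below implements that directly; the function name, scores, and labels (here, a hypothetical "message was billed" outcome) are illustrative assumptions. The pairwise loop is O(n_pos * n_neg) and meant for clarity; for real data, a library routine such as scikit-learn's `roc_auc_score` is preferable.

```python
def auroc(scores, labels):
    """AUROC as P(score of a random positive > score of a random negative).

    labels are 1 (positive) or 0 (negative); ties contribute 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores: 1 = message billed, 0 = not billed.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auroc(scores, labels))
```

Here 8 of the 9 positive/negative pairs are ranked correctly, so the AUROC is 8/9.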

Discussion: This study demonstrates that ML models using EHR audit log data can improve and predict billing in SM telehealth services, supporting billing models that reflect clinical complexity and expertise rather than time-based metrics.

Conclusion: Our research highlights the need for SM billing models beyond time-based metrics, using EHR audit log data to capture the true value of clinical work.

Citations: 0
Cigarette smoking, e-cigarette use, and sociodemographic correlates of mental health and tobacco-related disease risk in the All of Us research program.
IF 4.7; Zone 2 (Medicine); Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-09-04; DOI: 10.1093/jamia/ocae237
Thomas R Kirchner, Danning Tian, Jian Li, Pranjal Srivastava, Yihao Zheng

Significance: Research on the conditions under which electronic cigarette (EC) use produces a net reduction in the population harm attributable to combusted cigarette (CC) use requires the triangulation of information from cohort(s) of smokers, non-smokers, EC users, and dual-users of all varieties.

Materials and methods: This project uses data from the All of Us Research Program to contrast a panel of wellness and disease-risk indicators across a range of self-reported tobacco-use profiles, including smokers and current and former EC users. This article focuses on the tobacco use history and current tobacco use status of All of Us participants enrolled between May 2017 and February 2023 (Registered Controlled Tier Curated Data Repository [CDR] v7).

Results: The present analytic sample included an unweighted total of N = 412 211 individuals with information on ever-use of both CC and EC. Among them, 155 901 individuals had a history of CC use, of whom 65 206 were identified as current smokers. EC use was reported by 64 002 individuals, of whom 16 619 were current users. Model-predicted analyses identified distinct patterns of CC and EC use across demographic and socioeconomic variables, with younger individuals favoring ECs.
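The abstract gives raw counts but not the corresponding shares. The short calculation below derives them from the reported counts as an arithmetic aid; these percentages are computed here, are not stated in the abstract, and describe the unweighted analytic sample rather than the US population.

```python
# Counts reported in the All of Us Results paragraph (unweighted analytic sample).
total = 412_211
ever_cc, current_cc = 155_901, 65_206
ever_ec, current_ec = 64_002, 16_619


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print("ever CC use, % of sample:", pct(ever_cc, total))                  # 37.8
print("current smokers, % of ever-CC users:", pct(current_cc, ever_cc))  # 41.8
print("ever EC use, % of sample:", pct(ever_ec, total))                  # 15.5
print("current EC users, % of ever-EC users:", pct(current_ec, ever_ec))  # 26.0
```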

Discussion: Age significantly affected EC use, and gender differences revealed that males were significantly more likely to use CC and/or EC than females or African Americans of any gender. Higher educational attainment and income were associated with lower use of both CC and EC, while poorer mental health was associated with a higher likelihood of using CC and EC products.

Conclusion: Findings suggest the potential of the All of Us Research Program for investigating the causal factors that drive both behavioral use transitions and cessation outcomes.

Citations: 0
Socioeconomic disparities in kidney transplant access for patients with end-stage kidney disease within the All of Us Research Program.
IF 4.7; Zone 2 (Medicine); Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-09-02; DOI: 10.1093/jamia/ocae178
Jiayuan Wang, Kellie C Cho, Ekamol Tantisattamo

Objectives: Disparity in kidney transplant access has been demonstrated by a disproportionately low rate of kidney transplantation among socioeconomically disadvantaged patients. However, this evidence has not come from nationally representative populations with end-stage kidney disease (ESKD). We aimed to examine whether socioeconomic disparities in kidney transplant access exist using data from the All of Us Research Program.

Materials and methods: We analyzed data of adult ESKD patients using the All of Us Researcher Workbench. The association of socioeconomic data, including type of health insurance, level of education, and household income, with kidney transplant access was evaluated by multivariable logistic regression adjusted for baseline demographics, medical comorbidities, and behavioral information.

Results: Among 4078 adults with ESKD, the mean age at diagnosis was 54 years and 51.64% were male. The most common insurance type was Medicare (39.6%), most participants had not graduated from college (75.79%), and the most common annual income bracket was $10 000-24 999 (20.16%). After adjusting for potential confounders, insurance status emerged as a significant predictor of kidney transplant access. Individuals covered by Medicaid (adjusted odds ratio [AOR] 0.45; 95% confidence interval [CI], 0.35-0.58; P-value < .001) or uninsured (AOR 0.21; 95% CI, 0.12-0.37; P-value < .001) had lower odds of transplantation than those with private insurance.
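The AORs above come from multivariable logistic regression, which adjusts for confounders. To show what the underlying odds ratio measures, the sketch below computes an unadjusted odds ratio from a 2x2 table with a Woolf (log-scale) 95% CI. The function name and counts are hypothetical and do not reproduce the study's data; obtaining adjusted estimates would require fitting a logistic regression model (for example, with statsmodels).

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Unadjusted odds ratio from a 2x2 table with a Woolf 95% CI.

    a / b: transplanted / not transplanted in the exposed group (e.g. Medicaid)
    c / d: transplanted / not transplanted in the reference group (e.g. private)
    """
    if 0 in (a, b, c, d):
        raise ValueError("Woolf CI is undefined when a cell is empty")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical counts (NOT from the study): 60 of 540 Medicaid patients and
# 150 of 600 privately insured patients received a transplant.
print(odds_ratio(60, 480, 150, 450))
```

An odds ratio below 1, as here (0.375), means lower odds of transplantation in the exposed group than in the reference group.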

Discussion/conclusion: Our findings reveal the influence of insurance status and socioeconomic factors on access to kidney transplantation among ESKD patients. Addressing these disparities through expanded insurance coverage and improved healthcare access is vital for promoting equitable treatment and enhancing health outcomes in vulnerable populations.

Citations: 0
Evaluation of GPT-4 ability to identify and generate patient instructions for actionable incidental radiology findings.
IF 4.7 CAS Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-09-01 DOI: 10.1093/jamia/ocae117
Kar-Mun C Woo, Gregory W Simon, Olumide Akindutire, Yindalon Aphinyanaphongs, Jonathan S Austrian, Jung G Kim, Nicholas Genes, Jacob A Goldenring, Vincent J Major, Chloé S Pariente, Edwin G Pineda, Stella K Kang

Objectives: To evaluate the proficiency of a HIPAA-compliant version of GPT-4 in identifying actionable, incidental findings in unstructured radiology reports of Emergency Department patients, and to assess the appropriateness of artificial intelligence (AI)-generated, patient-facing summaries of these findings.

Materials and methods: Radiology reports extracted from the electronic health record of a large academic medical center were manually reviewed to identify non-emergent, incidental findings with a high likelihood of requiring follow-up; these were further stratified as "definitely actionable" (DA) or "possibly actionable-clinical correlation" (PA-CC). Instruction prompts for GPT-4 were developed and iteratively optimized on a validation set of 50 reports, and the optimized prompt was then applied to a test set of 430 unseen reports. The primary outcome was GPT-4's accuracy in identifying either DA or PA-CC findings; the secondary outcome was its accuracy for DA findings alone. Outputs were reviewed for hallucinations, and AI-generated patient-facing summaries were rated for appropriateness on a Likert scale.
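
The primary and secondary grading schemes amount to collapsing labels before computing precision and recall. The sketch below is a simplification under stated assumptions: each report gets exactly one gold label and one model label from {"DA", "PA-CC", "none"}, and the label strings and toy data are hypothetical, not taken from the study.

```python
from typing import Iterable, Sequence, Tuple


def precision_recall_f1(gold: Sequence[str], pred: Sequence[str],
                        positive: Iterable[str]) -> Tuple[float, float, float]:
    """Per-report precision, recall, and F1, counting any label in
    `positive` as a positive (labels outside it count as negatives)."""
    pos = set(positive)
    tp = sum(1 for g, p in zip(gold, pred) if g in pos and p in pos)
    fp = sum(1 for g, p in zip(gold, pred) if g not in pos and p in pos)
    fn = sum(1 for g, p in zip(gold, pred) if g in pos and p not in pos)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels (hypothetical): one gold and one model label per report.
gold = ["DA", "PA-CC", "none", "DA", "none", "PA-CC"]
pred = ["DA", "DA", "DA", "PA-CC", "none", "none"]

primary = precision_recall_f1(gold, pred, positive={"DA", "PA-CC"})  # DA or PA-CC
secondary = precision_recall_f1(gold, pred, positive={"DA"})         # DA only
print("primary   P=%.2f R=%.2f F1=%.2f" % primary)
print("secondary P=%.2f R=%.2f F1=%.2f" % secondary)
```

Under the primary outcome, a DA finding labeled PA-CC still counts as a hit; under the secondary outcome the same prediction becomes a miss, which is why the two outcomes can diverge.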

Results: For the primary outcome (DA or PA-CC), GPT-4 achieved 99.3% recall, 73.6% precision, and an F1 score of 84.5%. For the secondary outcome (DA only), it achieved 95.2% recall, 77.3% precision, and an F1 score of 85.3%. No findings were outright "hallucinated"; however, 2.8% of cases included generated text about recommendations that were inferred without a specific reference in the report. Most AI-generated summaries of true-positive findings required no or only minor revision.
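
As a consistency check, F1 is the harmonic mean of precision and recall, F1 = 2PR/(P + R). Recomputing it from the rounded precision and recall reported above reproduces the published F1 values to within rounding; the dictionary below simply restates the abstract's numbers.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Metrics as reported in the abstract (proportions, not percentages).
reported = {
    "primary (DA or PA-CC)": {"recall": 0.993, "precision": 0.736, "f1": 0.845},
    "secondary (DA only)": {"recall": 0.952, "precision": 0.773, "f1": 0.853},
}

for outcome, m in reported.items():
    f1 = f1_from_pr(m["precision"], m["recall"])
    print(f"{outcome}: recomputed F1 = {f1:.3f} (reported {m['f1']:.3f})")
```

Because the harmonic mean is pulled toward the smaller value, the primary outcome's F1 sits close to its 73.6% precision despite near-perfect recall; in count terms, false positives outnumbered missed findings.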

Conclusion: With refined instruction prompting, GPT-4 is proficient at detecting actionable, incidental findings. AI-generated patient instructions were usually appropriate but occasionally included inferred recommendations. Although this technology shows promise for augmenting diagnostics, active clinician oversight through "human-in-the-loop" workflows remains critical for clinical implementation.

Pages: 1983-1993. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11339516/pdf/
Citations: 0
A question-answering framework for automated abstract screening using large language models.
IF 4.7 CAS Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-09-01 DOI: 10.1093/jamia/ocae166
Opeoluwa Akinseloyin, Xiaorui Jiang, Vasile Palade

Objective: This paper aims to address the challenges in abstract screening within systematic reviews (SR) by leveraging the zero-shot capabilities of large language models (LLMs).

Methods: We use an LLM to prioritize candidate studies by aligning abstracts with the selection criteria outlined in an SR protocol. Abstract screening is recast as a novel question-answering (QA) framework in which each selection criterion is treated as a question for the LLM to answer. The framework breaks the selection criteria down into multiple questions, prompts the LLM to answer each question, scores and re-ranks each answer, and combines the responses to make nuanced inclusion or exclusion decisions.
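
The decompose, answer, score, and combine steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `answer_fn` stands in for a prompted LLM call (here a hypothetical keyword-matching stub, `keyword_stub`, so the example runs offline), categorical answers are mapped to scores by an assumed table, and per-criterion scores are combined by a simple mean before ranking.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Assumed mapping from a categorical answer to a numeric score.
ANSWER_SCORES = {"yes": 1.0, "unclear": 0.5, "no": 0.0}


def criteria_to_questions(criteria: List[str]) -> List[str]:
    """Turn each selection criterion into one question for the model."""
    return [f"Does the study meet this criterion: {c}?" for c in criteria]


def score_abstract(abstract: str, questions: List[str],
                   answer_fn: Callable[[str, str], str]) -> float:
    """Answer every question for one abstract and average the answer scores."""
    return mean([ANSWER_SCORES[answer_fn(abstract, q)] for q in questions])


def rank_abstracts(abstracts: Dict[str, str], criteria: List[str],
                   answer_fn: Callable[[str, str], str]) -> List[Tuple[str, float]]:
    """Score every abstract and sort so likely inclusions come first."""
    questions = criteria_to_questions(criteria)
    scored = [(aid, score_abstract(text, questions, answer_fn))
              for aid, text in abstracts.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def keyword_stub(abstract: str, question: str) -> str:
    """Hypothetical offline stand-in for an LLM call: answers "yes" if every
    word of the criterion appears in the abstract, "unclear" if some do."""
    criterion = question.split(": ", 1)[1].rstrip("?")
    words = criterion.lower().split()
    hits = sum(w in abstract.lower() for w in words)
    if hits == len(words):
        return "yes"
    return "unclear" if hits else "no"


# Toy criteria and abstracts (hypothetical).
criteria = ["randomized controlled trial", "adult patients", "diabetes"]
abstracts = {
    "A": "A randomized controlled trial of metformin in adult patients with type 2 diabetes.",
    "B": "An observational cohort of adult patients with diabetes.",
    "C": "A randomized controlled trial of exercise in children with asthma.",
}

for aid, score in rank_abstracts(abstracts, criteria, keyword_stub):
    print(aid, round(score, 2))
```

Swapping `keyword_stub` for a real model call changes only `answer_fn`; the decomposition and aggregation logic stays the same.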

Results and discussion: Large-scale validation was performed on the CLEF eHealth 2019 Task 2 benchmark (Technology-Assisted Reviews in Empirical Medicine). Using GPT-3.5 as a case study, the proposed QA framework consistently showed a clear advantage over traditional information retrieval approaches and over bespoke BERT-family models fine-tuned for prioritizing candidate studies (ie, from BERT to PubMedBERT) across 31 datasets covering 4 categories of SRs, underscoring its high potential for facilitating abstract screening. The experiments also demonstrated the viability of using selection criteria as a query for reference prioritization, and of applying the framework with different LLMs.

Conclusion: The investigation demonstrated the indispensable value of leveraging selection criteria to improve automated abstract screening. Using the proposed QA framework, LLMs proved proficient at prioritizing candidate studies for abstract screening. Re-ranking answers by the semantic alignment between abstracts and selection criteria yielded significant performance improvements, further highlighting the value of using selection criteria to enhance abstract screening.
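
Re-ranking by semantic alignment can be illustrated with cosine similarity between a selection-criteria vector and each abstract vector. The sketch below substitutes bag-of-words term counts for the dense embeddings a real system would more likely use (an assumption; the abstract does not specify the representation), and the blending weight `alpha` and toy data are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; an assumed stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank(initial_scores: Dict[str, float], abstracts: Dict[str, str],
           criteria_text: str, alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Blend each abstract's initial score with its similarity to the
    criteria (hypothetical linear blend), then sort descending."""
    crit_vec = bag_of_words(criteria_text)
    blended = {
        aid: (1 - alpha) * initial_scores[aid] + alpha * cosine(bag_of_words(text), crit_vec)
        for aid, text in abstracts.items()
    }
    return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)


# Toy example (hypothetical): two abstracts tied on their initial score.
initial = {"X": 0.5, "Y": 0.5}
docs = {
    "X": "Randomized trial of insulin in adults with diabetes.",
    "Y": "Survey of hospital staffing levels.",
}
print(rerank(initial, docs, "randomized trial; adults; diabetes"))
```

Here the similarity term breaks the tie in favor of the abstract that shares vocabulary with the criteria.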

Pages: 1939-1952. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11339526/pdf/
Citations: 0
BioInstruct: instruction tuning of large language models for biomedical natural language processing.
IF 4.7 CAS Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-09-01 DOI: 10.1093/jamia/ocae122
Hieu Tran, Zhichao Yang, Zonghai Yao, Hong Yu

Objectives: To enhance the performance of large language models (LLMs) in biomedical natural language processing (BioNLP) by introducing a domain-specific instruction dataset and examining its impact when combined with multi-task learning principles.

Materials and methods: We created BioInstruct, a dataset of 25 005 instructions, to instruction-tune LLMs (LLaMA 1 and 2, 7B and 13B versions). The instructions were generated by prompting the GPT-4 language model with 3 seed examples randomly drawn from a pool of 80 human-curated instructions. We used Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning. We then evaluated the instruction-tuned LLMs on several BioNLP tasks grouped into 3 major categories: question answering (QA), information extraction (IE), and text generation (GEN). We also examined whether the category of instructions (eg, QA, IE, and generation) affects model performance.
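
In LoRA, a pretrained weight matrix W (d × k) is frozen and only a low-rank update is trained, so the adapted layer computes h = Wx + (α/r)·B(Ax), with A of shape r × k and B of shape d × r. The pure-Python sketch below illustrates the forward pass and the parameter savings. The rank r = 8 and the 4096 × 4096 projection (the hidden size of LLaMA 7B) used for the parameter count, and the tiny matrices in the demo, are illustrative assumptions, not the settings reported for BioInstruct.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector,
                 alpha: float, r: int) -> Vector:
    """h = W x + (alpha / r) * B (A x); W is frozen, only A and B are trained."""
    h0 = matvec(W, x)                 # frozen pretrained path
    dh = matvec(B, matvec(A, x))      # low-rank path: A maps k -> r, B maps r -> d
    scale = alpha / r
    return [base + scale * delta for base, delta in zip(h0, dh)]


def lora_param_counts(d: int, k: int, r: int) -> Tuple[int, int]:
    """Trainable parameters: full fine-tuning of a d x k matrix vs. LoRA factors."""
    return d * k, r * (d + k)


# Tiny demo with illustrative sizes (assumptions): d = k = 4, rank r = 2.
random.seed(0)
d, k, r, alpha = 4, 4, 2, 4.0
W = [[random.gauss(0, 1) for _ in range(k)] for _ in range(d)]
A = [[random.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]  # A: r x k, random init
B = [[0.0] * r for _ in range(d)]                                   # B: d x r, zero init
x = [1.0, 2.0, 3.0, 4.0]

# With B initialised to zero, the adapted layer starts out identical to the frozen one.
assert lora_forward(W, A, B, x, alpha, r) == matvec(W, x)

full, lora = lora_param_counts(4096, 4096, 8)
print(f"full: {full:,} params, LoRA (r=8): {lora:,} params ({lora / full:.2%})")
```

The low-rank path is applied as two small matrix-vector products, so the full d × k update B·A never has to be materialized during training.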

Results and discussion: Compared with LLMs without instruction tuning, our instruction-tuned LLMs showed marked performance gains: 17.3% in QA (average accuracy), 5.7% in IE (average F1), and 96% in generation tasks (average GPT-4 score). Our instruction-tuned 7B-parameter LLaMA 1 model was competitive with, or even surpassed, other biomedical LLMs that were also fine-tuned from LLaMA 1 on vast domain-specific data or a variety of tasks. Our results also show that the performance gain is significantly higher when instruction fine-tuning uses closely related tasks. These findings align with observations from multi-task learning and suggest synergies between tasks.

Conclusion: The BioInstruct dataset is a valuable resource, and instruction-tuned LLMs yield the best-performing BioNLP applications.

Pages: 1821-1832. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11339494/pdf/
Citations: 0
Journal
Journal of the American Medical Informatics Association