
JMIR Medical Informatics: Latest Articles

A Case Demonstration of the Open Health Natural Language Processing Toolkit From the National COVID-19 Cohort Collaborative and the Researching COVID to Enhance Recovery Programs for a Natural Language Processing System for COVID-19 or Postacute Sequelae of SARS CoV-2 Infection: Algorithm Development and Validation
IF 3.2 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2024-09-09 DOI: 10.2196/49997
Andrew Wen, Liwei Wang, Huan He, Sunyang Fu, Sijia Liu, David A Hanauer, Daniel R Harris, Ramakanth Kavuluru, Rui Zhang, Karthik Natarajan, Nishanth P Pavinkurve, Janos Hajagos, Sritha Rajupet, Veena Lingam, Mary Saltz, Corey Elowsky, Richard A Moffitt, Farrukh M Koraishy, Matvey B Palchuk, Jordan Donovan, Lora Lingrey, Garo Stone-DerHagopian, Robert T Miller, Andrew E Williams, Peter J Leese, Paul I Kovach, Emily R Pfaff, Mikhail Zemmel, Robert D Pates, Nick Guthe, Melissa A Haendel, Christopher G Chute, Hongfang Liu, National COVID Cohort Collaborative, The RECOVER Initiative
Background: A wealth of clinically relevant information is only obtainable within unstructured clinical narratives, leading to great interest in clinical natural language processing (NLP). While a multitude of approaches to NLP exist, current algorithm development approaches have limitations that can slow the development process. These limitations are exacerbated when the task is emergent, as is the case currently for NLP extraction of signs and symptoms of COVID-19 and postacute sequelae of SARS-CoV-2 infection (PASC).

Objective: This study aims to highlight the current limitations of existing NLP algorithm development approaches that are exacerbated by NLP tasks surrounding emergent clinical concepts and to illustrate our approach to addressing these issues through the use case of developing an NLP system for the signs and symptoms of COVID-19 and PASC.

Methods: We used 2 preexisting studies on PASC as a baseline to determine a set of concepts that should be extracted by NLP. This concept list was then used in conjunction with the Unified Medical Language System to autonomously generate an expanded lexicon to weakly annotate a training set, which was then reviewed by a human expert to generate a fine-tuned NLP algorithm. The annotations from a fully human-annotated test set were then compared with NLP results from the fine-tuned algorithm. The NLP algorithm was then deployed to 10 additional sites that were also running our NLP infrastructure. Of these 10 sites, 5 were used to conduct a federated evaluation of the NLP algorithm.

Results: An NLP algorithm consisting of 12,234 unique normalized text strings corresponding to 2366 unique concepts was developed to extract COVID-19 or PASC signs and symptoms. An unweighted mean dictionary coverage of 77.8% was found for the 5 sites.

Conclusions: The evolutionary and time-critical nature of the PASC NLP task significantly complicates existing approaches to NLP algorithm development. In this work, we present a hybrid approach using the Open Health Natural Language Processing Toolkit aimed at addressing these needs with a dictionary-based weak labeling step that minimizes the need for additional expert annotation while still preserving the fine-tuning capabilities of expert involvement.
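
The dictionary-based weak labeling step described in this abstract can be pictured with a minimal sketch. It is not the Open Health Natural Language Processing Toolkit implementation: the lexicon entries, the concept identifiers, and the function names `normalize`, `weak_label`, and `unweighted_mean_coverage` are all hypothetical. The abstract does not say how per-site dictionary coverage is computed, so the last function only shows the equal-weight averaging step.

```python
import re
from statistics import mean

# Hypothetical lexicon: normalized text string -> concept identifier.
# The identifiers are placeholders, not real UMLS concept codes.
LEXICON = {
    "shortness of breath": "CONCEPT_DYSPNEA",
    "dyspnea": "CONCEPT_DYSPNEA",
    "loss of smell": "CONCEPT_ANOSMIA",
    "fatigue": "CONCEPT_FATIGUE",
}


def normalize(text: str) -> str:
    """Lowercase and collapse every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def weak_label(note: str, lexicon: dict[str, str]) -> list[tuple[str, str]]:
    """Return (matched string, concept) pairs found in the note, longest terms first."""
    padded = f" {normalize(note)} "  # padding makes whole-word matching simple
    hits = []
    for term in sorted(lexicon, key=len, reverse=True):
        if f" {term} " in padded:
            hits.append((term, lexicon[term]))
    return hits


def unweighted_mean_coverage(per_site: dict[str, float]) -> float:
    """Average site-level coverage values, giving each site equal weight."""
    return mean(per_site.values())
```

Several strings can map to one concept, which mirrors the many-strings-to-fewer-concepts shape of the reported lexicon. In the workflow the abstract describes, weak labels like these would then go to a human expert for review rather than being treated as final annotations.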
Citations: 0
Alarm Management in Provisional COVID-19 Intensive Care Units: Retrospective Analysis and Recommendations for Future Pandemics
IF 3.2 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2024-09-09 DOI: 10.2196/58347
Maximilian Markus Wunderlich, Nicolas Frey, Sandro Amende-Wolf, Carl Hinrichs, Felix Balzer, Akira-Sebastian Poncette
Background: In response to the high patient admission rates during the COVID-19 pandemic, provisional intensive care units (ICUs) were set up, equipped with temporary monitoring and alarm systems. We sought to find out whether the provisional ICU setting led to a higher alarm burden and more staff with alarm fatigue.

Objective: We aimed to compare alarm situations between provisional COVID-19 ICUs and non–COVID-19 ICUs during the second COVID-19 wave in Berlin, Germany. The study focused on measuring alarms per bed per day, identifying medical devices with higher alarm frequencies in COVID-19 settings, evaluating the median duration of alarms in both types of ICUs, and assessing the level of alarm fatigue experienced by health care staff.

Methods: Our approach involved a comparative analysis of alarm data from 2 provisional COVID-19 ICUs and 2 standard non–COVID-19 ICUs. Through interviews with medical experts, we formulated hypotheses about potential differences in alarm load, alarm duration, alarm types, and staff alarm fatigue between the 2 ICU types. We analyzed alarm log data from the patient monitoring systems of all 4 ICUs to inferentially assess the differences. In addition, we assessed staff alarm fatigue with a questionnaire, aiming to comprehensively understand the impact of the alarm situation on health care personnel.

Results: COVID-19 ICUs had significantly more alarms per bed per day than non–COVID-19 ICUs (P<.001), and the majority of the staff lacked experience with the alarm system. The overall median alarm duration was similar in both ICU types. We found no COVID-19–specific alarm patterns. The alarm fatigue questionnaire results suggest that staff in both types of ICUs experienced alarm fatigue. However, physicians and nurses who were working in COVID-19 ICUs reported a significantly higher level of alarm fatigue (P=.04).

Conclusions: Staff in COVID-19 ICUs were exposed to a higher alarm load, and the majority lacked experience with alarm management and the alarm system. We recommend training and educating ICU staff in alarm management, emphasizing the importance of alarm management training as part of the preparations for future pandemics. However, the limitations of our study design and the specific pandemic conditions warrant further studies to confirm these findings and to explore effective alarm management strategies in different ICU settings.
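
As a rough illustration of the two log-derived measures named in this abstract (alarms per bed per day and median alarm duration), the sketch below works on a made-up log. The record layout, ICU and bed names, timestamps, and function names are hypothetical and do not come from the study's monitoring systems.

```python
from collections import Counter
from datetime import datetime
from statistics import median

# Hypothetical alarm log rows: (ICU name, bed id, alarm start, alarm end).
ALARM_LOG = [
    ("ICU-A", "bed1", datetime(2021, 1, 1, 8, 0, 0), datetime(2021, 1, 1, 8, 0, 20)),
    ("ICU-A", "bed1", datetime(2021, 1, 1, 9, 0, 0), datetime(2021, 1, 1, 9, 0, 10)),
    ("ICU-A", "bed2", datetime(2021, 1, 1, 9, 5, 0), datetime(2021, 1, 1, 9, 5, 30)),
    ("ICU-A", "bed1", datetime(2021, 1, 2, 7, 0, 0), datetime(2021, 1, 2, 7, 0, 40)),
]


def alarms_per_bed_per_day(log):
    """Count alarms for each (ICU, bed, calendar day), then average the counts per ICU."""
    counts = Counter((icu, bed, start.date()) for icu, bed, start, _ in log)
    per_icu = {}
    for (icu, _bed, _day), n in counts.items():
        per_icu.setdefault(icu, []).append(n)
    return {icu: sum(ns) / len(ns) for icu, ns in per_icu.items()}


def median_alarm_duration_seconds(log):
    """Median of (end - start) in seconds across all alarms in the log."""
    return median((end - start).total_seconds() for _, _, start, end in log)
```

One simplification to keep in mind: bed-days without any alarm never appear in a log like this, so they are left out of the average. A real analysis would need occupancy data to include them.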
Citations: 0
Exploring Impediments Imposed by the Medical Device Regulation EU 2017/745 on Software as a Medical Device.
IF 3.1 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2024-09-05 DOI: 10.2196/58080
Liga Svempe

In light of rapid technological advancements, the health care sector is undergoing significant transformation with the continuous emergence of novel digital solutions. Consequently, regulatory frameworks must continuously adapt to fulfill their main goal of protecting patients. In 2017, the new Medical Device Regulation (EU) 2017/745 (MDR) came into force, bringing more complex requirements for development, launch, and postmarket surveillance. However, the updated regulation considerably impacts manufacturers, especially small- and medium-sized enterprises, and consequently, the accessibility of medical devices in the European Union market, as many manufacturers decide to either discontinue their products, postpone the launch of new innovative solutions, or leave the European Union market in favor of other regions such as the United States. This could lead to reduced health care quality and slower industry innovation efforts. Effective policy calibration and collaborative efforts are essential to mitigate these effects and promote ongoing advancements in health care technologies in the European Union market. This paper is a narrative review with the objective of exploring the factors introduced by the new regulation that hinder the development, launch, and marketing of software as a medical device. It exclusively focuses on the factors that engender obstacles. Related regulations, directives, and proposals were discussed for comparison and further analysis.

Citations: 0
Practical Applications of Large Language Models for Health Care Professionals and Scientists.
IF 3.1 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2024-09-05 DOI: 10.2196/58478
Florian Reis, Christian Lenz, Manfred Gossen, Hans-Dieter Volk, Norman Michael Drzeniek

With the popularization of large language models (LLMs), strategies for their effective and safe usage in health care and research have become increasingly pertinent. Despite the growing interest and eagerness among health care professionals and scientists to exploit the potential of LLMs, initial attempts may yield suboptimal results due to a lack of user experience, thus complicating the integration of artificial intelligence (AI) tools into workplace routine. Focusing on scientists and health care professionals with limited LLM experience, this viewpoint article highlights and discusses 6 easy-to-implement use cases of practical relevance. These encompass customizing translations, refining text and extracting information, generating comprehensive overviews and specialized insights, compiling ideas into cohesive narratives, crafting personalized educational materials, and facilitating intellectual sparring. Additionally, we discuss general prompting strategies and precautions for the implementation of AI tools in biomedicine. Despite various hurdles and challenges, the integration of LLMs into daily routines of physicians and researchers promises heightened workplace productivity and efficiency.

Citations: 0
Evaluating the Capabilities of Generative AI Tools in Understanding Medical Papers: Qualitative Study.
IF 3.1 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2024-09-04 DOI: 10.2196/59258
Seyma Handan Akyon, Fatih Cagatay Akyon, Ahmet Sefa Camyar, Fatih Hızlı, Talha Sari, Şamil Hızlı

Background: Reading medical papers is a challenging and time-consuming task for doctors, especially when the papers are long and complex. A tool that can help doctors efficiently process and understand medical papers is needed.

Objective: This study aims to critically assess and compare the capabilities of large language models (LLMs) in accurately and efficiently understanding medical research papers using the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) checklist, which provides a standardized framework for evaluating key elements of observational studies.

Methods: The study is a methodological type of research. The study aims to evaluate the understanding capabilities of new generative artificial intelligence tools in medical papers. A novel benchmark pipeline processed 50 medical research papers from PubMed, comparing the answers of 6 LLMs (GPT-3.5-Turbo, GPT-4-0613, GPT-4-1106, PaLM 2, Claude v1, and Gemini Pro) to the benchmark established by expert medical professors. Fifteen questions, derived from the STROBE checklist, assessed LLMs' understanding of different sections of a research paper.

Results: LLMs exhibited varying performance, with GPT-3.5-Turbo achieving the highest percentage of correct answers (n=3916, 66.9%), followed by GPT-4-1106 (n=3837, 65.6%), PaLM 2 (n=3632, 62.1%), Claude v1 (n=2887, 58.3%), Gemini Pro (n=2878, 49.2%), and GPT-4-0613 (n=2580, 44.1%). Statistical analysis revealed significant differences between LLMs (P<.001), with older models showing inconsistent performance compared to newer versions. LLMs showed distinct performance on each question across different parts of a scholarly paper, with certain models such as PaLM 2 and GPT-3.5 showing remarkable versatility and depth in understanding.

Conclusions: This study is the first to evaluate the performance of different LLMs in understanding medical papers using the retrieval augmented generation method. The findings highlight the potential of LLMs to enhance medical research by improving efficiency and facilitating evidence-based decision-making. Further research is needed to address limitations such as the influence of question formats, potential biases, and the rapid evolution of LLM models.
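
The comparison of LLM answers against an expert benchmark can be sketched as a simple scoring function. The abstract does not describe the pipeline's data structures, so the layout below, with answers keyed by (paper id, question number), and the name `percent_correct` are hypothetical. Counting unanswered questions as incorrect is likewise an assumption, not the authors' stated rule.

```python
def percent_correct(model_answers, expert_answers):
    """Percentage of benchmark items where the model's answer equals the expert's.

    Both arguments map (paper_id, question_number) -> answer label.
    Items the model did not answer are counted as incorrect.
    """
    correct = sum(
        1 for key, expected in expert_answers.items()
        if model_answers.get(key) == expected
    )
    return 100.0 * correct / len(expert_answers)


# Hypothetical benchmark: 2 papers x 2 questions.
EXPERT = {("p1", 1): "yes", ("p1", 2): "no", ("p2", 1): "yes", ("p2", 2): "yes"}
MODEL = {("p1", 1): "yes", ("p1", 2): "yes", ("p2", 1): "yes"}

score = percent_correct(MODEL, EXPERT)
```

Whether missing answers are scored as wrong or dropped from the denominator changes the percentage, so any such choice should be stated alongside the results.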

Citations: 0
Transforming Health Care Through Chatbots for Medical History-Taking and Future Directions: Comprehensive Systematic Review.
IF 3.1 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2024-08-29 DOI: 10.2196/56628
Michael Hindelang, Sebastian Sitaru, Alexander Zink

Background: The integration of artificial intelligence and chatbot technology in health care has attracted significant attention due to its potential to improve patient care and streamline history-taking. As artificial intelligence-driven conversational agents, chatbots offer the opportunity to revolutionize history-taking, necessitating a comprehensive examination of their impact on medical practice.

Objective: This systematic review aims to assess the role, effectiveness, usability, and patient acceptance of chatbots in medical history-taking. It also examines potential challenges and future opportunities for integration into clinical practice.

Methods: A systematic search included PubMed, Embase, MEDLINE (via Ovid), CENTRAL, Scopus, and Open Science and covered studies through July 2024. The inclusion and exclusion criteria for the studies reviewed were based on the PICOS (participants, interventions, comparators, outcomes, and study design) framework. The population included individuals using health care chatbots for medical history-taking. Interventions focused on chatbots designed to facilitate medical history-taking. The outcomes of interest were the feasibility, acceptance, and usability of chatbot-based medical history-taking. Studies not reporting on these outcomes were excluded. All study designs except conference papers were eligible for inclusion. Only English-language studies were considered. There were no specific restrictions on study duration. Key search terms included "chatbot*," "conversational agent*," "virtual assistant," "artificial intelligence chatbot," "medical history," and "history-taking." The quality of observational studies was classified using the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) criteria (eg, sample size, design, data collection, and follow-up). The RoB 2 (Risk of Bias) tool was used to assess the areas and levels of bias in randomized controlled trials (RCTs).

Results: The review included 15 observational studies and 3 RCTs and synthesized evidence from different medical fields and populations. Chatbots systematically collect information through targeted queries and data retrieval, improving patient engagement and satisfaction. The results show that chatbots have great potential for history-taking and that the efficiency and accessibility of the health care system can be improved by 24/7 automated data collection. Bias assessments revealed that of the 15 observational studies, 5 (33%) studies were of high quality, 5 (33%) studies were of moderate quality, and 5 (33%) studies were of low quality. Of the RCTs, 2 had a low risk of bias, while 1 had a high risk.

Conclusions: This systematic review provides critical insights into the potential benefits and challenges of using chatbots for medical history-taking. The included studies showed that chatbots can increase patient engagement, streamline data collection, and improve health care decision-making. For effective integration into clinical practice, it is crucial to design user-friendly interfaces, ensure robust data security, and maintain empathetic patient-physician interactions. Future research should focus on refining chatbot algorithms, improving their emotional intelligence, and extending their application to different health care settings to realize their full potential in modern medicine.

Trial registration: PROSPERO CRD42023410312; www.crd.york.ac.uk/prospero.
Citations: 0
Impact of an Electronic Health Record-Based Interruptive Alert Among Patients With Headaches Seen in Primary Care: Cluster Randomized Controlled Trial.
IF 3.1 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2024-08-29 DOI: 10.2196/58456
Apoorva Pradhan, Eric A Wright, Vanessa A Hayduk, Juliana Berhane, Mallory Sponenberg, Leeann Webster, Hannah Anderson, Siyeon Park, Jove Graham, Scott Friedenberg

Background: Headaches, including migraines, are one of the most common causes of disability and account for nearly 20%-30% of referrals from primary care to neurology. In primary care, electronic health record-based alerts offer a mechanism to influence health care provider behaviors, manage neurology referrals, and optimize headache care.

Objective: This project aimed to evaluate the impact of an electronic alert implemented in primary care on patients' overall headache management.

Methods: We conducted a stratified cluster-randomized study across 38 primary care clinic sites between December 2021 and December 2022 at a large integrated health care delivery system in the United States. Clinics were stratified into 6 blocks based on region and patient-to-health care provider ratios and then randomized 1:1 within each block to either the control or the intervention arm. Health care providers practicing at intervention clinics received an interruptive alert in the electronic health record. The primary end point was a change in headache burden, measured using the Headache Impact Test 6 scale, from baseline to 6 months. Secondary outcomes included changes in headache frequency and intensity, access to care, and resource use. We analyzed the difference-in-differences between the arms at follow-up at the individual patient level.
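
The stratified 1:1 allocation described in the Methods can be illustrated with a minimal Python sketch. This is not the authors' randomization procedure: the block names, clinic names, seed, and the function `randomize_within_blocks` are all hypothetical, and the sketch only assumes that clinics are shuffled inside each stratum and then alternately assigned to the two arms.

```python
import random


def randomize_within_blocks(blocks, seed=0):
    """Assign clinics 1:1 to arms inside each stratification block.

    `blocks` maps a block name to a list of clinic identifiers.
    All names here are hypothetical placeholders, not study data.
    """
    rng = random.Random(seed)
    assignment = {}
    for block_name in sorted(blocks):  # fixed order keeps runs reproducible
        clinics = list(blocks[block_name])
        rng.shuffle(clinics)
        for position, clinic in enumerate(clinics):
            # Alternate arms after shuffling so each block stays balanced.
            arm = "intervention" if position % 2 == 0 else "control"
            assignment[clinic] = arm
    return assignment


# Hypothetical example with two small blocks.
blocks = {
    "block_1": ["clinic_A", "clinic_B", "clinic_C", "clinic_D"],
    "block_2": ["clinic_E", "clinic_F", "clinic_G"],
}
assignment = randomize_within_blocks(blocks, seed=42)
```

Shuffling inside each block before alternating keeps the arms balanced within every stratum (they differ by at most one clinic when a block has an odd size), which is the point of stratifying before randomizing.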

Results: We enrolled 203 adult patients with a confirmed headache diagnosis. At baseline, the average Headache Impact Test 6 scores in each arm were not significantly different (intervention: mean 63, SD 6.9; control: mean 61.8, SD 6.6; P=.21). We observed a significant reduction in the headache burden only in the intervention arm at follow-up (3.5 points; P=.009). The reduction in the headache burden was not statistically different between groups (difference-in-differences estimate -1.89, 95% CI -5 to 1.31; P=.25). Similarly, secondary outcomes were not significantly different between groups. Only 11.32% (303/2677) of alerts were acted upon.

Conclusions: The use of an interruptive electronic alert did not significantly improve headache outcomes. Low use of alerts by health care providers prompts future alterations of the alert and exploration of alternative approaches.
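
The difference-in-differences comparison reported above can be sketched in Python as follows. This is a simplified illustration on group means, not the study's patient-level analysis; the scores are made-up values and the function name `difference_in_differences` is an assumption introduced here.

```python
from statistics import mean


def difference_in_differences(treat_before, treat_after, ctrl_before, ctrl_after):
    """Return (change in treated arm) minus (change in control arm)."""
    change_treated = mean(treat_after) - mean(treat_before)
    change_control = mean(ctrl_after) - mean(ctrl_before)
    return change_treated - change_control


# Hypothetical headache-burden scores (lower is better); not study data.
intervention_baseline = [64, 62, 63]
intervention_followup = [60, 59, 61]
control_baseline = [62, 61, 63]
control_followup = [61, 60, 62]

did = difference_in_differences(
    intervention_baseline,
    intervention_followup,
    control_baseline,
    control_followup,
)
```

A negative estimate means the intervention arm improved more than the control arm. Whether that gap is distinguishable from chance is a separate question, which is why a within-arm reduction can be significant while the between-arm estimate is not.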

Citations: 0
Viability of Open Large Language Models for Clinical Documentation in German Health Care: Real-World Model Evaluation Study.
IF 3.1 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2024-08-28 DOI: 10.2196/59617
Felix Heilmeyer, Daniel Böhringer, Thomas Reinhard, Sebastian Arens, Lisa Lyssenko, Christian Haverkamp

Background: The use of large language models (LLMs) as writing assistance for medical professionals is a promising approach to reduce the time required for documentation, but there may be practical, ethical, and legal challenges in many jurisdictions complicating the use of the most powerful commercial LLM solutions.

Objective: In this study, we assessed the feasibility of using nonproprietary LLMs of the GPT variety as writing assistance for medical professionals in an on-premise setting with restricted compute resources, generating German medical text.

Methods: We trained four 7-billion-parameter models with 3 different architectures for our task and evaluated their performance using a powerful commercial LLM, namely Anthropic's Claude-v2, as a rater. Based on this, we selected the best-performing model and evaluated its practical usability with 2 independent human raters on real-world data.

Results: In the automated evaluation with Claude-v2, BLOOM-CLP-German, a model trained from scratch on the German text, achieved the best results. In the manual evaluation by human experts, 95 (93.1%) of the 102 reports generated by that model were evaluated as usable as is or with only minor changes by both human raters.

Conclusions: The results show that even with restricted compute resources, it is possible to generate medical texts that are suitable for documentation in routine clinical practice. However, the target language should be considered in the model selection when processing non-English text.

Citations: 0
Implementation of the World Health Organization Minimum Dataset for Emergency Medical Teams to Create Disaster Profiles for the Indonesian SATUSEHAT Platform Using Fast Healthcare Interoperability Resources: Development and Validation Study.
IF 3.1 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2024-08-28 DOI: 10.2196/59651
Hiro Putra Faisal, Masaharu Nakayama

Background: The National Disaster Management Agency (Badan Nasional Penanggulangan Bencana) handles disaster management in Indonesia as a health cluster by collecting, storing, and reporting information on the state of survivors and their health from various sources during disasters. Data were collected on paper and transferred to Microsoft Excel spreadsheets. These activities are challenging because there are no standards for data collection. The World Health Organization (WHO) introduced a standard for health data collection during disasters for emergency medical teams (EMTs) in the form of a minimum dataset (MDS). Meanwhile, the Ministry of Health of Indonesia launched the SATUSEHAT platform to integrate all electronic medical records in Indonesia based on Fast Healthcare Interoperability Resources (FHIR).

Objective: This study aims to implement the WHO EMT MDS to create a disaster profile for the SATUSEHAT platform using FHIR.

Methods: We extracted variables from 2 EMT MDS medical records (the WHO and Association of Southeast Asian Nations [ASEAN] versions) and the daily reporting form. We then performed a mapping process to match these variables with the FHIR resources and analyzed the gaps between the variables and base resources. Next, we conducted profiling to see if there were any changes in the selected resources and created extensions to fill the gap using the Forge application. Subsequently, the profile was implemented using an open-source FHIR server.

Results: The total numbers of variables extracted from the WHO EMT MDS, ASEAN EMT MDS, and daily reporting forms were 30, 32, and 46, with the percentage of variables matching FHIR resources being 100% (30/30), 97% (31/32), and 85% (39/46), respectively. From the 40 resources available in the FHIR ID core, we used 10, 14, and 9 for the WHO EMT MDS, ASEAN EMT MDS, and daily reporting form, respectively. Based on the gap analysis, we found 4 variables in the daily reporting form that were not covered by the resources. Thus, we created extensions to address this gap.

Conclusions: We successfully created a disaster profile that can be used as a disaster case for the SATUSEHAT platform. This profile may standardize health data collection during disasters.
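
The mapping and gap analysis described above can be sketched in Python. The matched and total counts are the ones reported in the Results; everything else is an assumption for illustration, including the variable names in `example_mapping`, the choice of target resources, and the helper functions `coverage_percent` and `gap_analysis`.

```python
def coverage_percent(matched, total):
    """Share of extracted variables that map to an FHIR resource, in percent."""
    return round(100 * matched / total)


def gap_analysis(mapping):
    """Split a variable-to-resource mapping into a matched count and a gap list.

    A value of None marks a variable with no suitable base resource,
    which would be a candidate for a custom extension.
    """
    unmapped = sorted(name for name, resource in mapping.items() if resource is None)
    matched = len(mapping) - len(unmapped)
    return matched, unmapped


# Matched/total counts as reported in the abstract.
reported = {
    "WHO EMT MDS": (30, 30),
    "ASEAN EMT MDS": (31, 32),
    "Daily reporting form": (39, 46),
}
percentages = {form: coverage_percent(m, t) for form, (m, t) in reported.items()}

# Hypothetical variables; the resource names are standard FHIR resource types.
example_mapping = {
    "age": "Patient",
    "sex": "Patient",
    "diagnosis": "Condition",
    "triage_level": "Observation",
    "team_headcount": None,  # no base resource assumed, so it lands in the gap list
}
matched, unmapped = gap_analysis(example_mapping)
```

Running the coverage helper on the reported counts reproduces the 100%, 97%, and 85% figures, and the gap list is what would then be addressed with extensions.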

Citations: 0
Assessing the Effect of Electronic Health Record Data Quality on Identifying Patients With Type 2 Diabetes: Cross-Sectional Study.
IF 3.1 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2024-08-27 DOI: 10.2196/56734
Priyanka Dua Sood, Star Liu, Harold Lehmann, Hadi Kharrazi

Background: Increasing and substantial reliance on electronic health records (EHRs) and data types (ie, diagnosis, medication, and laboratory data) demands assessment of their data quality as a fundamental approach, especially since there is a need to identify appropriate denominator populations with chronic conditions, such as type 2 diabetes (T2D), using commonly available computable phenotype definitions (ie, phenotypes).

Objective: To bridge this gap, our study aims to assess how issues of EHR data quality and variations and robustness (or lack thereof) in phenotypes may have potential impacts in identifying denominator populations.

Methods: Approximately 208,000 patients with T2D were included in our study, which used retrospective EHR data from the Johns Hopkins Medical Institution (JHMI) during 2017-2019. Our assessment included 4 published phenotypes and 1 definition from a panel of experts at Hopkins. We conducted descriptive analyses of demographics (ie, age, sex, race, and ethnicity), use of health care (inpatient and emergency room visits), and the average Charlson Comorbidity Index score of each phenotype. We then used different methods to induce or simulate data quality issues of completeness, accuracy, and timeliness separately across each phenotype. For induced data incompleteness, our model randomly dropped diagnosis, medication, and laboratory codes independently at increments of 10%; for induced data inaccuracy, our model randomly replaced a diagnosis or medication code with another code of the same data type and induced 2% incremental change from -100% to +10% in laboratory result values; and lastly, for timeliness, data were modeled for induced incremental shift of date records by 30 days to 365 days.

Results: Less than a quarter (n=47,326, 23%) of the population overlapped across all phenotypes using EHRs. The population identified by each phenotype varied across all combinations of data types. Induced incompleteness identified fewer patients with each increment; for example, at 100% diagnostic incompleteness, the Chronic Conditions Data Warehouse phenotype identified zero patients, as its phenotypic characteristics included only diagnosis codes. Induced inaccuracy and timeliness similarly demonstrated variations in performance of each phenotype, therefore resulting in fewer patients being identified with each incremental change.

Conclusions: We used EHR data with diagnosis, medication, and laboratory data types from a large tertiary hospital system to understand T2D phenotypic differences and performance. We used induced data quality methods to learn how data quality issues may impact identification of the denominator populations upon which clinical (eg, clinical research and trials, population health evaluations) and financial or operational decisions are made. The novel results from our study may inform the future development of a common T2D computable phenotype definition that can be applied to clinical informatics, chronic condition management, and other industry-wide efforts in health care.
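
The induced-incompleteness experiment can be sketched in Python as follows. This is a toy illustration and not the study's model: it covers diagnosis codes only, the records are invented, and the helpers `drop_fraction` and `diagnosis_only_phenotype` are hypothetical. The sketch assumes a phenotype that identifies a patient from any diagnosis code starting with the ICD-10 prefix E11.

```python
import random


def drop_fraction(records, fraction, seed=0):
    """Randomly remove a given fraction of (patient_id, code) records."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)  # same seed gives the same order, so drops are nested
    n_drop = round(len(shuffled) * fraction)
    return shuffled[n_drop:]


def diagnosis_only_phenotype(records, target_prefix="E11"):
    """Identify patients with at least one diagnosis code under the prefix."""
    return {pid for pid, code in records if code.startswith(target_prefix)}


# Hypothetical diagnosis records; not study data.
records = [
    (1, "E11.9"), (1, "I10"),
    (2, "E11.9"), (2, "E11.65"),
    (3, "I10"),
    (4, "E11.9"),
    (5, "E11.9"), (5, "E11.9"),
]

# Patients identified at 0%, 10%, ..., 100% induced incompleteness.
counts = [
    len(diagnosis_only_phenotype(drop_fraction(records, step / 10)))
    for step in range(11)
]
```

Because the phenotype in this sketch relies on diagnosis codes alone, the count falls to zero once every diagnosis record is dropped, which mirrors the behavior reported for the Chronic Conditions Data Warehouse phenotype at 100% diagnostic incompleteness.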
Citations: 0
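The Methods of the abstract above describe three kinds of induced data quality issues: randomly dropping codes, randomly replacing codes or perturbing laboratory values, and shifting record dates. The following is only a minimal sketch of how such perturbations might look, assuming records are simple `(patient_id, code, date)` tuples. All function names, the toy phenotype, and the sample data are hypothetical illustrations and are not taken from the study, whose actual models are not described in the abstract.

```python
import random
from datetime import date, timedelta


def drop_records(records, fraction, rng):
    """Induced incompleteness: keep a random subset, dropping `fraction` of records."""
    keep = len(records) - round(len(records) * fraction)
    return rng.sample(records, keep)


def replace_codes(records, fraction, code_pool, rng):
    """Induced inaccuracy: swap a random `fraction` of codes for a different
    code drawn from `code_pool` (assumed to hold codes of the same data type)."""
    n = round(len(records) * fraction)
    chosen = set(rng.sample(range(len(records)), n))
    out = []
    for i, (pid, code, day) in enumerate(records):
        if i in chosen:
            code = rng.choice([c for c in code_pool if c != code])
        out.append((pid, code, day))
    return out


def scale_lab_values(values, percent_change):
    """Induced inaccuracy for laboratory results: change each value by a percentage."""
    return [v * (1 + percent_change / 100) for v in values]


def shift_dates(records, days):
    """Induced timeliness issue: move every record date later by `days`."""
    return [(pid, code, day + timedelta(days=days)) for pid, code, day in records]


def toy_phenotype(records, window_end):
    """Hypothetical diagnosis-only phenotype: any E11* code dated on or before window_end."""
    return {pid for pid, code, day in records
            if code.startswith("E11") and day <= window_end}


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up data: 100 patients with one diagnosis record each.
    records = [(i, "E11.9", date(2019, 1, 1)) for i in range(100)]
    for pct in range(0, 101, 10):
        kept = drop_records(records, pct / 100, rng)
        found = toy_phenotype(kept, date(2019, 12, 31))
        print(f"{pct:3d}% dropped: {len(found)} patients identified")
```

In this toy setup a phenotype that relies only on diagnosis codes identifies no patients once every diagnosis record is dropped, which mirrors the abstract's example for the Chronic Conditions Data Warehouse phenotype at 100% diagnostic incompleteness.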