JMIR Medical Informatics: Latest Articles

Extraction of Substance Use Information From Clinical Notes: Generative Pretrained Transformer-Based Investigation.
IF 3.1 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2024-08-19 DOI: 10.2196/56243
Fatemeh Shah-Mohammadi, Joseph Finkelstein

Background: Understanding the multifaceted nature of health outcomes requires a comprehensive examination of the social, economic, and environmental determinants that shape individual well-being. Among these determinants, behavioral factors play a crucial role, particularly the consumption patterns of psychoactive substances, which have important implications for public health. The Global Burden of Disease Study shows a growing impact in disability-adjusted life years due to substance use. The successful identification of patients' substance use information equips clinical care teams to address substance-related issues more effectively, enabling targeted support and ultimately improving patient outcomes.

Objective: Traditional natural language processing methods face limitations in accurately parsing diverse clinical language associated with substance use. Large language models offer promise in overcoming these challenges by adapting to diverse language patterns. This study investigates the application of the generative pretrained transformer (GPT) model, specifically GPT-3.5, for extracting tobacco, alcohol, and substance use information from patient discharge summaries in zero-shot and few-shot learning settings. This study contributes to the evolving landscape of health care informatics by showcasing the potential of advanced language models in extracting nuanced information critical for enhancing patient care.

Methods: The main data source for analysis in this paper is the Medical Information Mart for Intensive Care III (MIMIC-III) data set. Among all notes in this data set, we focused on discharge summaries. Prompt engineering was undertaken, involving an iterative exploration of diverse prompts. Leveraging carefully curated examples and refined prompts, we investigated the model's proficiency through zero-shot as well as few-shot prompting strategies.

Results: Results show GPT's varying effectiveness in identifying mentions of tobacco, alcohol, and substance use across learning scenarios. Zero-shot learning showed high accuracy in identifying substance use, whereas few-shot learning reduced accuracy but improved the identification of substance use status, enhancing recall and F1-score at the expense of lower precision.

Conclusions: The excellence of zero-shot learning in precisely extracting text spans mentioning substance use demonstrates its effectiveness in situations in which comprehensive recall is important. Conversely, few-shot learning offers advantages when accurately determining the status of substance use is the primary focus, even if it involves a trade-off in precision. The results contribute to the enhancement of early detection and intervention strategies, help tailor treatment plans with greater precision, and ultimately contribute to a holistic understanding of patient health profiles. By integrating these artificial intelligence-driven methods into electronic health record systems, clinicians can gain immediate, comprehensive insights into substance use, enabling interventions that are not only timely but also more personalized and effective.
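
The difference between the zero-shot and few-shot settings described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the instruction text, the `Example` class, and the `build_prompt` and `precision_recall_f1` helpers are hypothetical names introduced here, the sample notes are invented, and no model is called. The abstract does not reproduce the prompts the authors actually used.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical instruction; the abstract does not reproduce the study's prompts.
INSTRUCTION = (
    "Extract every mention of tobacco, alcohol, or drug use from the "
    "discharge summary and state the use status for each."
)


@dataclass
class Example:
    note: str    # invented discharge-summary snippet
    answer: str  # extraction we would want for that snippet


def build_prompt(note: str, examples: Optional[List[Example]] = None) -> str:
    """Zero-shot when no examples are passed, few-shot otherwise."""
    parts = [INSTRUCTION]
    for ex in examples or []:
        parts.append(f"Note: {ex.note}\nAnswer: {ex.answer}")
    parts.append(f"Note: {note}\nAnswer:")
    return "\n\n".join(parts)


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard definitions from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# The same note, prompted without and with one worked example.
zero_shot = build_prompt("Patient denies tobacco use.")
few_shot = build_prompt(
    "Patient denies tobacco use.",
    [Example("Smokes one pack per day.", "tobacco: current use")],
)
```

The only structural difference between the two prompts is the block of worked examples placed before the target note. The metric helper shows why recall and F1-score can rise while precision falls: extra false positives lower precision without affecting recall.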
Citations: 0
Evaluating and Enhancing the Fitness-for-Purpose of Electronic Health Record Data: Qualitative Study on Current Practices and Pathway to an Automated Approach Within the Medical Informatics for Research and Care in University Medicine Consortium.
IF 3.1 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2024-08-19 DOI: 10.2196/57153
Gaetan Kamdje Wabo, Preetha Moorthy, Fabian Siegel, Susanne A Seuchter, Thomas Ganslandt

Background: Leveraging electronic health record (EHR) data for clinical or research purposes heavily depends on data fitness. However, there is a lack of standardized frameworks to evaluate EHR data suitability, leading to inconsistent quality in data use projects (DUPs). This research focuses on the Medical Informatics for Research and Care in University Medicine (MIRACUM) Data Integration Centers (DICs) and examines empirical practices on assessing and automating the fitness-for-purpose of clinical data in German DIC settings.

Objective: The study aims (1) to capture and discuss how MIRACUM DICs evaluate and enhance the fitness-for-purpose of observational health care data and examine the alignment with existing recommendations and (2) to identify the requirements for designing and implementing a computer-assisted solution to evaluate EHR data fitness within MIRACUM DICs.

Methods: A qualitative approach was followed using an open-ended survey across DICs of 10 German university hospitals affiliated with MIRACUM. Data were analyzed using thematic analysis following an inductive qualitative method.

Results: All 10 MIRACUM DICs participated, with 17 participants revealing various approaches to assessing data fitness, including the 4-eyes principle and data consistency checks such as cross-system data value comparison. Common practices included a DUP-related feedback loop on data fitness and using self-designed dashboards for monitoring. Most experts had a computer science background and a master's degree, suggesting strong technological proficiency but potentially lacking clinical or statistical expertise. Nine key requirements for a computer-assisted solution were identified, including flexibility, understandability, extendibility, and practicability. Participants used heterogeneous data repositories for evaluating data quality criteria and practical strategies to communicate with research and clinical teams.

Conclusions: The study identifies gaps between current practices in MIRACUM DICs and existing recommendations, offering insights into the complexities of assessing and reporting clinical data fitness. Additionally, a tripartite modular framework for fitness-for-purpose assessment was introduced to streamline the forthcoming implementation. It provides valuable input for developing and integrating an automated solution across multiple locations. This may range from statistical comparisons to advanced machine learning algorithms for operationalizing frameworks such as the 3×3 data quality assessment framework. These findings provide foundational evidence for future design and implementation studies to enhance data quality assessments for specific DUPs in observational health care settings.
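
One of the consistency checks named in the Results, cross-system data value comparison, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `compare_systems` function, the record layout, and the sample values are hypothetical and do not describe any MIRACUM DIC implementation.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]
Mismatch = Tuple[str, str, Any, Any]


def compare_systems(
    source: Dict[str, Record],
    target: Dict[str, Record],
    fields: List[str],
) -> List[Mismatch]:
    """Return (record_id, field, source_value, target_value) for every disagreement."""
    mismatches: List[Mismatch] = []
    for record_id, src_row in source.items():
        tgt_row = target.get(record_id)
        if tgt_row is None:
            # The record never arrived in the target system.
            mismatches.append((record_id, "<record>", "present", "missing"))
            continue
        for field in fields:
            if src_row.get(field) != tgt_row.get(field):
                mismatches.append(
                    (record_id, field, src_row.get(field), tgt_row.get(field))
                )
    return mismatches


# Invented sample data: a source system and the repository it feeds.
source_system = {
    "p1": {"sex": "F", "birth_year": 1980},
    "p2": {"sex": "M", "birth_year": 1975},
    "p3": {"sex": "F", "birth_year": 1990},
}
repository = {
    "p1": {"sex": "F", "birth_year": 1980},
    "p2": {"sex": "M", "birth_year": 1957},  # transposed digits
}

issues = compare_systems(source_system, repository, ["sex", "birth_year"])
```

A report of this kind could feed the DUP-related feedback loop or a monitoring dashboard of the sort the participants described, although the abstract does not say how those were built.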

Citations: 0
Characteristics of Existing Online Patient Navigation Interventions: Scoping Review.
IF 3.1 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2024-08-19 DOI: 10.2196/50307
Meghan Marsh, Syeda Rafia Shah, Sarah E P Munce, Laure Perrier, Tin-Suet Joan Lee, Tracey J F Colella, Kristina Marie Kokorelias

Background: Patient navigation interventions (PNIs) can provide personalized support and promote appropriate coordination or continuation of health and social care services. Online PNIs have demonstrated excellent potential for improving patient knowledge, transition readiness, self-efficacy, and use of services. However, the characteristics (ie, intervention type, mode of delivery, duration, frequency, outcomes and outcome measures, underlying theories or mechanisms of change of the intervention, and impact) of existing online PNIs to support the health and social needs of individuals with illness remain unclear.

Objective: This scoping review of the existing literature aims to identify the characteristics of existing online PNIs reported in the literature.

Methods: A scoping review based on the guidelines outlined in the Joanna Briggs Institute framework was conducted. A search for peer-reviewed literature published between 1989 and 2022 on online PNIs was conducted using MEDLINE, CINAHL, Embase, PsycInfo, and Cochrane Library databases. Two independent reviewers conducted 2 levels of screening. Data abstraction was conducted to outline key study characteristics (eg, study design, population, and intervention characteristics). The data were analyzed using descriptive statistics and qualitative content analysis.

Results: A total of 100 studies met the inclusion criteria. Our findings indicate that a variety of study designs are used to describe and evaluate online PNIs, with the literature published between 2003 and 2022 in Western countries. Of these studies, 39 (39%) were randomized controlled trials. In addition, we noticed an increase in reported online PNIs since 2019. The majority of studies involved White females with a diagnosis of cancer, and a lack of participants aged 70 years or older was observed. Most online PNIs provide support through navigation, self-management and lifestyle changes, counseling, coaching, education, or a combination of these. Variation was noted in terms of mode of delivery, duration, and frequency. Only a small number of studies described theoretical frameworks or change mechanisms to guide intervention.

Conclusions: To our knowledge, this is the first review to comprehensively synthesize the existing literature on online PNIs, by focusing on the characteristics of interventions and studies in this area. Inconsistency in reporting the country of publication, population characteristics, duration and frequency of interventions, and a lack of the use of underlying theories and working mechanisms to inform intervention development, provide guidance for the reporting of future online PNIs.

Citations: 0
Bridging Real-World Data Gaps: Connecting Dots Across 10 Asian Countries.
IF 3.1 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2024-08-15 DOI: 10.2196/58548
Guilherme Silva Julian, Wen-Yi Shau, Hsu-Wen Chou, Sajita Setia

The economic trend and the health care landscape are rapidly evolving across Asia. Effective real-world data (RWD) for regulatory and clinical decision-making is a crucial milestone associated with this evolution. This necessitates a critical evaluation of RWD generation within distinct nations for the use of various RWD warehouses in the generation of real-world evidence (RWE). In this article, we outline the RWD generation trends for 2 contrasting nation archetypes: "Solo Scholars" (nations with relatively self-sufficient RWD research systems) and "Global Collaborators" (countries largely reliant on international infrastructures for RWD generation). The key trends and patterns in RWD generation, country-specific insights into the predominant databases used in each country to produce RWE, and insights into the broader landscape of RWD database use across these countries are discussed. In conclusion, the data point to the heterogeneous nature of RWD generation practices across 10 different Asian nations and advocate for strategic enhancements in data harmonization. The evidence highlights the imperative for improved database integration and the establishment of standardized protocols and infrastructure for leveraging electronic medical records (EMR) in streamlining RWD acquisition. The clinical data analysis and reporting system of Hong Kong is an excellent example of a successful EMR system that showcases the capacity of integrated robust EMR platforms to consolidate and produce diverse RWE. This, in turn, can potentially reduce the necessity for reliance on numerous condition-specific local and global registries or limited and largely unavailable medical insurance or claims databases in most Asian nations.
Linking health technology assessment processes with open data initiatives such as the Observational Medical Outcomes Partnership Common Data Model and the Observational Health Data Sciences and Informatics could enable the leveraging of global data resources to inform local decision-making. Advancing such initiatives is crucial for reinforcing health care frameworks in resource-limited settings and advancing toward cohesive, evidence-driven health care policy and improved patient outcomes in the region.

Citations: 0
Evaluation of AI-Driven LabTest Checker for Diagnostic Accuracy and Safety: Prospective Cohort Study.
IF 3.1 Zone 3 (Medicine) Q2 MEDICAL INFORMATICS Pub Date: 2024-08-14 DOI: 10.2196/57162
Dawid Szumilas, Anna Ochmann, Katarzyna Zięba, Bartłomiej Bartoszewicz, Anna Kubrak, Sebastian Makuch, Siddarth Agrawal, Grzegorz Mazur, Jerzy Chudek

Background: In recent years, the implementation of artificial intelligence (AI) in health care has been progressively transforming medical fields, with the use of clinical decision support systems (CDSSs) as a notable application. Laboratory tests are vital for accurate diagnoses, but the increasing reliance on them presents challenges. The need for effective strategies for managing laboratory test interpretation is evident from the millions of monthly searches on the significance of test results. As the potential role of CDSSs in laboratory diagnostics gains significance, however, more research is needed to explore this area.

Objective: The primary objective of our study was to assess the accuracy and safety of LabTest Checker (LTC), a CDSS designed to support medical diagnoses by analyzing both laboratory test results and patients' medical histories.

Methods: This cohort study embraced a prospective data collection approach. A total of 101 patients aged ≥18 years, in stable condition, and requiring comprehensive diagnosis were enrolled. A panel of blood laboratory tests was conducted for each participant. Participants used LTC for test result interpretation. The accuracy and safety of the tool were assessed by comparing AI-generated suggestions to experienced doctor (consultant) recommendations, which are considered the gold standard.

Results: The system achieved a 74.3% accuracy and 100% sensitivity for emergency safety and 92.3% sensitivity for urgent cases. It potentially reduced unnecessary medical visits by 41.6% (42/101) and achieved an 82.9% accuracy in identifying underlying pathologies.

Conclusions: This study underscores the transformative potential of AI-based CDSSs in laboratory diagnostics, contributing to enhanced patient care, efficient health care systems, and improved medical outcomes. LTC's performance evaluation highlights the advancements in AI's role in laboratory medicine.

Citations: 0
Transforming Primary Care Data Into the Observational Medical Outcomes Partnership Common Data Model: Development and Usability Study.
IF 3.1 Zone 3 Medicine Q2 MEDICAL INFORMATICS Pub Date: 2024-08-13 DOI: 10.2196/49542
Mathilde Fruchart, Paul Quindroit, Chloé Jacquemont, Jean-Baptiste Beuscart, Matthieu Calafiore, Antoine Lamer

Background: Patient-monitoring software generates a large amount of data that can be reused for clinical audits and scientific research. The Observational Health Data Sciences and Informatics (OHDSI) consortium developed the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) to standardize electronic health record data and promote large-scale observational and longitudinal research.

Objective: This study aimed to transform primary care data into the OMOP CDM format.

Methods: We extracted primary care data from electronic health records at a multidisciplinary health center in Wattrelos, France. We performed structural mapping between the design of our local primary care database and the OMOP CDM tables and fields. Concepts from local French vocabularies were mapped to OHDSI standard vocabularies. To validate the implementation of primary care data in the OMOP CDM format, we applied a set of queries. A practical application was achieved through the development of a dashboard.
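
The semantic mapping and query-based validation described in the Methods can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the local codes, the source field names (`sexe`, `annee_naissance`), the `LOCAL_TO_STANDARD` table, and the helper functions are hypothetical. The gender concept IDs 8507 and 8532 are the commonly used OMOP standard concepts for male and female and should be verified against the current vocabulary release.

```python
from collections import Counter

# Hypothetical local (French) sex codes -> OMOP standard gender concept IDs.
LOCAL_TO_STANDARD = {"H": 8507, "F": 8532}  # homme -> MALE, femme -> FEMALE
UNMAPPED = 0  # OMOP convention: concept_id 0 means "no matching concept"


def to_person_row(source: dict) -> dict:
    """Map one local patient record onto a few OMOP CDM `person` fields."""
    return {
        "person_id": source["id"],
        "gender_concept_id": LOCAL_TO_STANDARD.get(source["sexe"], UNMAPPED),
        "year_of_birth": source["annee_naissance"],
        "gender_source_value": source["sexe"],  # keep the original code
    }


def validate_counts(source_rows: list, person_rows: list) -> bool:
    """Validation sketch: per-sex counts must match before and after mapping."""
    before = Counter(r["sexe"] for r in source_rows)
    after = Counter(r["gender_source_value"] for r in person_rows)
    return before == after and len(source_rows) == len(person_rows)


# Made-up records, for illustration only.
source_rows = [
    {"id": 1, "sexe": "F", "annee_naissance": 1980},
    {"id": 2, "sexe": "H", "annee_naissance": 1975},
    {"id": 3, "sexe": "?", "annee_naissance": 1990},
]
person_rows = [to_person_row(r) for r in source_rows]
print(person_rows[2]["gender_concept_id"])  # 0 (unmapped local code)
print(validate_counts(source_rows, person_rows))  # True
```

Keeping the source value next to the mapped concept ID is what makes a count comparison between the source system and the transformed tables possible.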

Results: Data from 18,395 patients were implemented into the OMOP CDM, corresponding to 592,226 consultations over a period of 20 years. A total of 18 OMOP CDM tables were implemented. A total of 17 local vocabularies were identified as being related to primary care and corresponded to patient characteristics (sex, location, year of birth, and race), units of measurement, biometric measures, laboratory test results, medical histories, and drug prescriptions. During semantic mapping, 10,221 primary care concepts were mapped to standard OHDSI concepts. Five queries were used to validate the OMOP CDM by comparing the results obtained after the completion of the transformations with the results obtained in the source software. Lastly, a prototype dashboard was developed to visualize the activity of the health center, the laboratory test results, and the drug prescription data.

Conclusions: Primary care data from a French health care facility have been implemented into the OMOP CDM format. Data concerning demographics, units, measurements, and primary care consultation steps were already available in OHDSI vocabularies. Laboratory test results and drug prescription data were mapped to available vocabularies and structured in the final model. A dashboard application provided health care professionals with feedback on their practice.

Citations: 0
Recognition of Daily Activities in Adults With Wearable Inertial Sensors: Deep Learning Methods Study.
IF 3.1 Zone 3 Medicine Q2 MEDICAL INFORMATICS Pub Date: 2024-08-09 DOI: 10.2196/57097
Alberto De Ramón Fernández, Daniel Ruiz Fernández, Miguel García Jaén, Juan M Cortell-Tormo

Background: Activities of daily living (ADL) are essential for independence and personal well-being, reflecting an individual's functional status. Impairment in executing these tasks can limit autonomy and negatively affect quality of life. The assessment of physical function during ADL is crucial for the prevention and rehabilitation of movement limitations. However, traditional evaluation based on subjective observation is limited in precision and objectivity.

Objective: The primary objective of this study is to use innovative technology, specifically wearable inertial sensors combined with artificial intelligence techniques, to objectively and accurately evaluate human performance in ADL. The study proposes to overcome the limitations of traditional methods by implementing systems that allow dynamic and noninvasive monitoring of movements during daily activities. The approach seeks to provide an effective tool for the early detection of dysfunctions and the personalization of treatment and rehabilitation plans, thus improving the quality of life of individuals.

Methods: To monitor movements, wearable inertial sensors were developed, which include accelerometers and triaxial gyroscopes. The developed sensors were used to create a proprietary database with 6 movements related to the shoulder and 3 related to the back. We registered 53,165 activity records in the database (consisting of accelerometer and gyroscope measurements), which were reduced to 52,600 after processing to remove null or abnormal values. Finally, 4 deep learning (DL) models were created by combining various processing layers to explore different approaches in ADL recognition.

Results: The results revealed high performance of the 4 proposed models, with levels of accuracy, precision, recall, and F1-score ranging between 95% and 97% for all classes and an average loss of 0.10. These results indicate the great capacity of the models to accurately identify a variety of activities, with a good balance between precision and recall. Both the convolutional and bidirectional approaches achieved slightly superior results, although the bidirectional model reached convergence in a smaller number of epochs.

Conclusions: The DL models implemented have demonstrated solid performance, indicating an effective ability to identify and classify various daily activities related to the shoulder and lumbar region. These results were achieved with minimal sensorization (noninvasive and practically imperceptible to the user), which does not affect the user's daily routine and promotes acceptance of and adherence to continuous monitoring, thus improving the reliability of the data collected. This research has the potential to have a significant impact on the clinical evaluation and rehabilitation of patients with movement limitations, by providing an objective and advanced tool for detecting key movement patterns and joint dysfunctions.

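For readers unfamiliar with the reported metrics, the sketch below shows how per-class precision, recall, and F1-score can be computed from true and predicted activity labels. The label names and the tiny example are hypothetical and are not taken from the study's database; the function name is illustrative.

```python
def per_class_metrics(y_true: list, y_pred: list) -> dict:
    """Return {label: (precision, recall, f1)} for every label that occurs."""
    metrics = {}
    pairs = list(zip(y_true, y_pred))
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = (precision, recall, f1)
    return metrics


# Hypothetical labels for two movement classes.
y_true = ["shoulder", "shoulder", "back", "back"]
y_pred = ["shoulder", "back", "back", "back"]
for label, (p, r, f1) in per_class_metrics(y_true, y_pred).items():
    print(label, round(p, 2), round(r, 2), round(f1, 2))
```

A "good balance between precision and recall" corresponds to an F1-score close to both values, since F1 is their harmonic mean.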
Citations: 0
Assessing ChatGPT as a Medical Consultation Assistant for Chronic Hepatitis B: Cross-Language Study of English and Chinese.
IF 3.1 Zone 3 Medicine Q2 MEDICAL INFORMATICS Pub Date: 2024-08-08 DOI: 10.2196/56426
Yijie Wang, Yining Chen, Jifang Sheng

Background: Chronic hepatitis B (CHB) imposes substantial economic and social burdens globally. The management of CHB involves intricate monitoring and adherence challenges, particularly in regions like China, where a high prevalence of CHB intersects with health care resource limitations. This study explores the potential of ChatGPT-3.5, an emerging artificial intelligence (AI) assistant, to address these complexities. With notable capabilities in medical education and practice, ChatGPT-3.5's role is examined in managing CHB, particularly in regions with distinct health care landscapes.

Objective: This study aimed to uncover insights into ChatGPT-3.5's potential and limitations in delivering personalized medical consultation assistance for CHB patients across diverse linguistic contexts.

Methods: Questions sourced from published guidelines, online CHB communities, and search engines in English and Chinese were refined, translated, and compiled into 96 inquiries. Subsequently, these questions were presented to both ChatGPT-3.5 and ChatGPT-4.0 in independent dialogues. The responses were then evaluated by senior physicians, focusing on informativeness, emotional management, consistency across repeated inquiries, and cautionary statements regarding medical advice. Additionally, a true-or-false questionnaire was employed to further discern the variance in information accuracy for closed questions between ChatGPT-3.5 and ChatGPT-4.0.

Results: Over half of the responses (228/370, 61.6%) from ChatGPT-3.5 were considered comprehensive. In contrast, ChatGPT-4.0 exhibited a higher percentage at 74.5% (172/222; P<.001). Notably, superior performance was evident in English, particularly in terms of informativeness and consistency across repeated queries. However, deficiencies were identified in emotional management guidance, with only 3.2% (6/186) in ChatGPT-3.5 and 8.1% (15/154) in ChatGPT-4.0 (P=.04). ChatGPT-3.5 included a disclaimer in 10.8% (24/222) of responses, while ChatGPT-4.0 included a disclaimer in 13.1% (29/222) of responses (P=.46). When responding to true-or-false questions, ChatGPT-4.0 achieved an accuracy rate of 93.3% (168/180), significantly surpassing ChatGPT-3.5's accuracy rate of 65.0% (117/180) (P<.001).

Conclusions: In this study, ChatGPT demonstrated basic capabilities as a medical consultation assistant for CHB management. The choice of working language for ChatGPT-3.5 was considered a potential factor influencing its performance, particularly in the use of terminology and colloquial language, and this potentially affects its applicability within specific target populations. However, as an updated model, ChatGPT-4.0 exhibits improved information processing capabilities, overcoming the language impact on information accuracy. This suggests that the implications of model advancement on applications need to be considered when selecting large language models as medical consultation assistants. Given that both models performed inadequately in emotional guidance management, this study highlights the importance of providing language-specific training and emotional management strategies when applying ChatGPT for medical purposes. Furthermore, the tendency of these models to use disclaimers in conversations should be further investigated to understand the impact on patient experience in practical applications.

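The abstract does not say which statistical test produced its P values. As a hedged illustration, a Pearson chi-square test without continuity correction on the 2×2 table of disclaimer counts gives a value consistent with the reported P=.46. The function below is a sketch under that assumption, not the authors' analysis, and its name is illustrative.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Disclaimers: 24/222 responses (ChatGPT-3.5) vs 29/222 (ChatGPT-4.0).
stat, p = chi_square_2x2(24, 222 - 24, 29, 222 - 29)
print(round(p, 2))  # 0.46

# True-or-false accuracy: 117/180 (ChatGPT-3.5) vs 168/180 (ChatGPT-4.0).
_, p_tf = chi_square_2x2(117, 180 - 117, 168, 180 - 168)
print(p_tf < 0.001)  # True
```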
Citations: 0
Pediatric Sedation Assessment and Management System (PSAMS) for Pediatric Sedation in China: Development and Implementation Report.
IF 3.1 Zone 3 Medicine Q2 MEDICAL INFORMATICS Pub Date: 2024-08-07 DOI: 10.2196/53427
Ziyu Zhu, Lan Liu, Min Du, Mao Ye, Ximing Xu, Ying Xu

Background: Recently, the growing demand for pediatric sedation services outside the operating room has imposed a heavy burden on pediatric centers in China. There is an urgent need to develop a novel system for improved sedation services.

Objective: This study aimed to develop and implement a computerized system, the Pediatric Sedation Assessment and Management System (PSAMS), to streamline pediatric sedation services at a major children's hospital in Southwest China.

Methods: PSAMS was designed to reflect the actual workflow of pediatric sedation. It consists of 3 main components: server-hosted software; client applications on tablets and computers; and specialized devices like gun-type scanners, desktop label printers, and pulse oximeters. With the participation of a multidisciplinary team, PSAMS was developed and refined during its application in the sedation process. This study analyzed data from the first 2 years after the system's deployment.

Results: From January 2020 to December 2021, a total of 127,325 sedations performed on 85,281 patients were recorded in the PSAMS database. Besides basic variables imported from Hospital Information Systems (HIS), the PSAMS database currently contains 33 additional variables that capture comprehensive information from presedation assessment to postprocedural recovery. The data recorded in PSAMS indicate a one-time sedation success rate of 97.1% (50,752/52,282) in 2020 and 97.5% (73,184/75,043) in 2021. The observed adverse event rate was 3.5% (95% CI 3.4%-3.7%) in 2020 and 2.8% (95% CI 2.7%-2.9%) in 2021.
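
The reported rates can be rechecked directly from the counts in the paragraph above. The snippet is a small consistency check written for this summary (the variable names are illustrative). It confirms that the yearly denominators add up to the 127,325 total and that the success rates round to the reported values.

```python
# Counts from the abstract: (one-time successful sedations, all sedations).
sedations = {2020: (50_752, 52_282), 2021: (73_184, 75_043)}

total = sum(n for _, n in sedations.values())
print(total)  # 127325

rates = {year: round(100 * ok / n, 1) for year, (ok, n) in sedations.items()}
print(rates)  # {2020: 97.1, 2021: 97.5}
```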

Conclusions: PSAMS streamlined the entire sedation workflow, reduced the burden of data collection, and laid a foundation for future cooperation of multiple pediatric health care centers.

Citations: 0
Claude 3 Opus and ChatGPT With GPT-4 in Dermoscopic Image Analysis for Melanoma Diagnosis: Comparative Performance Analysis.
IF 3.1 Zone 3, Medicine Q2 MEDICAL INFORMATICS Pub Date: 2024-08-06 DOI: 10.2196/59273
Xu Liu, Chaoli Duan, Min-Kyu Kim, Lu Zhang, Eunjin Jee, Beenu Maharjan, Yuwei Huang, Dan Du, Xian Jiang
Background: Recent advancements in artificial intelligence (AI) and large language models (LLMs) have shown potential in medical fields, including dermatology. With the introduction of image analysis capabilities in LLMs, their application in dermatological diagnostics has garnered significant interest. These capabilities are enabled by the integration of computer vision techniques into the underlying architecture of LLMs.
Objective: This study aimed to compare the diagnostic performance of Claude 3 Opus and ChatGPT with GPT-4 in analyzing dermoscopic images for melanoma detection, providing insights into their strengths and limitations.
Methods: We randomly selected 100 histopathology-confirmed dermoscopic images (50 malignant, 50 benign) from the International Skin Imaging Collaboration (ISIC) archive using a computer-generated randomization process. The ISIC archive was chosen due to its comprehensive and well-annotated collection of dermoscopic images, ensuring a diverse and representative sample. Images were included if they were dermoscopic images of melanocytic lesions with histopathologically confirmed diagnoses. Each model was given the same prompt, instructing it to provide the top 3 differential diagnoses for each image, ranked by likelihood. Primary diagnosis accuracy, accuracy of the top 3 differential diagnoses, and malignancy discrimination ability were assessed. The McNemar test was chosen to compare the diagnostic performance of the 2 models, as it is suitable for analyzing paired nominal data.
Results: In the primary diagnosis, Claude 3 Opus achieved 54.9% sensitivity (95% CI 44.08%-65.37%), 57.14% specificity (95% CI 46.31%-67.46%), and 56% accuracy (95% CI 46.22%-65.42%), while ChatGPT demonstrated 56.86% sensitivity (95% CI 45.99%-67.21%), 38.78% specificity (95% CI 28.77%-49.59%), and 48% accuracy (95% CI 38.37%-57.75%). The McNemar test showed no significant difference between the 2 models (P=.17). For the top 3 differential diagnoses, Claude 3 Opus and ChatGPT included the correct diagnosis in 76% (95% CI 66.33%-83.77%) and 78% (95% CI 68.46%-85.45%) of cases, respectively. The McNemar test showed no significant difference (P=.56). In malignancy discrimination, Claude 3 Opus outperformed ChatGPT with 47.06% sensitivity, 81.63% specificity, and 64% accuracy, compared to 45.1%, 42.86%, and 44%, respectively. The McNemar test showed a significant difference (P<.001). Claude 3 Opus had an odds ratio of 3.951 (95% CI 1.685-9.263) in discriminating malignancy, while ChatGPT-4 had an odds ratio of 0.616 (95% CI 0.297-1.278).
Conclusions: Our study highlights the potential of LLMs in assisting dermatologists but also reveals their limitations. Both models made errors in diagnosing melanoma and benign lesions. These findings underscore the need for developing robust, transparent, and clinically validated AI models through
collaboration between AI researchers, dermatologists, and other health care professionals. While AI can provide valuable insights, it cannot yet replace the expertise of trained clinicians.
Citations: 0
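The malignancy-discrimination figures in the Claude 3 Opus and ChatGPT abstract above (sensitivity, specificity, accuracy, odds ratio) all derive from a 2x2 table of model calls against the histopathology reference. The sketch below shows that arithmetic in Python. The abstract does not report the underlying cell counts, so the counts used here are an assumption, back-calculated only because they reproduce the reported percentages and odds ratios; they are not data from the paper. The class and function names are likewise hypothetical. The McNemar helper takes the two discordant-pair counts, which the abstract does not give, so no P value is reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 table of model calls against the histopathology reference."""
    tp: int  # malignant lesion, called malignant
    fn: int  # malignant lesion, called benign
    tn: int  # benign lesion, called benign
    fp: int  # benign lesion, called malignant

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    def odds_ratio(self) -> float:
        # Diagnostic odds ratio: (TP * TN) / (FP * FN).
        return (self.tp * self.tn) / (self.fp * self.fn)


def mcnemar_chi2(b: int, c: int) -> float:
    """McNemar statistic (no continuity correction).

    b and c are the discordant-pair counts: images one model classified
    correctly and the other did not, and vice versa.
    """
    return (b - c) ** 2 / (b + c)


# ASSUMED counts, back-calculated from the reported percentages.
# They are illustrative only and do not come from the paper.
claude = Confusion(tp=24, fn=27, tn=40, fp=9)
chatgpt = Confusion(tp=23, fn=28, tn=21, fp=28)

if __name__ == "__main__":
    for name, table in (("Claude 3 Opus", claude), ("ChatGPT", chatgpt)):
        print(
            f"{name}: sensitivity={table.sensitivity():.2%}, "
            f"specificity={table.specificity():.2%}, "
            f"accuracy={table.accuracy():.0%}, "
            f"odds ratio={table.odds_ratio():.3f}"
        )
```

Keeping the four cells in one immutable record makes it easy to check that every reported metric comes from the same table. The confidence intervals in the abstract depend on an interval method that the abstract does not name, so they are left out of the sketch.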