Andrew Wen, Liwei Wang, Huan He, Sunyang Fu, Sijia Liu, David A Hanauer, Daniel R Harris, Ramakanth Kavuluru, Rui Zhang, Karthik Natarajan, Nishanth P Pavinkurve, Janos Hajagos, Sritha Rajupet, Veena Lingam, Mary Saltz, Corey Elowsky, Richard A Moffitt, Farrukh M Koraishy, Matvey B Palchuk, Jordan Donovan, Lora Lingrey, Garo Stone-DerHagopian, Robert T Miller, Andrew E Williams, Peter J Leese, Paul I Kovach, Emily R Pfaff, Mikhail Zemmel, Robert D Pates, Nick Guthe, Melissa A Haendel, Christopher G Chute, Hongfang Liu, National COVID Cohort Collaborative, The RECOVER Initiative
Background: A wealth of clinically relevant information is only obtainable within unstructured clinical narratives, leading to great interest in clinical natural language processing (NLP). While a multitude of approaches to NLP exist, current algorithm development approaches have limitations that can slow the development process. These limitations are exacerbated when the task is emergent, as is the case currently for NLP extraction of signs and symptoms of COVID-19 and postacute sequelae of SARS-CoV-2 infection (PASC). Objective: This study aims to highlight the current limitations of existing NLP algorithm development approaches that are exacerbated by NLP tasks surrounding emergent clinical concepts and to illustrate our approach to addressing these issues through the use case of developing an NLP system for the signs and symptoms of COVID-19 and PASC. Methods: We used 2 preexisting studies on PASC as a baseline to determine a set of concepts that should be extracted by NLP. This concept list was then used in conjunction with the Unified Medical Language System to autonomously generate an expanded lexicon to weakly annotate a training set, which was then reviewed by a human expert to generate a fine-tuned NLP algorithm. The annotations from a fully human-annotated test set were then compared with NLP results from the fine-tuned algorithm. The NLP algorithm was then deployed to 10 additional sites that were also running our NLP infrastructure. Of these 10 sites, 5 were used to conduct a federated evaluation of the NLP algorithm. Results: An NLP algorithm consisting of 12,234 unique normalized text strings corresponding to 2366 unique concepts was developed to extract COVID-19 or PASC signs and symptoms. An unweighted mean dictionary coverage of 77.8% was found for the 5 sites. Conclusions: The evolutionary and time-critical nature of the PASC NLP task significantly complicates existing approaches to NLP algorithm development. In this work, we present a hybrid approach using the Open Health Natural Language Processing Toolkit aimed at addressing these needs with a dictionary-based weak labeling step that minimizes the need for additional expert annotation while still preserving the fine-tuning capabilities of expert involvement.
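To make the dictionary-based weak labeling step concrete, the sketch below shows the general pattern of annotating free text against a string-to-concept lexicon. The lexicon entries, CUIs, and function are illustrative stand-ins, not the OHNLP Toolkit implementation or the study's 12,234-string dictionary.

```python
import re

# Illustrative lexicon mapping normalized text strings to UMLS concepts
# (CUIs); real entries would come from UMLS expansion of the seed concepts.
lexicon = {
    "shortness of breath": "C0013404",
    "dyspnea": "C0013404",
    "loss of smell": "C0003126",
    "fatigue": "C0015672",
}

def weak_annotate(note_text: str) -> list[dict]:
    """Weakly label a note by case-insensitive dictionary matching."""
    lowered = note_text.lower()
    annotations = []
    for phrase, cui in lexicon.items():
        for match in re.finditer(re.escape(phrase), lowered):
            annotations.append({
                "start": match.start(),
                "end": match.end(),
                "text": note_text[match.start():match.end()],
                "cui": cui,
            })
    return annotations

print(weak_annotate("Patient reports fatigue and loss of smell since March."))
```

Weak labels produced this way would then be reviewed by a human expert before fine-tuning, as the Methods describe.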
{"title":"A Case Demonstration of the Open Health Natural Language Processing Toolkit From the National COVID-19 Cohort Collaborative and the Researching COVID to Enhance Recovery Programs for a Natural Language Processing System for COVID-19 or Postacute Sequelae of SARS CoV-2 Infection: Algorithm Development and Validation","authors":"Andrew Wen, Liwei Wang, Huan He, Sunyang Fu, Sijia Liu, David A Hanauer, Daniel R Harris, Ramakanth Kavuluru, Rui Zhang, Karthik Natarajan, Nishanth P Pavinkurve, Janos Hajagos, Sritha Rajupet, Veena Lingam, Mary Saltz, Corey Elowsky, Richard A Moffitt, Farrukh M Koraishy, Matvey B Palchuk, Jordan Donovan, Lora Lingrey, Garo Stone-DerHagopian, Robert T Miller, Andrew E Williams, Peter J Leese, Paul I Kovach, Emily R Pfaff, Mikhail Zemmel, Robert D Pates, Nick Guthe, Melissa A Haendel, Christopher G Chute, Hongfang Liu, National COVID Cohort Collaborative, The RECOVER Initiative","doi":"10.2196/49997","DOIUrl":"https://doi.org/10.2196/49997","url":null,"abstract":"<strong>Background:</strong> A wealth of clinically relevant information is only obtainable within unstructured clinical narratives, leading to great interest in clinical natural language processing (NLP). While a multitude of approaches to NLP exist, current algorithm development approaches have limitations that can slow the development process. These limitations are exacerbated when the task is emergent, as is the case currently for NLP extraction of signs and symptoms of COVID-19 and postacute sequelae of SARS-CoV-2 infection (PASC). <strong>Objective:</strong> This study aims to highlight the current limitations of existing NLP algorithm development approaches that are exacerbated by NLP tasks surrounding emergent clinical concepts and to illustrate our approach to addressing these issues through the use case of developing an NLP system for the signs and symptoms of COVID-19 and PASC. <strong>Methods:</strong> We used 2 preexisting studies on PASC as a baseline to determine a set of concepts that should be extracted by NLP. This concept list was then used in conjunction with the Unified Medical Language System to autonomously generate an expanded lexicon to weakly annotate a training set, which was then reviewed by a human expert to generate a fine-tuned NLP algorithm. The annotations from a fully human-annotated test set were then compared with NLP results from the fine-tuned algorithm. The NLP algorithm was then deployed to 10 additional sites that were also running our NLP infrastructure. Of these 10 sites, 5 were used to conduct a federated evaluation of the NLP algorithm. <strong>Results:</strong> An NLP algorithm consisting of 12,234 unique normalized text strings corresponding to 2366 unique concepts was developed to extract COVID-19 or PASC signs and symptoms. An unweighted mean dictionary coverage of 77.8% was found for the 5 sites. <strong>Conclusions:</strong> The evolutionary and time-critical nature of the PASC NLP task significantly complicates existing approaches to NLP algorithm development. 
In this work, we present a hybrid approach using the Open Health Natural Language Processing Toolkit aimed at addressing these needs with a dictionary-based weak labeling step that minimizes the need for additional expert annotation while still preserving the fine-tuning capabilities of expert involvement.","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"3 1","pages":""},"PeriodicalIF":3.2,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142199768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maximilian Markus Wunderlich, Nicolas Frey, Sandro Amende-Wolf, Carl Hinrichs, Felix Balzer, Akira-Sebastian Poncette
Background: In response to the high patient admission rates during the COVID-19 pandemic, provisional intensive care units (ICUs) were set up, equipped with temporary monitoring and alarm systems. We sought to find out whether the provisional ICU setting led to a higher alarm burden and more staff with alarm fatigue. Objective: We aimed to compare alarm situations between provisional COVID-19 ICUs and non–COVID-19 ICUs during the second COVID-19 wave in Berlin, Germany. The study focused on measuring alarms per bed per day, identifying medical devices with higher alarm frequencies in COVID-19 settings, evaluating the median duration of alarms in both types of ICUs, and assessing the level of alarm fatigue experienced by health care staff. Methods: Our approach involved a comparative analysis of alarm data from 2 provisional COVID-19 ICUs and 2 standard non–COVID-19 ICUs. Through interviews with medical experts, we formulated hypotheses about potential differences in alarm load, alarm duration, alarm types, and staff alarm fatigue between the 2 ICU types. We analyzed alarm log data from the patient monitoring systems of all 4 ICUs to inferentially assess the differences. In addition, we assessed staff alarm fatigue with a questionnaire, aiming to comprehensively understand the impact of the alarm situation on health care personnel. Results: COVID-19 ICUs had significantly more alarms per bed per day than non–COVID-19 ICUs (P<.001), and the majority of the staff lacked experience with the alarm system. The overall median alarm duration was similar in both ICU types. We found no COVID-19–specific alarm patterns. The alarm fatigue questionnaire results suggest that staff in both types of ICUs experienced alarm fatigue. However, physicians and nurses who were working in COVID-19 ICUs reported a significantly higher level of alarm fatigue (P=.04). Conclusions: Staff in COVID-19 ICUs were exposed to a higher alarm load, and the majority lacked experience with alarm management and the alarm system. We recommend training and educating ICU staff in alarm management, emphasizing the importance of alarm management training as part of the preparations for future pandemics. However, the limitations of our study design and the specific pandemic conditions warrant further studies to confirm these findings and to explore effective alarm management strategies in different ICU settings.
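As a rough illustration of the primary comparison, the sketch below computes alarms per bed per day from an alarm log and applies a nonparametric two-sample test. The data frame is fabricated, and the Mann-Whitney U test is an assumption for illustration; the abstract does not name the test used.

```python
import pandas as pd
from scipy.stats import mannwhitneyu

# Hypothetical alarm log: one row per alarm event.
alarms = pd.DataFrame({
    "icu_type": ["covid", "covid", "non_covid", "non_covid"] * 50,
    "bed_id":   [1, 2, 3, 4] * 50,
    "date":     pd.date_range("2021-01-01", periods=200, freq="h").date,
})

# Alarms per bed per day, then compare the two ICU types.
per_bed_day = (alarms.groupby(["icu_type", "bed_id", "date"])
                     .size()
                     .rename("alarm_count")
                     .reset_index())
covid = per_bed_day.loc[per_bed_day.icu_type == "covid", "alarm_count"]
non_covid = per_bed_day.loc[per_bed_day.icu_type == "non_covid", "alarm_count"]
stat, p = mannwhitneyu(covid, non_covid, alternative="two-sided")
print(f"U={stat:.1f}, P={p:.3f}")
```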
{"title":"Alarm Management in Provisional COVID-19 Intensive Care Units: Retrospective Analysis and Recommendations for Future Pandemics","authors":"Maximilian Markus Wunderlich, Nicolas Frey, Sandro Amende-Wolf, Carl Hinrichs, Felix Balzer, Akira-Sebastian Poncette","doi":"10.2196/58347","DOIUrl":"https://doi.org/10.2196/58347","url":null,"abstract":"<strong>Background:</strong> In response to the high patient admission rates during the COVID-19 pandemic, provisional intensive care units (ICUs) were set up, equipped with temporary monitoring and alarm systems. We sought to find out whether the provisional ICU setting led to a higher alarm burden and more staff with alarm fatigue. <strong>Objective:</strong> We aimed to compare alarm situations between provisional COVID-19 ICUs and non–COVID-19 ICUs during the second COVID-19 wave in Berlin, Germany. The study focused on measuring alarms per bed per day, identifying medical devices with higher alarm frequencies in COVID-19 settings, evaluating the median duration of alarms in both types of ICUs, and assessing the level of alarm fatigue experienced by health care staff. <strong>Methods:</strong> Our approach involved a comparative analysis of alarm data from 2 provisional COVID-19 ICUs and 2 standard non–COVID-19 ICUs. Through interviews with medical experts, we formulated hypotheses about potential differences in alarm load, alarm duration, alarm types, and staff alarm fatigue between the 2 ICU types. We analyzed alarm log data from the patient monitoring systems of all 4 ICUs to inferentially assess the differences. In addition, we assessed staff alarm fatigue with a questionnaire, aiming to comprehensively understand the impact of the alarm situation on health care personnel. <strong>Results:</strong> COVID-19 ICUs had significantly more alarms per bed per day than non–COVID-19 ICUs (<i>P</i><.001), and the majority of the staff lacked experience with the alarm system. The overall median alarm duration was similar in both ICU types. We found no COVID-19–specific alarm patterns. The alarm fatigue questionnaire results suggest that staff in both types of ICUs experienced alarm fatigue. However, physicians and nurses who were working in COVID-19 ICUs reported a significantly higher level of alarm fatigue (<i>P</i>=.04). <strong>Conclusions:</strong> Staff in COVID-19 ICUs were exposed to a higher alarm load, and the majority lacked experience with alarm management and the alarm system. We recommend training and educating ICU staff in alarm management, emphasizing the importance of alarm management training as part of the preparations for future pandemics. However, the limitations of our study design and the specific pandemic conditions warrant further studies to confirm these findings and to explore effective alarm management strategies in different ICU settings.","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"4 1","pages":""},"PeriodicalIF":3.2,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142199765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In light of rapid technological advancements, the health care sector is undergoing significant transformation with the continuous emergence of novel digital solutions. Consequently, regulatory frameworks must continuously adapt to fulfill their main goal of protecting patients. In 2017, the new Medical Device Regulation (EU) 2017/745 (MDR) came into force, bringing more complex requirements for development, launch, and postmarket surveillance. However, the updated regulation considerably impacts manufacturers, especially small- and medium-sized enterprises, and consequently the accessibility of medical devices in the European Union market, as many manufacturers decide to discontinue their products, postpone the launch of new innovative solutions, or leave the European Union market in favor of other regions such as the United States. This could reduce health care quality and slow industry innovation. Effective policy calibration and collaborative efforts are essential to mitigate these effects and promote ongoing advancements in health care technologies in the European Union market. This paper is a narrative review exploring the factors introduced by the new regulation that hinder the development, launch, and marketing of software as a medical device; it focuses exclusively on the factors that create obstacles. Related regulations, directives, and proposals are discussed for comparison and further analysis.
{"title":"Exploring Impediments Imposed by the Medical Device Regulation EU 2017/745 on Software as a Medical Device.","authors":"Liga Svempe","doi":"10.2196/58080","DOIUrl":"10.2196/58080","url":null,"abstract":"<p><p>In light of rapid technological advancements, the health care sector is undergoing significant transformation with the continuous emergence of novel digital solutions. Consequently, regulatory frameworks must continuously adapt to ensure their main goal to protect patients. In 2017, the new Medical Device Regulation (EU) 2017/745 (MDR) came into force, bringing more complex requirements for development, launch, and postmarket surveillance. However, the updated regulation considerably impacts the manufacturers, especially small- and medium-sized enterprises, and consequently, the accessibility of medical devices in the European Union market, as many manufacturers decide to either discontinue their products, postpone the launch of new innovative solutions, or leave the European Union market in favor of other regions such as the United States. This could lead to reduced health care quality and slower industry innovation efforts. Effective policy calibration and collaborative efforts are essential to mitigate these effects and promote ongoing advancements in health care technologies in the European Union market. This paper is a narrative review with the objective of exploring hindering factors to software as a medical device development, launch, and marketing brought by the new regulation. It exclusively focuses on the factors that engender obstacles. Related regulations, directives, and proposals were discussed for comparison and further analysis.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"12 ","pages":"e58080"},"PeriodicalIF":3.1,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11413540/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142134620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Florian Reis, Christian Lenz, Manfred Gossen, Hans-Dieter Volk, Norman Michael Drzeniek
With the popularization of large language models (LLMs), strategies for their effective and safe use in health care and research have become increasingly pertinent. Despite the growing interest and eagerness among health care professionals and scientists to exploit the potential of LLMs, initial attempts may yield suboptimal results due to a lack of user experience, complicating the integration of artificial intelligence (AI) tools into workplace routines. Focusing on scientists and health care professionals with limited LLM experience, this viewpoint article highlights and discusses 6 easy-to-implement use cases of practical relevance. These encompass customizing translations, refining text and extracting information, generating comprehensive overviews and specialized insights, compiling ideas into cohesive narratives, crafting personalized educational materials, and facilitating intellectual sparring. Additionally, we discuss general prompting strategies and precautions for the implementation of AI tools in biomedicine. Despite various hurdles and challenges, the integration of LLMs into the daily routines of physicians and researchers promises heightened workplace productivity and efficiency.
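To give a flavor of the "refining text" use case and the prompting strategies discussed, here is a minimal, hypothetical prompt template; the wording, role framing, and [CHECK] convention are illustrative choices, not taken from the article.

```python
# Illustrative prompt template for a text-refinement task; the structure
# and instructions are examples only, not the article's recommendations.
def build_refinement_prompt(draft: str, audience: str = "peer reviewers") -> str:
    return (
        "You are an experienced medical editor. "
        f"Rewrite the following draft for {audience}, keeping all factual "
        "claims unchanged, tightening wording, and flagging any ambiguous "
        "statement with [CHECK].\n\n"
        f"Draft:\n{draft}"
    )

prompt = build_refinement_prompt("Pt presented w/ SOB and was admitted...")
print(prompt)  # paste into the LLM chat interface of your choice
```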
{"title":"Practical Applications of Large Language Models for Health Care Professionals and Scientists.","authors":"Florian Reis, Christian Lenz, Manfred Gossen, Hans-Dieter Volk, Norman Michael Drzeniek","doi":"10.2196/58478","DOIUrl":"10.2196/58478","url":null,"abstract":"<p><strong>Unlabelled: </strong>With the popularization of large language models (LLMs), strategies for their effective and safe usage in health care and research have become increasingly pertinent. Despite the growing interest and eagerness among health care professionals and scientists to exploit the potential of LLMs, initial attempts may yield suboptimal results due to a lack of user experience, thus complicating the integration of artificial intelligence (AI) tools into workplace routine. Focusing on scientists and health care professionals with limited LLM experience, this viewpoint article highlights and discusses 6 easy-to-implement use cases of practical relevance. These encompass customizing translations, refining text and extracting information, generating comprehensive overviews and specialized insights, compiling ideas into cohesive narratives, crafting personalized educational materials, and facilitating intellectual sparring. Additionally, we discuss general prompting strategies and precautions for the implementation of AI tools in biomedicine. Despite various hurdles and challenges, the integration of LLMs into daily routines of physicians and researchers promises heightened workplace productivity and efficiency.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"12 ","pages":"e58478"},"PeriodicalIF":3.1,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11391657/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142134621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Seyma Handan Akyon, Fatih Cagatay Akyon, Ahmet Sefa Camyar, Fatih Hızlı, Talha Sari, Şamil Hızlı
Background: Reading medical papers is a challenging and time-consuming task for doctors, especially when the papers are long and complex. A tool that can help doctors efficiently process and understand medical papers is needed.
Objective: This study aims to critically assess and compare the comprehension capabilities of large language models (LLMs) in accurately and efficiently understanding medical research papers, using the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) checklist, which provides a standardized framework for evaluating the key elements of observational studies.
Methods: In this methodological study, a novel benchmark pipeline processed 50 medical research papers from PubMed, comparing the answers of 6 LLMs (GPT-3.5-Turbo, GPT-4-0613, GPT-4-1106, PaLM 2, Claude v1, and Gemini Pro) to the benchmark established by expert medical professors. Fifteen questions, derived from the STROBE checklist, assessed the LLMs' understanding of different sections of a research paper.
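A minimal sketch of the scoring step in such a benchmark pipeline is shown below: each model's answers are compared with the expert benchmark per STROBE-derived question. The papers, questions, and answers are made-up placeholders, not the study's data.

```python
# Compare each model's answers against the expert benchmark, keyed by
# (paper_id, question_id). All values here are fabricated for illustration.
expert = {("paper_01", "Q1"): "yes", ("paper_01", "Q2"): "cohort"}
model_answers = {
    "GPT-3.5-Turbo": {("paper_01", "Q1"): "yes", ("paper_01", "Q2"): "case-control"},
    "PaLM 2":        {("paper_01", "Q1"): "yes", ("paper_01", "Q2"): "cohort"},
}

for model, answers in model_answers.items():
    correct = sum(answers.get(key) == truth for key, truth in expert.items())
    print(f"{model}: {correct}/{len(expert)} correct "
          f"({100 * correct / len(expert):.1f}%)")
```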
Results: LLMs exhibited varying performance, with GPT-3.5-Turbo achieving the highest percentage of correct answers (n=3916, 66.9%), followed by GPT-4-1106 (n=3837, 65.6%), PaLM 2 (n=3632, 62.1%), Claude v1 (n=2887, 58.3%), Gemini Pro (n=2878, 49.2%), and GPT-4-0613 (n=2580, 44.1%). Statistical analysis revealed statistically significant differences between LLMs (P<.001), with older models showing inconsistent performance compared to newer versions. LLMs showcased distinct performances for each question across different parts of a scholarly paper, with certain models like PaLM 2 and GPT-3.5 showing remarkable versatility and depth in understanding.
Conclusions: This study is the first to evaluate the performance of different LLMs in understanding medical papers using the retrieval augmented generation method. The findings highlight the potential of LLMs to enhance medical research by improving efficiency and facilitating evidence-based decision-making. Further research is needed to address limitations such as the influence of question formats, potential biases, and the rapid evolution of LLM models.
{"title":"Evaluating the Capabilities of Generative AI Tools in Understanding Medical Papers: Qualitative Study.","authors":"Seyma Handan Akyon, Fatih Cagatay Akyon, Ahmet Sefa Camyar, Fatih Hızlı, Talha Sari, Şamil Hızlı","doi":"10.2196/59258","DOIUrl":"10.2196/59258","url":null,"abstract":"<p><strong>Background: </strong>Reading medical papers is a challenging and time-consuming task for doctors, especially when the papers are long and complex. A tool that can help doctors efficiently process and understand medical papers is needed.</p><p><strong>Objective: </strong>This study aims to critically assess and compare the comprehension capabilities of large language models (LLMs) in accurately and efficiently understanding medical research papers using the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) checklist, which provides a standardized framework for evaluating key elements of observational study.</p><p><strong>Methods: </strong>The study is a methodological type of research. The study aims to evaluate the understanding capabilities of new generative artificial intelligence tools in medical papers. A novel benchmark pipeline processed 50 medical research papers from PubMed, comparing the answers of 6 LLMs (GPT-3.5-Turbo, GPT-4-0613, GPT-4-1106, PaLM 2, Claude v1, and Gemini Pro) to the benchmark established by expert medical professors. Fifteen questions, derived from the STROBE checklist, assessed LLMs' understanding of different sections of a research paper.</p><p><strong>Results: </strong>LLMs exhibited varying performance, with GPT-3.5-Turbo achieving the highest percentage of correct answers (n=3916, 66.9%), followed by GPT-4-1106 (n=3837, 65.6%), PaLM 2 (n=3632, 62.1%), Claude v1 (n=2887, 58.3%), Gemini Pro (n=2878, 49.2%), and GPT-4-0613 (n=2580, 44.1%). Statistical analysis revealed statistically significant differences between LLMs (P<.001), with older models showing inconsistent performance compared to newer versions. LLMs showcased distinct performances for each question across different parts of a scholarly paper-with certain models like PaLM 2 and GPT-3.5 showing remarkable versatility and depth in understanding.</p><p><strong>Conclusions: </strong>This study is the first to evaluate the performance of different LLMs in understanding medical papers using the retrieval augmented generation method. The findings highlight the potential of LLMs to enhance medical research by improving efficiency and facilitating evidence-based decision-making. Further research is needed to address limitations such as the influence of question formats, potential biases, and the rapid evolution of LLM models.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"12 ","pages":"e59258"},"PeriodicalIF":3.1,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11411230/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142127503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Michael Hindelang, Sebastian Sitaru, Alexander Zink
<p><strong>Background: </strong>The integration of artificial intelligence and chatbot technology in health care has attracted significant attention due to its potential to improve patient care and streamline history-taking. As artificial intelligence-driven conversational agents, chatbots offer the opportunity to revolutionize history-taking, necessitating a comprehensive examination of their impact on medical practice.</p><p><strong>Objective: </strong>This systematic review aims to assess the role, effectiveness, usability, and patient acceptance of chatbots in medical history-taking. It also examines potential challenges and future opportunities for integration into clinical practice.</p><p><strong>Methods: </strong>A systematic search included PubMed, Embase, MEDLINE (via Ovid), CENTRAL, Scopus, and Open Science and covered studies through July 2024. The inclusion and exclusion criteria for the studies reviewed were based on the PICOS (participants, interventions, comparators, outcomes, and study design) framework. The population included individuals using health care chatbots for medical history-taking. Interventions focused on chatbots designed to facilitate medical history-taking. The outcomes of interest were the feasibility, acceptance, and usability of chatbot-based medical history-taking. Studies not reporting on these outcomes were excluded. All study designs except conference papers were eligible for inclusion. Only English-language studies were considered. There were no specific restrictions on study duration. Key search terms included "chatbot*," "conversational agent*," "virtual assistant," "artificial intelligence chatbot," "medical history," and "history-taking." The quality of observational studies was classified using the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) criteria (eg, sample size, design, data collection, and follow-up). The RoB 2 (Risk of Bias) tool assessed areas and the levels of bias in randomized controlled trials (RCTs).</p><p><strong>Results: </strong>The review included 15 observational studies and 3 RCTs and synthesized evidence from different medical fields and populations. Chatbots systematically collect information through targeted queries and data retrieval, improving patient engagement and satisfaction. The results show that chatbots have great potential for history-taking and that the efficiency and accessibility of the health care system can be improved by 24/7 automated data collection. Bias assessments revealed that of the 15 observational studies, 5 (33%) studies were of high quality, 5 (33%) studies were of moderate quality, and 5 (33%) studies were of low quality. Of the RCTs, 2 had a low risk of bias, while 1 had a high risk.</p><p><strong>Conclusions: </strong>This systematic review provides critical insights into the potential benefits and challenges of using chatbots for medical history-taking. The included studies showed that chatbots can increase patient
{"title":"Transforming Health Care Through Chatbots for Medical History-Taking and Future Directions: Comprehensive Systematic Review.","authors":"Michael Hindelang, Sebastian Sitaru, Alexander Zink","doi":"10.2196/56628","DOIUrl":"10.2196/56628","url":null,"abstract":"<p><strong>Background: </strong>The integration of artificial intelligence and chatbot technology in health care has attracted significant attention due to its potential to improve patient care and streamline history-taking. As artificial intelligence-driven conversational agents, chatbots offer the opportunity to revolutionize history-taking, necessitating a comprehensive examination of their impact on medical practice.</p><p><strong>Objective: </strong>This systematic review aims to assess the role, effectiveness, usability, and patient acceptance of chatbots in medical history-taking. It also examines potential challenges and future opportunities for integration into clinical practice.</p><p><strong>Methods: </strong>A systematic search included PubMed, Embase, MEDLINE (via Ovid), CENTRAL, Scopus, and Open Science and covered studies through July 2024. The inclusion and exclusion criteria for the studies reviewed were based on the PICOS (participants, interventions, comparators, outcomes, and study design) framework. The population included individuals using health care chatbots for medical history-taking. Interventions focused on chatbots designed to facilitate medical history-taking. The outcomes of interest were the feasibility, acceptance, and usability of chatbot-based medical history-taking. Studies not reporting on these outcomes were excluded. All study designs except conference papers were eligible for inclusion. Only English-language studies were considered. There were no specific restrictions on study duration. Key search terms included \"chatbot*,\" \"conversational agent*,\" \"virtual assistant,\" \"artificial intelligence chatbot,\" \"medical history,\" and \"history-taking.\" The quality of observational studies was classified using the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) criteria (eg, sample size, design, data collection, and follow-up). The RoB 2 (Risk of Bias) tool assessed areas and the levels of bias in randomized controlled trials (RCTs).</p><p><strong>Results: </strong>The review included 15 observational studies and 3 RCTs and synthesized evidence from different medical fields and populations. Chatbots systematically collect information through targeted queries and data retrieval, improving patient engagement and satisfaction. The results show that chatbots have great potential for history-taking and that the efficiency and accessibility of the health care system can be improved by 24/7 automated data collection. Bias assessments revealed that of the 15 observational studies, 5 (33%) studies were of high quality, 5 (33%) studies were of moderate quality, and 5 (33%) studies were of low quality. Of the RCTs, 2 had a low risk of bias, while 1 had a high risk.</p><p><strong>Conclusions: </strong>This systematic review provides critical insights into the potential benefits and challenges of using chatbots for medical history-taking. 
The included studies showed that chatbots can increase patient ","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"12 ","pages":"e56628"},"PeriodicalIF":3.1,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11393511/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142115512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Apoorva Pradhan, Eric A Wright, Vanessa A Hayduk, Juliana Berhane, Mallory Sponenberg, Leeann Webster, Hannah Anderson, Siyeon Park, Jove Graham, Scott Friedenberg
Background: Headaches, including migraines, are among the most common causes of disability and account for 20%-30% of referrals from primary care to neurology. In primary care, electronic health record-based alerts offer a mechanism to influence health care provider behaviors, manage neurology referrals, and optimize headache care.
Objective: This project aimed to evaluate the impact of an electronic alert implemented in primary care on patients' overall headache management.
Methods: We conducted a stratified cluster-randomized study across 38 primary care clinic sites between December 2021 and December 2022 at a large integrated health care delivery system in the United States. Clinics were stratified into 6 blocks based on region and patient-to-health care provider ratios and then randomized 1:1 within each block to either the control or intervention arm. Health care providers practicing at intervention clinics received an interruptive alert in the electronic health record. The primary end point was the change in headache burden, measured using the Headache Impact Test 6 scale, from baseline to 6 months. Secondary outcomes included changes in headache frequency and intensity, access to care, and resource use. We analyzed the difference-in-differences between the arms at follow-up at the individual patient level.
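For readers unfamiliar with the analysis, a difference-in-differences estimate can be read off the arm-by-time interaction in an ordinary least squares model, as sketched below with fabricated HIT-6 values. The published analysis would also need to respect the cluster randomization (eg, cluster-robust standard errors), which this sketch omits.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical patient-level data: HIT-6 score, arm, and time point.
df = pd.DataFrame({
    "hit6": [63, 61, 60, 58, 62, 61, 61, 60],
    "arm":  ["intervention"] * 4 + ["control"] * 4,
    "post": [0, 0, 1, 1, 0, 0, 1, 1],
})

# The arm x time interaction coefficient is the difference-in-differences
# estimate of the intervention effect on headache burden.
model = smf.ols("hit6 ~ C(arm, Treatment('control')) * post", data=df).fit()
print(model.params)
```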
Results: We enrolled 203 adult patients with a confirmed headache diagnosis. At baseline, the average Headache Impact Test 6 scores in each arm were not significantly different (intervention: mean 63, SD 6.9; control: mean 61.8, SD 6.6; P=.21). We observed a significant reduction in the headache burden only in the intervention arm at follow-up (3.5 points; P=.009). The reduction in the headache burden was not statistically different between groups (difference-in-differences estimate -1.89, 95% CI -5 to 1.31; P=.25). Similarly, secondary outcomes were not significantly different between groups. Only 11.32% (303/2677) of alerts were acted upon.
Conclusions: The use of an interruptive electronic alert did not significantly improve headache outcomes. Low use of alerts by health care providers prompts future alterations of the alert and exploration of alternative approaches.
{"title":"Impact of an Electronic Health Record-Based Interruptive Alert Among Patients With Headaches Seen in Primary Care: Cluster Randomized Controlled Trial.","authors":"Apoorva Pradhan, Eric A Wright, Vanessa A Hayduk, Juliana Berhane, Mallory Sponenberg, Leeann Webster, Hannah Anderson, Siyeon Park, Jove Graham, Scott Friedenberg","doi":"10.2196/58456","DOIUrl":"10.2196/58456","url":null,"abstract":"<p><strong>Background: </strong>Headaches, including migraines, are one of the most common causes of disability and account for nearly 20%-30% of referrals from primary care to neurology. In primary care, electronic health record-based alerts offer a mechanism to influence health care provider behaviors, manage neurology referrals, and optimize headache care.</p><p><strong>Objective: </strong>This project aimed to evaluate the impact of an electronic alert implemented in primary care on patients' overall headache management.</p><p><strong>Methods: </strong>We conducted a stratified cluster-randomized study across 38 primary care clinic sites between December 2021 to December 2022 at a large integrated health care delivery system in the United States. Clinics were stratified into 6 blocks based on region and patient-to-health care provider ratios and then 1:1 randomized within each block into either the control or intervention. Health care providers practicing at intervention clinics received an interruptive alert in the electronic health record. The primary end point was a change in headache burden, measured using the Headache Impact Test 6 scale, from baseline to 6 months. Secondary outcomes included changes in headache frequency and intensity, access to care, and resource use. We analyzed the difference-in-differences between the arms at follow-up at the individual patient level.</p><p><strong>Results: </strong>We enrolled 203 adult patients with a confirmed headache diagnosis. At baseline, the average Headache Impact Test 6 scores in each arm were not significantly different (intervention: mean 63, SD 6.9; control: mean 61.8, SD 6.6; P=.21). We observed a significant reduction in the headache burden only in the intervention arm at follow-up (3.5 points; P=.009). The reduction in the headache burden was not statistically different between groups (difference-in-differences estimate -1.89, 95% CI -5 to 1.31; P=.25). Similarly, secondary outcomes were not significantly different between groups. Only 11.32% (303/2677) of alerts were acted upon.</p><p><strong>Conclusions: </strong>The use of an interruptive electronic alert did not significantly improve headache outcomes. Low use of alerts by health care providers prompts future alterations of the alert and exploration of alternative approaches.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"12 ","pages":"e58456"},"PeriodicalIF":3.1,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11376138/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142115511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Felix Heilmeyer, Daniel Böhringer, Thomas Reinhard, Sebastian Arens, Lisa Lyssenko, Christian Haverkamp
Background: The use of large language models (LLMs) as writing assistance for medical professionals is a promising approach to reduce the time required for documentation, but there may be practical, ethical, and legal challenges in many jurisdictions complicating the use of the most powerful commercial LLM solutions.
Objective: In this study, we assessed the feasibility of using nonproprietary LLMs of the GPT variety as writing assistance for medical professionals in an on-premise setting with restricted compute resources, generating German medical text.
Methods: We trained four 7-billion-parameter models with 3 different architectures for our task and evaluated their performance using a powerful commercial LLM, namely Anthropic's Claude-v2, as a rater. Based on this, we selected the best-performing model and evaluated its practical usability with 2 independent human raters on real-world data.
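The LLM-as-rater setup might look like the sketch below, which asks a Claude model to score a generated report against its source notes. The prompt, rating scale, and model identifier are illustrative assumptions; the study's actual rating protocol is not described in the abstract.

```python
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

def rate_report(source_notes: str, generated_report: str) -> str:
    """Ask a Claude model to rate a generated report on a 1-5 scale."""
    prompt = (
        "Rate the following generated German medical report against the "
        "source notes for accuracy and completeness on a 1-5 scale. "
        "Reply with the number only.\n\n"
        f"Source notes:\n{source_notes}\n\n"
        f"Generated report:\n{generated_report}"
    )
    response = client.messages.create(
        model="claude-2.1",  # illustrative; the paper used Claude-v2 as rater
        max_tokens=5,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

print(rate_report("Befund: ...", "Arztbrief: ..."))
```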
Results: In the automated evaluation with Claude-v2, BLOOM-CLP-German, a model trained from scratch on German text, achieved the best results. In the manual evaluation by human experts, 95 (93.1%) of the 102 reports generated by that model were rated as usable as is or with only minor changes by both human raters.
Conclusions: The results show that even with restricted compute resources, it is possible to generate medical texts that are suitable for documentation in routine clinical practice. However, the target language should be considered in the model selection when processing non-English text.
{"title":"Viability of Open Large Language Models for Clinical Documentation in German Health Care: Real-World Model Evaluation Study.","authors":"Felix Heilmeyer, Daniel Böhringer, Thomas Reinhard, Sebastian Arens, Lisa Lyssenko, Christian Haverkamp","doi":"10.2196/59617","DOIUrl":"10.2196/59617","url":null,"abstract":"<p><strong>Background: </strong>The use of large language models (LLMs) as writing assistance for medical professionals is a promising approach to reduce the time required for documentation, but there may be practical, ethical, and legal challenges in many jurisdictions complicating the use of the most powerful commercial LLM solutions.</p><p><strong>Objective: </strong>In this study, we assessed the feasibility of using nonproprietary LLMs of the GPT variety as writing assistance for medical professionals in an on-premise setting with restricted compute resources, generating German medical text.</p><p><strong>Methods: </strong>We trained four 7-billion-parameter models with 3 different architectures for our task and evaluated their performance using a powerful commercial LLM, namely Anthropic's Claude-v2, as a rater. Based on this, we selected the best-performing model and evaluated its practical usability with 2 independent human raters on real-world data.</p><p><strong>Results: </strong>In the automated evaluation with Claude-v2, BLOOM-CLP-German, a model trained from scratch on the German text, achieved the best results. In the manual evaluation by human experts, 95 (93.1%) of the 102 reports generated by that model were evaluated as usable as is or with only minor changes by both human raters.</p><p><strong>Conclusions: </strong>The results show that even with restricted compute resources, it is possible to generate medical texts that are suitable for documentation in routine clinical practice. However, the target language should be considered in the model selection when processing non-English text.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"12 ","pages":"e59617"},"PeriodicalIF":3.1,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11373371/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142082750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: The National Disaster Management Agency (Badan Nasional Penanggulangan Bencana) handles disaster management in Indonesia as a health cluster, collecting, storing, and reporting information on the state of survivors and their health from various sources during disasters. Data were collected on paper and transferred to Microsoft Excel spreadsheets. These activities are challenging because there are no standards for data collection. The World Health Organization (WHO) introduced a standard for health data collection during disasters for emergency medical teams (EMTs) in the form of a minimum dataset (MDS). Meanwhile, the Ministry of Health of Indonesia launched the SATUSEHAT platform to integrate all electronic medical records in Indonesia based on Fast Healthcare Interoperability Resources (FHIR).
Objective: This study aims to implement the WHO EMT MDS to create a disaster profile for the SATUSEHAT platform using FHIR.
Methods: We extracted variables from 2 EMT MDS medical records (the WHO and the Association of Southeast Asian Nations [ASEAN] versions) and the daily reporting form. We then performed a mapping process to match these variables with FHIR resources and analyzed the gaps between the variables and the base resources. Next, we profiled the selected resources to determine whether any changes were needed and created extensions to fill the gaps using the Forge application. Subsequently, the profile was implemented using an open-source FHIR server.
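An extension of the kind described can be expressed directly on a FHIR resource; the sketch below builds one as a plain Python dict. The extension URL and value are hypothetical placeholders, not the actual SATUSEHAT profile artifacts.

```python
import json

# A FHIR Observation carrying a hypothetical extension for a daily-report
# variable that has no slot in the base resource. URL and value are
# illustrative placeholders, not the published profile.
observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"text": "EMT daily report entry"},
    "extension": [
        {
            "url": "https://example.org/fhir/StructureDefinition/emt-team-type",
            "valueString": "Type 1 Fixed",
        }
    ],
}

print(json.dumps(observation, indent=2))
```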
Results: The total numbers of variables extracted from the WHO EMT MDS, ASEAN EMT MDS, and daily reporting forms were 30, 32, and 46, with the percentage of variables matching FHIR resources being 100% (30/30), 97% (31/32), and 85% (39/46), respectively. From the 40 resources available in the FHIR ID core, we used 10, 14, and 9 for the WHO EMT MDS, ASEAN EMT MDS, and daily reporting form, respectively. Based on the gap analysis, we found 4 variables in the daily reporting form that were not covered by the resources. Thus, we created extensions to address this gap.
Conclusions: We successfully created a disaster profile that can be used as a disaster case for the SATUSEHAT platform. This profile may standardize health data collection during disasters.
{"title":"Implementation of the World Health Organization Minimum Dataset for Emergency Medical Teams to Create Disaster Profiles for the Indonesian SATUSEHAT Platform Using Fast Healthcare Interoperability Resources: Development and Validation Study.","authors":"Hiro Putra Faisal, Masaharu Nakayama","doi":"10.2196/59651","DOIUrl":"10.2196/59651","url":null,"abstract":"<p><strong>Background: </strong>The National Disaster Management Agency (Badan Nasional Penanggulangan Bencana) handles disaster management in Indonesia as a health cluster by collecting, storing, and reporting information on the state of survivors and their health from various sources during disasters. Data were collected on paper and transferred to Microsoft Excel spreadsheets. These activities are challenging because there are no standards for data collection. The World Health Organization (WHO) introduced a standard for health data collection during disasters for emergency medical teams (EMTs) in the form of a minimum dataset (MDS). Meanwhile, the Ministry of Health of Indonesia launched the SATUSEHAT platform to integrate all electronic medical records in Indonesia based on Fast Healthcare Interoperability Resources (FHIR).</p><p><strong>Objective: </strong>This study aims to implement the WHO EMT MDS to create a disaster profile for the SATUSEHAT platform using FHIR.</p><p><strong>Methods: </strong>We extracted variables from 2 EMT MDS medical records-the WHO and Association of Southeast Asian Nations (ASEAN) versions-and the daily reporting form. We then performed a mapping process to match these variables with the FHIR resources and analyzed the gaps between the variables and base resources. Next, we conducted profiling to see if there were any changes in the selected resources and created extensions to fill the gap using the Forge application. Subsequently, the profile was implemented using an open-source FHIR server.</p><p><strong>Results: </strong>The total numbers of variables extracted from the WHO EMT MDS, ASEAN EMT MDS, and daily reporting forms were 30, 32, and 46, with the percentage of variables matching FHIR resources being 100% (30/30), 97% (31/32), and 85% (39/46), respectively. From the 40 resources available in the FHIR ID core, we used 10, 14, and 9 for the WHO EMT MDS, ASEAN EMT MDS, and daily reporting form, respectively. Based on the gap analysis, we found 4 variables in the daily reporting form that were not covered by the resources. Thus, we created extensions to address this gap.</p><p><strong>Conclusions: </strong>We successfully created a disaster profile that can be used as a disaster case for the SATUSEHAT platform. This profile may standardize health data collection during disasters.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"12 ","pages":"e59651"},"PeriodicalIF":3.1,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11373372/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142082647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Priyanka Dua Sood, Star Liu, Harold Lehmann, Hadi Kharrazi
<p><strong>Background: </strong>Increasing and substantial reliance on electronic health records (EHRs) and data types (ie, diagnosis, medication, and laboratory data) demands assessment of their data quality as a fundamental approach, especially since there is a need to identify appropriate denominator populations with chronic conditions, such as type 2 diabetes (T2D), using commonly available computable phenotype definitions (ie, phenotypes).</p><p><strong>Objective: </strong>To bridge this gap, our study aims to assess how issues of EHR data quality and variations and robustness (or lack thereof) in phenotypes may have potential impacts in identifying denominator populations.</p><p><strong>Methods: </strong>Approximately 208,000 patients with T2D were included in our study, which used retrospective EHR data from the Johns Hopkins Medical Institution (JHMI) during 2017-2019. Our assessment included 4 published phenotypes and 1 definition from a panel of experts at Hopkins. We conducted descriptive analyses of demographics (ie, age, sex, race, and ethnicity), use of health care (inpatient and emergency room visits), and the average Charlson Comorbidity Index score of each phenotype. We then used different methods to induce or simulate data quality issues of completeness, accuracy, and timeliness separately across each phenotype. For induced data incompleteness, our model randomly dropped diagnosis, medication, and laboratory codes independently at increments of 10%; for induced data inaccuracy, our model randomly replaced a diagnosis or medication code with another code of the same data type and induced 2% incremental change from -100% to +10% in laboratory result values; and lastly, for timeliness, data were modeled for induced incremental shift of date records by 30 days to 365 days.</p><p><strong>Results: </strong>Less than a quarter (n=47,326, 23%) of the population overlapped across all phenotypes using EHRs. The population identified by each phenotype varied across all combinations of data types. Induced incompleteness identified fewer patients with each increment; for example, at 100% diagnostic incompleteness, the Chronic Conditions Data Warehouse phenotype identified zero patients, as its phenotypic characteristics included only diagnosis codes. Induced inaccuracy and timeliness similarly demonstrated variations in performance of each phenotype, therefore resulting in fewer patients being identified with each incremental change.</p><p><strong>Conclusions: </strong>We used EHR data with diagnosis, medication, and laboratory data types from a large tertiary hospital system to understand T2D phenotypic differences and performance. We used induced data quality methods to learn how data quality issues may impact identification of the denominator populations upon which clinical (eg, clinical research and trials, population health evaluations) and financial or operational decisions are made. The novel results from our study may inform future a
{"title":"Assessing the Effect of Electronic Health Record Data Quality on Identifying Patients With Type 2 Diabetes: Cross-Sectional Study.","authors":"Priyanka Dua Sood, Star Liu, Harold Lehmann, Hadi Kharrazi","doi":"10.2196/56734","DOIUrl":"10.2196/56734","url":null,"abstract":"<p><strong>Background: </strong>Increasing and substantial reliance on electronic health records (EHRs) and data types (ie, diagnosis, medication, and laboratory data) demands assessment of their data quality as a fundamental approach, especially since there is a need to identify appropriate denominator populations with chronic conditions, such as type 2 diabetes (T2D), using commonly available computable phenotype definitions (ie, phenotypes).</p><p><strong>Objective: </strong>To bridge this gap, our study aims to assess how issues of EHR data quality and variations and robustness (or lack thereof) in phenotypes may have potential impacts in identifying denominator populations.</p><p><strong>Methods: </strong>Approximately 208,000 patients with T2D were included in our study, which used retrospective EHR data from the Johns Hopkins Medical Institution (JHMI) during 2017-2019. Our assessment included 4 published phenotypes and 1 definition from a panel of experts at Hopkins. We conducted descriptive analyses of demographics (ie, age, sex, race, and ethnicity), use of health care (inpatient and emergency room visits), and the average Charlson Comorbidity Index score of each phenotype. We then used different methods to induce or simulate data quality issues of completeness, accuracy, and timeliness separately across each phenotype. For induced data incompleteness, our model randomly dropped diagnosis, medication, and laboratory codes independently at increments of 10%; for induced data inaccuracy, our model randomly replaced a diagnosis or medication code with another code of the same data type and induced 2% incremental change from -100% to +10% in laboratory result values; and lastly, for timeliness, data were modeled for induced incremental shift of date records by 30 days to 365 days.</p><p><strong>Results: </strong>Less than a quarter (n=47,326, 23%) of the population overlapped across all phenotypes using EHRs. The population identified by each phenotype varied across all combinations of data types. Induced incompleteness identified fewer patients with each increment; for example, at 100% diagnostic incompleteness, the Chronic Conditions Data Warehouse phenotype identified zero patients, as its phenotypic characteristics included only diagnosis codes. Induced inaccuracy and timeliness similarly demonstrated variations in performance of each phenotype, therefore resulting in fewer patients being identified with each incremental change.</p><p><strong>Conclusions: </strong>We used EHR data with diagnosis, medication, and laboratory data types from a large tertiary hospital system to understand T2D phenotypic differences and performance. We used induced data quality methods to learn how data quality issues may impact identification of the denominator populations upon which clinical (eg, clinical research and trials, population health evaluations) and financial or operational decisions are made. 
The novel results from our study may inform future a","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"12 ","pages":"e56734"},"PeriodicalIF":3.1,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11370182/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142074615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}