Isabel Bilotta, Scott Tonidandel, Winston R Liaw, Eden King, Diana N Carvajal, Ayana Taylor, Julie Thamby, Yang Xiang, Cui Tao, Michael Hansen
Background: Individuals from minoritized racial and ethnic backgrounds experience pernicious and pervasive health disparities that have emerged, in part, from clinician bias.
Objective: We used a natural language processing approach to examine whether linguistic markers in electronic health record (EHR) notes differ based on the race and ethnicity of the patient. To validate this methodological approach, we also assessed the extent to which clinicians perceive linguistic markers to be indicative of bias.
Methods: In this cross-sectional study, we extracted EHR notes for patients who were aged 18 years or older; had more than 5 years of diabetes diagnosis codes; and received care between 2006 and 2014 from family physicians, general internists, or endocrinologists practicing in an urban, academic network of clinics. The race and ethnicity of patients were defined as White non-Hispanic, Black non-Hispanic, or Hispanic or Latino. We hypothesized that Sentiment Analysis and Social Cognition Engine (SEANCE) components (ie, negative adjectives, positive adjectives, joy words, fear and disgust words, politics words, respect words, trust verbs, and well-being words) and mean word count would be indicators of bias if racial differences emerged. We performed linear mixed effects analyses to examine the relationship between the outcomes of interest (the SEANCE components and word count) and patient race and ethnicity, controlling for patient age. To validate this approach, we asked clinicians to indicate the extent to which they thought variation in the use of SEANCE language domains for different racial and ethnic groups was reflective of bias in EHR notes.
Results: We examined EHR notes (n=12,905) of Black non-Hispanic, White non-Hispanic, and Hispanic or Latino patients (n=1562), who were seen by 281 physicians. A total of 27 clinicians participated in the validation study. In terms of bias, participants rated negative adjectives as 8.63 (SD 2.06), fear and disgust words as 8.11 (SD 2.15), and positive adjectives as 7.93 (SD 2.46) on a scale of 1 to 10, with 10 being extremely indicative of bias. Notes for Black non-Hispanic patients contained significantly more negative adjectives (coefficient 0.07, SE 0.02) and significantly more fear and disgust words (coefficient 0.007, SE 0.002) than those for White non-Hispanic patients. The notes for Hispanic or Latino patients included significantly fewer positive adjectives (coefficient -0.02, SE 0.007), trust verbs (coefficient -0.009, SE 0.004), and joy words (coefficient -0.03, SE 0.01) than those for White non-Hispanic patients.
Conclusions: This approach may enable physicians and researchers to identify and mitigate bias in medical interactions, with the goal of reducing health disparities stemming from bias.
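The SEANCE-based analysis described above reduces, at its core, to counting lexicon hits per note and comparing rates across patient groups. A minimal sketch of that idea, using an invented placeholder word list and toy notes (not the actual SEANCE lexicons or any study data):

```python
# Hypothetical sketch of note-level linguistic feature extraction: compute the
# rate of "negative adjective" tokens per note, then compare rates by group.
# The word list and notes below are invented placeholders.
NEGATIVE_ADJECTIVES = {"noncompliant", "difficult", "resistant", "agitated"}

def negative_adjective_rate(note: str) -> float:
    """Fraction of tokens in a note that appear in the negative-adjective list."""
    tokens = note.lower().split()
    if not tokens:
        return 0.0
    return sum(t in NEGATIVE_ADJECTIVES for t in tokens) / len(tokens)

notes = [
    ("patient remains noncompliant with insulin regimen", "group_a"),
    ("patient is doing well and follows the plan", "group_b"),
]

# One rate per group for this toy example; the study instead fit linear mixed
# effects models over many notes, controlling for patient age.
rates = {group: negative_adjective_rate(text) for text, group in notes}
```

In the study itself, these per-note rates were the outcomes in linear mixed effects models rather than raw group means; the sketch only shows where the numbers come from.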
Examining Linguistic Differences in Electronic Health Records for Diverse Patients With Diabetes: Natural Language Processing Analysis. JMIR Medical Informatics. Published May 23, 2024. doi:10.2196/50428. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11137426/pdf/
Thasina Tabashum, Robert Cooper Snyder, Megan K O'Brien, Mark V Albert
Background: With the increasing availability of data, computing resources, and easier-to-use software libraries, machine learning (ML) is increasingly used in disease detection and prediction, including for Parkinson's disease (PD). Despite the large number of studies published every year, very few ML systems have been adopted for real-world use. Objective: In particular, a lack of external validity may result in poor performance of these systems in clinical practice. Additional methodological issues in ML design and reporting can also hinder clinical adoption, even for applications that would benefit from such data-driven systems. To sample current ML practices in PD applications, we conducted a systematic review of studies from 2020 and 2021 that used ML models to diagnose PD or to track PD progression. Methods: We conducted a systematic literature review in accordance with PRISMA in PubMed between January 2020 and April 2021, using the exact string “Parkinson’s” AND (“ML” OR “prediction” OR “classification” OR “detection” OR “artificial intelligence” OR “AI”), which returned 1085 publications. After screening, we identified 113 publications that used ML for classification- or regression-based prediction of PD or PD-related symptoms. Results: Only 25.7% of studies used a hold-out test set to avoid potentially inflated accuracies, and approximately half of the studies without a hold-out test set did not acknowledge this as a potential concern. Surprisingly, 38.9% of studies did not report whether or how models were tuned, and an additional 27.4% used ad hoc model tuning, which is generally frowned upon in ML model optimization. Only 15% of studies performed direct comparisons of results with other models, severely limiting the interpretation of the results.
Conclusions: This review highlights the notable limitations of current ML systems and techniques that may contribute to a gap between reported performance in research and the real-life applicability of ML models aiming to detect and predict diseases such as PD.
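The hold-out concern the review raises is procedural: carve off a test split before any tuning and touch it only once. A minimal sketch of that discipline, with invented function and variable names (not code from any reviewed study):

```python
import random

def train_validation_test_split(samples, test_frac=0.2, val_frac=0.2, seed=0):
    """Shuffle and carve off a hold-out test set *before* any tuning,
    so reported accuracy is not inflated by model selection."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = shuffled[:n_test]               # touched once, after all tuning
    val = shuffled[n_test:n_test + n_val]  # used for hyperparameter tuning
    train = shuffled[n_test + n_val:]      # used for model fitting
    return train, val, test

train, val, test = train_validation_test_split(list(range(100)))
```

Tuning against `val` and reporting on the untouched `test` is the practice only 25.7% of the reviewed studies followed.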
Machine Learning Models for Parkinson Disease: Systematic Review. JMIR Medical Informatics. Published May 17, 2024. doi:10.2196/50117.
Yong Nam Gwon, Jae Heon Kim, Hyun Soo Chung, Eun Jee Jung, Joey Chun, Serin Lee, Sung Ryul Shim
Background: A large language model (LLM) is a type of artificial intelligence (AI) model that opens up great possibilities for healthcare practice, research, and education, although scholars have highlighted the need to proactively address current issues regarding its use. One of the best-known LLMs is ChatGPT. Objective: This study aims to explore the potential of ChatGPT as a real-time literature search tool for systematic reviews and clinical decision support systems (CDSSs). Methods: The search results of a systematic review on the treatment of Peyronie's disease, published by human experts, were selected as a benchmark, and that review's literature search strategy was applied to ChatGPT and Microsoft Bing for comparison with the human researchers. To determine the accuracy of the retrieved literature, we graded results as A, B, C, or F, considering only cases where the cited literature actually exists. Results: The benchmark human search identified 24 randomized controlled trials. ChatGPT returned 1287 search results across 639 queries, 7 of which exactly matched the human results; Microsoft Bing returned 48 results across 223 queries, 19 of which exactly matched. Conclusions: This is the first study to compare AI with conventional human systematic review methods as a real-time literature collection tool for evidence-based medicine. The results suggest that the use of ChatGPT for real-time evidence generation is not yet accurate or feasible. Therefore, researchers should be cautious about using such AI.
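From the counts the abstract reports, exact-match precision and recall against the 24-trial benchmark can be back-calculated. A small sketch (the figures come from the abstract; the function name is mine):

```python
def exact_match_stats(n_matched, n_retrieved, n_benchmark):
    """Precision: fraction of retrieved results matching the benchmark.
    Recall: fraction of the benchmark trials that were recovered."""
    return n_matched / n_retrieved, n_matched / n_benchmark

# ChatGPT: 7 of 1287 results matched; Bing: 19 of 48 matched; benchmark: 24 RCTs.
chatgpt_precision, chatgpt_recall = exact_match_stats(7, 1287, 24)
bing_precision, bing_recall = exact_match_stats(19, 48, 24)
```

On these numbers Bing dominates ChatGPT on both axes, yet neither tool approaches the human benchmark, which is the abstract's point about feasibility.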
The Use of Generative AI for Scientific Literature Searches for Systematic Reviews: ChatGPT and Microsoft Bing AI Performance Evaluation. JMIR Medical Informatics. Published May 14, 2024. doi:10.2196/51187.
Jinbo Zhang, Pingping Yang, Lu Zeng, Shan Li, Jiamei Zhou
Background: Ventilator-associated pneumonia (VAP) is a serious complication of mechanical ventilation that affects patient treatment and prognosis. Owing to its excellent data mining capabilities, artificial intelligence (AI) has been increasingly used to predict VAP. Objective: This article reviews AI-based prediction models for VAP, providing a reference for the early identification of high-risk groups in future clinical practice. Methods: A scoping review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) extension guidelines. The Wanfang, Chinese BioMedical Literature Database, Cochrane Library, Web of Science, PubMed, MEDLINE, and Embase databases were searched to identify relevant articles. Study selection and data extraction were conducted independently by two reviewers, and the extracted data were synthesized narratively. Results: Of 137 publications, 11 were included in the scoping review. All 11 studies predicted VAP occurrence; studies on VAP prognosis were excluded. The included studies used text data, and none involved imaging data. Public databases were the primary data source for model building (6/11, 55%), and the remaining studies had sample sizes smaller than 1000. Machine learning was the primary approach for building VAP prediction models; deep learning and large language models were not used. Random forest was the most commonly used algorithm (5/11, 45%). All studies performed internal validation only, and none addressed how the models would be used in practice. Conclusions: This review presents an overview of AI-based studies for predicting and diagnosing VAP. AI models show better predictive performance than traditional methods and are expected to become an indispensable tool for VAP risk prediction. However, current research remains at the stage of model construction and validation; implementation and clinical guidance for VAP prediction require further research.
Ventilator-Associated Pneumonia Prediction Models Based on AI: Scoping Review. JMIR Medical Informatics. Published May 14, 2024. doi:10.2196/57026.
Carl Preiksaitis, Nicholas Ashenburg, Gabrielle Bunney, Andrew Chu, Rana Kabeer, Fran Riley, Ryan Ribeira, Christian Rose
Background: Artificial intelligence (AI), more specifically large language models (LLMs), holds significant potential in revolutionizing emergency care delivery by optimizing clinical workflows and enhancing the quality of decision-making. Although enthusiasm for integrating LLMs into emergency medicine (EM) is growing, the existing literature is characterized by a disparate collection of individual studies, conceptual analyses, and preliminary implementations. Given these complexities and gaps in understanding, a cohesive framework is needed to comprehend the existing body of knowledge on the application of LLMs in EM.
Objective: Given the absence of a comprehensive framework for exploring the roles of LLMs in EM, this scoping review aims to systematically map the existing literature on LLMs' potential applications within EM and identify directions for future research. Addressing this gap will allow for informed advancements in the field.
Methods: Using PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) criteria, we searched Ovid MEDLINE, Embase, Web of Science, and Google Scholar for papers published between January 2018 and August 2023 that discussed LLMs' use in EM. We excluded other forms of AI. A total of 1994 unique titles and abstracts were screened, and each full-text paper was independently reviewed by 2 authors. Data were abstracted independently, and 5 authors performed a collaborative quantitative and qualitative synthesis of the data.
Results: A total of 43 papers were included. Studies were predominantly from 2022 to 2023 and conducted in the United States and China. We uncovered four major themes: (1) clinical decision-making and support was highlighted as a pivotal area, with LLMs playing a substantial role in enhancing patient care, notably through their application in real-time triage, allowing early recognition of patient urgency; (2) efficiency, workflow, and information management demonstrated the capacity of LLMs to significantly boost operational efficiency, particularly through the automation of patient record synthesis, which could reduce administrative burden and enhance patient-centric care; (3) risks, ethics, and transparency were identified as areas of concern, especially regarding the reliability of LLMs' outputs, and specific studies highlighted the challenges of ensuring unbiased decision-making amidst potentially flawed training data sets, stressing the importance of thorough validation and ethical oversight; and (4) education and communication possibilities included LLMs' capacity to enrich medical training, such as through using simulated patient interactions that enhance communication skills.
Conclusions: LLMs have the potential to fundamentally transform EM, enhancing clinical decision-making, optimizing workflows, and improving patient outcomes.
The Role of Large Language Models in Transforming Emergency Medicine: Scoping Review. JMIR Medical Informatics. Published May 10, 2024. doi:10.2196/53787. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11127144/pdf/
Rashaud Senior, Timothy Tsai, William Ratliff, Lisa Nadler, Suresh Balu, Elizabeth Malcolm, Eugenia McPeek Hinz
Background: The problem list (PL) is often poorly organized, which makes its use for clinical care more challenging over time. Objective: To measure the accuracy of diagnosis sorting for PL system/condition groupers based on SNOMED CT concepts mapped to ICD-10 codes. Methods: We developed 21 system/condition-based groupers using SNOMED CT hierarchical concepts refined with Boolean logic to reorganize the ICD-10-based PL in our electronic health record (EHR). We extracted the PL from a convenience sample of 50 patients, divided across age and sex, in a deidentified format for evaluation. Two clinicians independently determined whether each PL diagnosis was correctly attributed to a system/condition grouper. Discrepancies were discussed and, if no consensus was reached, adjudicated by a third clinician. Descriptive statistics and Cohen's kappa statistic for interrater reliability were calculated. Results: Our 50-patient sample had a total of 869 diagnoses (range 4-59; median 12, IQR 9-23.75). The reviewers initially agreed on 821 placements. Of the remaining 48 items, 16 required adjudication, leading to a final count of 787 true positives and 37 true negatives. PL diagnoses were grouped with a sensitivity of 97.6%, specificity of 58.7%, positive predictive value of 96.8%, and F1-score of 0.972. After discussion, the calculated kappa statistic was 0.9, indicating near-perfect agreement. Conclusions: We successfully developed a structured methodology to organize problem list diagnoses in a way that supports clinical review.
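The abstract reports true positives (787) and true negatives (37) directly; the false positive (26) and false negative (19) counts below are back-calculated so that the four cells sum to the 869 diagnoses and reproduce the reported metrics. A verification sketch, not the authors' code:

```python
def grouper_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics as used in the abstract."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    ppv = tp / (tp + fp)                  # positive predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return sensitivity, specificity, ppv, f1

# TP and TN are from the abstract; FP=26 and FN=19 are inferred (assumption)
# from the 869-diagnosis total and the reported percentages.
sens, spec, ppv, f1 = grouper_metrics(tp=787, tn=37, fp=26, fn=19)
```

Running the sketch reproduces the abstract's 97.6% sensitivity, 58.7% specificity, 96.8% PPV, and 0.972 F1 to three decimal places, which is consistent with the inferred FP/FN counts.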
Evaluation of SNOMED CT Grouper Accuracy and Coverage in Organizing the Electronic Health Record Problem List by Clinical System: Observational Study. JMIR Medical Informatics. Published May 9, 2024. doi:10.2196/51274.
Thomas J Reese, Henry J Domenico, Antonio Hernandez, Daniel W Byrne, Ryan P Moore, Jessica B Williams, Brian J Douthit, Elise Russo, Allison B McCoy, Catherine H Ivory, Bryan D Steitz, Adam Wright
Background: Numerous pressure injury prediction models have been developed using electronic health record data, yet hospital-acquired pressure injuries (HAPIs) are increasing, which demonstrates the critical challenge of implementing these models in routine care.
Objective: To help bridge the gap between development and implementation, we sought to create a model that was feasible, broadly applicable, dynamic, actionable, and rigorously validated and then compare its performance to usual care (ie, the Braden scale).
Methods: We extracted electronic health record data from 197,991 adult hospital admissions with 51 candidate features. For risk prediction and feature selection, we used logistic regression with a least absolute shrinkage and selection operator (LASSO) approach. To compare the model with usual care, we used the area under the receiver operating characteristic curve (AUC), Brier score, slope, intercept, and integrated calibration index. The model was validated using a temporally staggered cohort.
Results: A total of 5458 HAPIs were identified between January 2018 and July 2022. We determined 22 features were necessary to achieve a parsimonious and highly accurate model. The top 5 features included tracheostomy, edema, central line, first albumin measure, and age. Our model achieved higher discrimination than the Braden scale (AUC 0.897, 95% CI 0.893-0.901 vs AUC 0.798, 95% CI 0.791-0.803).
Conclusions: We developed and validated an accurate prediction model for HAPIs that surpassed the standard-of-care risk assessment and fulfilled necessary elements for implementation. Future work includes a pragmatic randomized trial to assess whether our model improves patient outcomes.
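The AUC used above to compare the model against the Braden scale can be computed without any modeling library as the Mann-Whitney probability that a randomly chosen positive case (a HAPI) receives a higher risk score than a randomly chosen negative case. A minimal sketch on invented toy scores, not study data:

```python
# AUC as the Mann-Whitney probability that a positive case outranks a
# negative one; ties count as half a win. Scores are toy values.
def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

risk = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]   # hypothetical model outputs
y    = [1,   1,   0,   1,   0,   0]     # 1 = HAPI occurred
print(round(auc(risk, y), 3))  # 0.889
```

An AUC of 0.897 versus 0.798, as reported, means the model orders a random HAPI case above a random non-HAPI case about 90% of the time versus 80% for the Braden scale.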
{"title":"Implementable Prediction of Pressure Injuries in Hospitalized Adults: Model Development and Validation.","authors":"Thomas J Reese, Henry J Domenico, Antonio Hernandez, Daniel W Byrne, Ryan P Moore, Jessica B Williams, Brian J Douthit, Elise Russo, Allison B McCoy, Catherine H Ivory, Bryan D Steitz, Adam Wright","doi":"10.2196/51842","DOIUrl":"10.2196/51842","url":null,"abstract":"<p><strong>Background: </strong>Numerous pressure injury prediction models have been developed using electronic health record data, yet hospital-acquired pressure injuries (HAPIs) are increasing, which demonstrates the critical challenge of implementing these models in routine care.</p><p><strong>Objective: </strong>To help bridge the gap between development and implementation, we sought to create a model that was feasible, broadly applicable, dynamic, actionable, and rigorously validated and then compare its performance to usual care (ie, the Braden scale).</p><p><strong>Methods: </strong>We extracted electronic health record data from 197,991 adult hospital admissions with 51 candidate features. For risk prediction and feature selection, we used logistic regression with a least absolute shrinkage and selection operator (LASSO) approach. To compare the model with usual care, we used the area under the receiver operating curve (AUC), Brier score, slope, intercept, and integrated calibration index. The model was validated using a temporally staggered cohort.</p><p><strong>Results: </strong>A total of 5458 HAPIs were identified between January 2018 and July 2022. We determined 22 features were necessary to achieve a parsimonious and highly accurate model. The top 5 features included tracheostomy, edema, central line, first albumin measure, and age. 
Our model achieved higher discrimination than the Braden scale (AUC 0.897, 95% CI 0.893-0.901 vs AUC 0.798, 95% CI 0.791-0.803).</p><p><strong>Conclusions: </strong>We developed and validated an accurate prediction model for HAPIs that surpassed the standard-of-care risk assessment and fulfilled necessary elements for implementation. Future work includes a pragmatic randomized trial to assess whether our model improves patient outcomes.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11094428/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140900549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: Synthetic patient data (SPD) generation for survival analysis in oncology trials holds significant potential for accelerating clinical development. Various machine learning methods, including classification and regression trees (CART), random forest (RF), Bayesian networks (BN), and the conditional tabular generative adversarial network (CTGAN), have been employed for this purpose, but their performance in reflecting actual patient survival data remains under investigation.
Objective: The aim of this study was to determine the most suitable SPD generation method for oncology trials, focusing on both progression-free survival (PFS) and overall survival (OS), the primary evaluation endpoints in oncology trials. To achieve this goal, we conducted a comparative simulation of 4 generation methods, CART, RF, BN, and CTGAN, and evaluated the performance of each.
Methods: Using multiple clinical trial datasets, 1000 datasets were generated with each method for each clinical trial dataset and evaluated as follows: 1) the median survival time (MST) of PFS and OS; 2) the hazard ratio distance (HRD), which indicates the similarity between the actual survival function and a synthetic survival function; and 3) visual analysis of Kaplan-Meier (KM) plots. Each method's ability to mimic the statistical properties of real patient data was evaluated from these multiple angles.
Results: In most simulation cases, CART yielded high percentages of synthetic-data MSTs falling within the 95% confidence interval (CI) of the actual data's MST. These percentages ranged from 88.8% to 98.0% for PFS and from 60.8% to 96.1% for OS. In the evaluation of HRD, CART's HRD values were concentrated at approximately 0.9, whereas no consistent trend was observed for the other methods for either PFS or OS. CART showed better similarity than RF because CART overfits the data, whereas RF, an ensemble learning method, prevents overfitting; in SPD generation, the focus should be on statistical properties close to the actual data, not on a well-generalized prediction model. Neither BN nor CTGAN could accurately reflect the statistical properties of the actual data because these methods are not suited to small datasets.
Conclusions: As a method for generating SPD for survival data from small datasets, such as clinical trial data, CART proved to be the most effective compared with RF, BN, and CTGAN. CART-based generation could be further improved by incorporating feature engineering and other methods in future work.
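The first evaluation criterion above, median survival time from a Kaplan-Meier estimator, can be sketched in a few lines of pure Python. The follow-up times and event flags below are invented for illustration, not trial data:

```python
# Median survival time (MST) from a Kaplan-Meier estimator: the first
# time at which estimated survival drops to <= 0.5. Illustrative sketch;
# the times/events are invented, not clinical trial data.
def km_median(times, events):
    """times: follow-up durations; events: 1 = event observed, 0 = censored.
    Returns the first time with KM survival <= 0.5, or None if never reached."""
    at_risk = len(times)
    surv = 1.0
    for t, e in sorted(zip(times, events)):
        if e:
            surv *= (at_risk - 1) / at_risk  # KM product-limit step
            if surv <= 0.5:
                return t
        at_risk -= 1  # censored subjects leave the risk set without an event
    return None

print(km_median([2, 3, 4, 6, 8, 9], [1, 1, 0, 1, 1, 0]))  # 6
```

In the study's setup, an MST computed this way for each of the 1000 synthetic datasets would be checked against the 95% CI of the actual data's MST.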
{"title":"A Comparison of Synthetic Data Generation Techniques for Control Group Survival Data in Oncology Clinical Trials: Simulation Study.","authors":"Ippei Akiya, Takuma Ishihara, Keiichi Yamamoto","doi":"10.2196/55118","DOIUrl":"10.2196/55118","url":null,"abstract":"<p><strong>Background: </strong>Synthetic patient data (SPD) generation for survival analysis in oncology trials holds significant potential for accelerating clinical development. Various machine learning methods, including classification and regression trees (CART), random forest (RF), Bayesian network (BN), and CTGAN, have been employed for this purpose, but their performance in reflecting actual patient survival data remains under investigation.</p><p><strong>Objective: </strong>The aim of this study was to determine the most suitable SPD generation method for oncology trials, specifically focusing on both progression free survival (PFS) and overall survival (OS), which are the primary evaluation endpoints in oncology trials. To achieve this goal, we conducted a comparative simulation of 4 generation methods: CART, RF, BN, and the CTGAN, and the performance of each method was evaluated.</p><p><strong>Methods: </strong>Using multiple clinical trial datasets, 1000 datasets were generated by using each method for each clinical trial dataset and evaluated as follows: 1) median survival time (MST) of PFS and OS, 2) hazard ratio distance (HRD), which indicates the similarity between the actual survival function and a synthetic survival function, and 3) visual analysis of Kaplan‒Meier (KM) plots. Each method's ability to mimic the statistical properties of real patient data was evaluated from these multiple angles.</p><p><strong>Results: </strong>In most simulation cases, CART demonstrated the high percentages of MSTs of synthetic data falling within the range of 95% confidence interval (CI) of the MST of actual data. These percentages ranged from 88.8% to 98.0% for PFS and from 60.8% to 96.1% for OS. 
In the evaluation of HRD, CART demonstrated that HRD values were concentrated at approximately 0.9. Conversely, for the other methods, no consistent trend was observed for either PFS or OS. The reason why CART demonstrated better similarity than RF was that CART caused overfitting and RF, which is a kind of ensemble learning, prevented it. In SPD generation, the statistical properties close to the actual data should be the focus, not a well-generalized prediction model. Both the BN and CTGAN methods cannot accurately reflect the statistical properties of the actual data because small datasets are not suitable.</p><p><strong>Conclusions: </strong>As a method for generating SPD for survival data from small datasets, such as clinical trial data, CART demonstrated to be the most effective method compared to RF, BN, and CTGAN. Additionally, it is possible to improve CART-based generation methods by incorporating feature engineering and other methods in future work.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140900554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maroun Chedid, Fouad T Chebib, Erin Dahlen, Theodore Mueller, Theresa Schnell, Melissa Gay, Musab Hommos, Sundararaman Swaminathan, Arvind Garg, Michael Mao, Brigid Amberg, Kirk Balderes, Karen F Johnson, Alyssa Bishop, Jackqueline Kay Vaughn, Marie Hogan, Vicente Torres, Rajeev Chaudhry, Ziad Zoghby
Background: Tolvaptan is the only FDA-approved drug to slow the progression of autosomal dominant polycystic kidney disease (ADPKD), but it requires strict clinical monitoring due to potentially serious adverse events. Objective: We share our experience in developing and implementing an electronic health record (EHR)-based application to monitor patients with ADPKD initiated on tolvaptan. Methods: The application was developed in collaboration with clinical informatics, based on our clinical protocol of frequent laboratory test monitoring to detect drug-related toxicity early. The application streamlines the clinical workflow and enables our nursing team to take appropriate actions in real time to prevent serious drug-related adverse events. We retrospectively analyzed the characteristics of enrolled patients. Results: As of September 2022, 214 patients were enrolled in the tolvaptan program across all Mayo Clinic sites. Of these, 126 were enrolled in the "Tolvaptan monitoring application" and 88 in the "Past Tolvaptan patients' application". The mean age at enrollment was 43.1±9.9 years. A total of 20 patients (9.3%) developed liver toxicity, but only 5 (2.3%) had to discontinue the drug. The two EHR-based applications allow consolidation of all necessary patient information and real-time data management at the individual or population level. This approach facilitates efficient staff workflow, monitoring of drug-related adverse events, and timely prescription renewal. Conclusions: Our study highlights the feasibility of integrating digital applications into the EHR workflow to facilitate efficient and safe care delivery for patients enrolled in a tolvaptan program. This workflow needs further validation but could be extended to other health care systems managing chronic diseases that require drug monitoring.
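A monitoring application of this kind evaluates each new laboratory result against protocol thresholds in real time. The study does not publish its exact criteria, so the 3x upper-limit-of-normal ALT trigger below is a conventional hepatotoxicity signal used purely as an illustrative assumption, not the authors' protocol:

```python
# Sketch of a real-time lab-monitoring rule. The 3x upper-limit-of-normal
# (ULN) ALT threshold and the ULN value itself are illustrative
# assumptions, not the study's published protocol.
ALT_ULN = 33  # U/L, illustrative upper limit of normal

def liver_alert(alt_value, uln=ALT_ULN):
    """Return a suggested nursing action for a new ALT result (U/L)."""
    if alt_value > 3 * uln:
        return "hold drug, notify nephrology"
    if alt_value > uln:
        return "repeat labs within 48-72 hours"
    return "continue routine monthly monitoring"

print(liver_alert(120))  # hold drug, notify nephrology
print(liver_alert(50))   # repeat labs within 48-72 hours
print(liver_alert(20))   # continue routine monthly monitoring
```

Encoding such rules in the EHR application is what lets the nursing team act on an abnormal result before a serious adverse event develops.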
{"title":"An Electronic Health Record–Integrated Application for Standardizing Care and Monitoring Patients With Autosomal Dominant Polycystic Kidney Disease Enrolled in a Tolvaptan Clinic: Design and Implementation Study","authors":"Maroun Chedid, Fouad T Chebib, Erin Dahlen, Theodore Mueller, Theresa Schnell, Melissa Gay, Musab Hommos, Sundararaman Swaminathan, Arvind Garg, Michael Mao, Brigid Amberg, Kirk Balderes, Karen F Johnson, Alyssa Bishop, Jackqueline Kay Vaughn, Marie Hogan, Vicente Torres, Rajeev Chaudhry, Ziad Zoghby","doi":"10.2196/50164","DOIUrl":"https://doi.org/10.2196/50164","url":null,"abstract":"Background: Tolvaptan is the only FDA-approved drug to slow the progression of autosomal dominant polycystic kidney disease (ADPKD) but requires strict clinical monitoring due to potential serious adverse events. Objective: We share our experience in developing and implementing an electronic health record (EHR) based application to monitor patients with ADPKD initiated on Tolvaptan. Methods: The application was developed in collaboration with clinical informatics based on our clinical protocol with frequent laboratory test monitoring to detect early drug-related toxicity. The application streamlines clinical workflow and enables our nursing team to take appropriate actions in real-time to prevent drug-related serious adverse events. We retrospectively analyzed the characteristics of enrolled patients. Results: As of September 2022, 214 patients were enrolled in the Tolvaptan program across all Mayo Clinic sites. Of these, 126 were enrolled in the “Tolvaptan monitoring application” and 88 in the “Past Tolvaptan patients’ application”. The mean age at enrollment was 43.1±9.9 years. A total of 20 (9.3%) developed liver toxicity but only 5 (2.3%) had to discontinue the drug. The two EHR-based applications allow consolidation of all necessary patient information and real-time data management at the individual or population level. 
This approach facilitates efficient staff workflow, monitoring of drug-related adverse events, and timely prescription renewal. Conclusions: Our study highlights the feasibility of integrating digital applications into the EHR workflow, to facilitate efficient and safe care delivery for patients enrolled in a Tolvaptan program. This workflow needs further validation but could be extended to other healthcare systems managing chronic disease requiring drug monitoring.","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140833238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: Semantic interoperability facilitates the exchange of, and access to, health data documented in electronic health records (EHRs) with various semantic features. The main goal of semantic interoperability development is to make patient data available and usable in diverse EHRs without loss of meaning. Internationally, current initiatives aim to enhance the semantic development of EHR data and, consequently, the availability of patient data. Interoperability between health information systems is among the core goals of the proposed regulation on the European Health Data Space and of the WHO Global Strategy on Digital Health. Objective: To achieve integrated health data ecosystems, stakeholders need to overcome the challenges of implementing semantic interoperability elements. To examine the available scientific evidence on the development of semantic interoperability, we defined the following research questions: What are the key elements of, and approaches for, building semantic interoperability integrated in EHRs? What kinds of goals are driving the development? What kinds of clinical benefits are perceived following this development? Methods: Our research questions focused on key aspects of and approaches for semantic interoperability and on the possible clinical and semantic benefits of these choices in the EHR context. For that purpose, we performed a systematic literature review in PubMed, defining our study framework based on previous research. Results: Our analysis comprised 14 studies in which data models, ontologies, terminologies, classifications, and standards were applied to build interoperability. All articles reported clinical benefits of the selected approach to enhancing semantic interoperability. We identified three main categories of benefit: increasing the availability of data for clinicians (n = 6), increasing the quality of care (n = 4), and enhancing clinical data use and reuse for varied purposes (n = 4). Regarding semantic development goals, data harmonization and the development of semantic interoperability between different EHRs formed the largest category (n = 8); enhancing health data quality through standardization (n = 5) and developing EHR-integrated tools based on interoperable data (n = 1) were the other identified categories. The results were closely coupled with the need to build usable and computable data out of heterogeneous medical information that is accessible through various EHRs and databases (e.g., registers). Conclusions: When heading toward semantic harmonization of clinical data, more experiences and analyses are needed to assess how applicable the chosen solutions are for the semantic interoperability of health care data. Instead of promoting a single approach, semantic interoperability should be assessed through several levels of semantic requirements. A dual- or multi-model approach may be usable to address different semantic interoperability issues during development. The objectives of semanti
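The core of terminology-based harmonization, one of the approaches the review covers, can be sketched as a mapping from local EHR codes to a shared concept identifier. The local source names and code mappings below are invented for illustration (the SNOMED CT concept 44054006 is used here for type 2 diabetes mellitus):

```python
# Minimal sketch of terminology-based harmonization: two EHRs record the
# same condition under different local codes; mapping both to a shared
# SNOMED-CT-style concept ID makes the data comparable across systems.
# The source names and mapping entries are illustrative assumptions.
LOCAL_TO_SNOMED = {
    ("ehr_a", "DM2"): "44054006",  # local short code -> type 2 diabetes mellitus
    ("ehr_b", "E11"): "44054006",  # ICD-10 E11 -> the same shared concept
}

def harmonize(source, code):
    """Return the shared concept ID for a (source system, local code) pair."""
    return LOCAL_TO_SNOMED.get((source, code))

# Both systems' records now resolve to one concept and can be pooled.
assert harmonize("ehr_a", "DM2") == harmonize("ehr_b", "E11")
```

Data recorded under either local code can then be queried, pooled, or reused by the shared concept, which is the "availability without loss of meaning" goal the review describes.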
{"title":"Semantic Interoperability of Electronic Health Records: Systematic Review of Alternative Approaches for Enhancing Patient Information Availability","authors":"Sari Palojoki, Lasse Lehtonen, Riikka Vuokko","doi":"10.2196/53535","DOIUrl":"https://doi.org/10.2196/53535","url":null,"abstract":"Background: Semantic interoperability facilitates the exchange of and access to health data that are being documented in Electronic Health Records (EHRs) with various semantic features. The main goals of semantic interoperability development entails patient data availability and use in diverse EHRs without loss of meaning. Internationally, there are current initiatives that aim to enhance semantic development of EHR data, and consequently, availability of patient data. Interoperability between health information systems is among the core goals of proposal for a regulation on the European Health Data Space and the WHO Global strategy on digital health. Objective: To achieve integrated health data ecosystems, stakeholders need to overcome challenges of implementing semantic interoperability elements. To research the available scientific evidence on the development of semantic interoperability, we defined the following research questions: What are the key elements of and approaches for building semantic interoperability integrated in EHRs? What kinds of goals are driving the development? What kinds of clinical benefits are perceived following this development? Methods: Our research questions focused on key aspects and approaches for semantic interoperability and on possible clinical and semantic benefits in EHR context of these choices. For that purpose, we performed a systematic literature review in PubMed by defining our study framework based on previous research. Results: Our analysis consisted of 14 studies where data models, ontologies, terminologies, classifications, and standards were applied for building interoperability. 
All articles reported clinical benefits of the selected approach to enhancing semantic interoperability. We identified three main categories for this purpose: increasing availability of data for clinicians (n = 6), increasing quality of care (n = 4) and enhancing clinical data use and re-use for varied purposes (n = 4). Regarding semantic development goals, data harmonization and developing semantic interoperability between different EHRs was the largest category (n = 8). Enhancing health data quality through standardization (n = 5) and developing EHR integrated tools based on interoperable data (n = 1) were the other identified categories. The results were closely coupled with the need to build usable and computable data out of heterogeneous medical information that is accessible through various EHRs and databases, e.g., registers. Conclusions: When heading towards semantic harmonization of clinical data, more experiences and analyses are needed to assess how applicable the chosen solutions are for semantic interoperability of health care data. Instead of promoting a single approach, semantic interoperability should be assessed through several levels of semantic requirements A dual- or multi-model approach is possibly usable to address different semantic interoperability issues during development. The objectives of semanti","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140804363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}