Interoperability is designed to improve the quality and efficiency of health care. It allows the Centers for Medicare and Medicaid Services to collect data on quality measures as part of the Meaningful Use program, and covered providers who fail to provide data receive lower reimbursement rates. Unintended consequences arise at each step of the data collection process: (1) providers are not reimbursed for the extra time required to generate data; (2) patients have no control over when and how their data are provided to or used by the government; and (3) large datasets increase the chances of an accidental data breach or an intentional hacker attack. After detailing these issues, we describe several solutions, including an appropriate data use review board designed to oversee certain aspects of the process and ensure accountability and transparency.
{"title":"Unintended Consequences of Data Sharing Under the Meaningful Use Program.","authors":"Irmgard Ursula Willcockson, Ignacio Herman Valdes","doi":"10.2196/52675","DOIUrl":"10.2196/52675","url":null,"abstract":"<p><strong>Unlabelled: </strong>Interoperability has been designed to improve the quality and efficiency of health care. It allows the Centers for Medicare and Medicaid Services to collect data on quality measures as a part of the Meaningful Use program. Covered providers who fail to provide data have lower rates of reimbursement. Unintended consequences also arise at each step of the data collection process: (1) providers are not reimbursed for the extra time required to generate data; (2) patients do not have control over when and how their data are provided to or used by the government; and (3) large datasets increase the chances of an accidental data breach or intentional hacker attack. After detailing the issues, we describe several solutions, including an appropriate data use review board, which is designed to oversee certain aspects of the process and ensure accountability and transparency.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"12 ","pages":"e52675"},"PeriodicalIF":3.1,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11581416/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142633523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: Artificial intelligence (AI) is rapidly being adopted to build products and aid decision-making across industries. However, AI systems have been shown to exhibit and even amplify biases, causing growing concern worldwide. Investigating methods of measuring and mitigating bias within these AI-powered tools is therefore necessary.
Objective: In natural language processing applications, the word embedding association test (WEAT) is a popular method of measuring bias in input embeddings, a common locus of bias in AI. However, the WEAT has known limitations (ie, its nonrobust measure of bias and its reliance on predefined, limited groups of words or sentences) that may lead to inadequate measurements and evaluations of bias. This study therefore takes a new approach to modifying this popular measure of bias, with a focus on making it more robust and applicable in other domains.
Methods: In this study, we introduce the SD-WEAT, a modified version of the WEAT that uses the SD of bias scores across multiple permutations of the WEAT to calculate bias in input embeddings. With the SD-WEAT, we evaluated the biases and stability of several language embedding models, including Global Vectors for Word Representation (GloVe), Word2Vec, and bidirectional encoder representations from transformers (BERT).
Results: This method produces results comparable to those of the WEAT, with strong correlations between the methods' bias scores or effect sizes (r=0.786) and P values (r=0.776), while addressing some of its largest limitations. More specifically, the SD-WEAT is more accessible, as it removes the need to predefine attribute groups, and because the SD-WEAT measures bias over multiple runs rather than one, it reduces the impact of outliers and sample size. Furthermore, the SD-WEAT was found to be more consistent and reliable than its predecessor.
Conclusions: Thus, the SD-WEAT shows promise for robustly measuring bias in the input embeddings fed to AI language models.
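The abstract does not spell out the permutation scheme, so the following is a minimal sketch of one plausible reading: compute the classic WEAT effect size (Caliskan et al.) repeatedly over random splits of a pooled attribute list, then report the SD of the resulting scores. The function names and split strategy are assumptions, not the authors' code.

```python
import numpy as np

def weat_effect_size(X, Y, A, B):
    """Classic WEAT effect size: X, Y are (n, d) arrays of target-word
    vectors; A, B are (m, d) arrays of attribute-word vectors."""
    def assoc(w):
        # mean cosine similarity of w with A, minus that with B
        cos = lambda V: (V @ w) / (np.linalg.norm(V, axis=1) * np.linalg.norm(w))
        return cos(A).mean() - cos(B).mean()
    sx = np.array([assoc(x) for x in X])
    sy = np.array([assoc(y) for y in Y])
    return (sx.mean() - sy.mean()) / np.concatenate([sx, sy]).std(ddof=1)

def sd_weat(X, Y, attribute_pool, n_perm=1000, seed=0):
    """SD of WEAT effect sizes over random halves of a pooled attribute
    set, removing the need to predefine attribute groups A and B."""
    rng = np.random.default_rng(seed)
    pool = np.asarray(attribute_pool)
    scores = []
    for _ in range(n_perm):
        idx = rng.permutation(len(pool))
        A, B = pool[idx[: len(pool) // 2]], pool[idx[len(pool) // 2:]]
        scores.append(weat_effect_size(X, Y, A, B))
    return np.std(scores, ddof=1)

# Toy usage with random 50-dimensional "embeddings"
rng = np.random.default_rng(1)
X, Y = rng.normal(size=(8, 50)), rng.normal(size=(8, 50))
pool = rng.normal(size=(16, 50))
print(sd_weat(X, Y, pool, n_perm=200))
```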
{"title":"Enhancing Bias Assessment for Complex Term Groups in Language Embedding Models: Quantitative Comparison of Methods.","authors":"Magnus Gray, Mariofanna Milanova, Leihong Wu","doi":"10.2196/60272","DOIUrl":"https://doi.org/10.2196/60272","url":null,"abstract":"<p><strong>Background: </strong>Artificial intelligence (AI) is rapidly being adopted to build products and aid in the decision-making process across industries. However, AI systems have been shown to exhibit and even amplify biases, causing a growing concern among people worldwide. Thus, investigating methods of measuring and mitigating bias within these AI-powered tools is necessary.</p><p><strong>Objective: </strong>In natural language processing applications, the word embedding association test (WEAT) is a popular method of measuring bias in input embeddings, a common area of measure bias in AI. However, certain limitations of the WEAT have been identified (ie, their nonrobust measure of bias and their reliance on predefined and limited groups of words or sentences), which may lead to inadequate measurements and evaluations of bias. Thus, this study takes a new approach at modifying this popular measure of bias, with a focus on making it more robust and applicable in other domains.</p><p><strong>Methods: </strong>In this study, we introduce the SD-WEAT, which is a modified version of the WEAT that uses the SD of multiple permutations of the WEATs to calculate bias in input embeddings. With the SD-WEAT, we evaluated the biases and stability of several language embedding models, including Global Vectors for Word Representation (GloVe), Word2Vec, and bidirectional encoder representations from transformers (BERT).</p><p><strong>Results: </strong>This method produces results comparable to those of the WEAT, with strong correlations between the methods' bias scores or effect sizes (r=0.786) and P values (r=0.776), while addressing some of its largest limitations. More specifically, the SD-WEAT is more accessible, as it removes the need to predefine attribute groups, and because the SD-WEAT measures bias over multiple runs rather than one, it reduces the impact of outliers and sample size. Furthermore, the SD-WEAT was found to be more consistent and reliable than its predecessor.</p><p><strong>Conclusions: </strong>Thus, the SD-WEAT shows promise for robustly measuring bias in the input embeddings fed to AI language models.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"12 ","pages":"e60272"},"PeriodicalIF":3.1,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142640107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stuart W Sommers, Heather J Tolle, Katy E Trinkley, Christine G Johnston, Caitlin L Dietsche, Stephanie V Eldred, Abraham T Wick, Jason A Hoppe
Background: Coprescribing naloxone with opioid analgesics is a Centers for Disease Control and Prevention (CDC) best practice to mitigate the risk of fatal opioid overdose, yet coprescription by emergency medicine clinicians is rare, occurring less than 5% of the time it is indicated. Clinical decision support (CDS) has been associated with increased naloxone prescribing; however, key CDS design characteristics and pragmatic outcome measures necessary to understand replicability and effectiveness have not been reported.
Objective: This study aimed to rigorously evaluate and quantify the impact of CDS designed to improve emergency department (ED) naloxone coprescribing. We hypothesized CDS would increase naloxone coprescribing and the number of naloxone prescriptions filled by patients discharged from EDs in a large health care system.
Methods: Following user-centered design principles, we designed and implemented a fully automated, interruptive, electronic health record-based CDS to nudge clinicians to coprescribe naloxone with high-risk opioid prescriptions. "High-risk" opioid prescriptions were defined as any opioid analgesic prescription ≥90 total morphine milligram equivalents per day or for patients with a prior diagnosis of opioid use disorder or opioid overdose. The Reach, Effectiveness, Adoption, Implementation, and Maintenance (RE-AIM) framework was used to evaluate the pragmatic CDS outcomes of reach, effectiveness, adoption, implementation, and maintenance. Effectiveness was the primary outcome of interest and was assessed by (1) constructing a Bayesian structural time-series model of the number of ED visits with naloxone coprescriptions before and after CDS implementation and (2) calculating the percentage of naloxone prescriptions associated with CDS that were filled at an outpatient pharmacy. Mann-Kendall tests were used to evaluate longitudinal trends in CDS adoption. All outcomes were analyzed in R (version 4.2.2; R Core Team).
Results: Between November 2019 and July 2023, there were 1,994,994 ED visits. CDS reached clinicians in 0.83% (16,566/1,994,994) of all visits and 15.99% (16,566/103,606) of ED visits where an opioid was prescribed at discharge. Clinicians adopted CDS, coprescribing naloxone in 34.36% (6613/19,246) of alerts. CDS was effective, increasing naloxone coprescribing from baseline by 18.1 (95% CI 17.9-18.3) coprescriptions per week or 2,327% (95% CI 3390-3490). Patients filled 43.80% (1989/4541) of naloxone coprescriptions. The CDS was implemented simultaneously at every ED, and no adaptations were made to the CDS postimplementation. CDS was maintained beyond the study period and maintained its effect, with adoption increasing over time (τ=0.454; P<.001).
Conclusions: Our findings advance the evidence that electronic health record-based CDS increases the number of naloxone coprescriptions and improves the dis…
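As a concrete illustration of the trigger logic described in the Methods, here is a minimal sketch of the high-risk rule. The field names and ICD-10 codes are illustrative assumptions; the actual CDS runs inside the EHR's rule engine, not as standalone Python.

```python
# Illustrative ICD-10 codes for prior opioid use disorder or overdose
OUD_HISTORY_CODES = {"F11.10", "F11.20", "T40.1X1A"}

def should_alert(prescription: dict, patient_diagnoses: set) -> bool:
    """Return True when the naloxone coprescribing nudge should fire,
    per the study's definition of a high-risk opioid prescription:
    >=90 total MME/day, or a prior OUD or opioid overdose diagnosis."""
    high_dose = prescription["mme_per_day"] >= 90
    prior_risk = bool(OUD_HISTORY_CODES & patient_diagnoses)
    return prescription["is_opioid_analgesic"] and (high_dose or prior_risk)

print(should_alert({"is_opioid_analgesic": True, "mme_per_day": 120}, set()))  # True
```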
{"title":"Clinical Decision Support to Increase Emergency Department Naloxone Coprescribing: Implementation Report.","authors":"Stuart W Sommers, Heather J Tolle, Katy E Trinkley, Christine G Johnston, Caitlin L Dietsche, Stephanie V Eldred, Abraham T Wick, Jason A Hoppe","doi":"10.2196/58276","DOIUrl":"10.2196/58276","url":null,"abstract":"<p><strong>Background: </strong>Coprescribing naloxone with opioid analgesics is a Centers for Disease Control and Prevention (CDC) best practice to mitigate the risk of fatal opioid overdose, yet coprescription by emergency medicine clinicians is rare, occurring less than 5% of the time it is indicated. Clinical decision support (CDS) has been associated with increased naloxone prescribing; however, key CDS design characteristics and pragmatic outcome measures necessary to understand replicability and effectiveness have not been reported.</p><p><strong>Objective: </strong>This study aimed to rigorously evaluate and quantify the impact of CDS designed to improve emergency department (ED) naloxone coprescribing. We hypothesized CDS would increase naloxone coprescribing and the number of naloxone prescriptions filled by patients discharged from EDs in a large health care system.</p><p><strong>Methods: </strong>Following user-centered design principles, we designed and implemented a fully automated, interruptive, electronic health record-based CDS to nudge clinicians to coprescribe naloxone with high-risk opioid prescriptions. \"High-risk\" opioid prescriptions were defined as any opioid analgesic prescription ≥90 total morphine milligram equivalents per day or for patients with a prior diagnosis of opioid use disorder or opioid overdose. The Reach, Effectiveness, Adoption, Implementation, and Maintenance (RE-AIM) framework was used to evaluate pragmatic CDS outcomes of reach, effectiveness, adoption, implementation, and maintenance. Effectiveness was the primary outcome of interest and was assessed by (1) constructing a Bayesian structural time-series model of the number of ED visits with naloxone coprescriptions before and after CDS implementation and (2) calculating the percentage of naloxone prescriptions associated with CDS that were filled at an outpatient pharmacy. Mann-Kendall tests were used to evaluate longitudinal trends in CDS adoption. All outcomes were analyzed in R (version 4.2.2; R Core Team).</p><p><strong>Unlabelled: </strong>Between November 2019 and July 2023, there were 1,994,994 ED visits. CDS reached clinicians in 0.83% (16,566/1,994,994) of all visits and 15.99% (16,566/103,606) of ED visits where an opioid was prescribed at discharge. Clinicians adopted CDS, coprescribing naloxone in 34.36% (6613/19,246) of alerts. CDS was effective, increasing naloxone coprescribing from baseline by 18.1 (95% CI 17.9-18.3) coprescriptions per week or 2,327% (95% CI 3390-3490). Patients filled 43.80% (1989/4541) of naloxone coprescriptions. The CDS was implemented simultaneously at every ED and no adaptations were made to CDS postimplementation. 
CDS was maintained beyond the study period and maintained its effect, with adoption increasing over time (τ=0.454; P<.001).</p><p><strong>Conclusions: </strong>Our findings advance the evidence that electronic health record-based CDS increases the number of naloxone coprescriptions and improves the dis","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"12 ","pages":"e58276"},"PeriodicalIF":3.1,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11560079/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142592335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yordan P Penev, Timothy R Buchanan, Matthew M Ruppert, Michelle Liu, Ramin Shekouhi, Ziyuan Guan, Jeremy Balch, Tezcan Ozrazgat-Baslanti, Benjamin Shickel, Tyler J Loftus, Azra Bihorac
Background: Electronic health records (EHRs) have an enormous potential to advance medical research and practice through easily accessible and interpretable EHR-derived databases. Attainability of this potential is limited by issues with data quality (DQ) and performance assessment.
Objective: This review aims to consolidate current best practices for EHR DQ and performance assessment into a replicable standard for researchers in the field.
Methods: PubMed was systematically searched for original research articles assessing EHR DQ and performance from inception until May 7, 2023.
Results: Our search yielded 26 original research articles. Most articles had 1 or more significant limitations, including incomplete or inconsistent reporting (n=6, 30%), poor replicability (n=5, 25%), and limited generalizability of results (n=5, 25%). Completeness (n=21, 81%), conformance (n=18, 69%), and plausibility (n=16, 62%) were the most cited indicators of DQ, while correctness or accuracy (n=14, 54%) was most cited for data performance, with context-specific supplementation by recency (n=7, 27%), fairness (n=6, 23%), stability (n=4, 15%), and shareability (n=2, 8%) assessments. Artificial intelligence-based techniques, including natural language data extraction, data imputation, and fairness algorithms, were demonstrated to play a rising role in improving both dataset quality and performance.
Conclusions: This review highlights the need for incentivizing DQ and performance assessments and their standardization. The results suggest the usefulness of artificial intelligence-based techniques for enhancing DQ and performance to unlock the full potential of EHRs to improve medical research and practice.
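Completeness, conformance, and plausibility, the three most cited DQ indicators above, can each be approximated with a line of dataframe logic. A minimal sketch follows; the column names, vocabularies, and valid ranges are assumptions for illustration, not a standard from the review.

```python
import pandas as pd

def dq_report(df: pd.DataFrame) -> dict:
    """Toy versions of the three most cited DQ indicators."""
    return {
        # completeness: share of non-missing values per column
        "completeness": df.notna().mean().to_dict(),
        # conformance: values drawn from the expected vocabulary
        "conformance_sex": df["sex"].isin(["F", "M"]).mean(),
        # plausibility: values inside a physiologically credible range
        "plausibility_heart_rate": df["heart_rate"].between(20, 300).mean(),
    }

example = pd.DataFrame({"sex": ["F", "M", None], "heart_rate": [72, 510, 88]})
print(dq_report(example))  # flags the missing sex and the 510 bpm reading
```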
{"title":"Electronic Health Record Data Quality and Performance Assessments: Scoping Review.","authors":"Yordan P Penev, Timothy R Buchanan, Matthew M Ruppert, Michelle Liu, Ramin Shekouhi, Ziyuan Guan, Jeremy Balch, Tezcan Ozrazgat-Baslanti, Benjamin Shickel, Tyler J Loftus, Azra Bihorac","doi":"10.2196/58130","DOIUrl":"10.2196/58130","url":null,"abstract":"<p><strong>Background: </strong>Electronic health records (EHRs) have an enormous potential to advance medical research and practice through easily accessible and interpretable EHR-derived databases. Attainability of this potential is limited by issues with data quality (DQ) and performance assessment.</p><p><strong>Objective: </strong>This review aims to streamline the current best practices on EHR DQ and performance assessments as a replicable standard for researchers in the field.</p><p><strong>Methods: </strong>PubMed was systematically searched for original research articles assessing EHR DQ and performance from inception until May 7, 2023.</p><p><strong>Results: </strong>Our search yielded 26 original research articles. Most articles had 1 or more significant limitations, including incomplete or inconsistent reporting (n=6, 30%), poor replicability (n=5, 25%), and limited generalizability of results (n=5, 25%). Completeness (n=21, 81%), conformance (n=18, 69%), and plausibility (n=16, 62%) were the most cited indicators of DQ, while correctness or accuracy (n=14, 54%) was most cited for data performance, with context-specific supplementation by recency (n=7, 27%), fairness (n=6, 23%), stability (n=4, 15%), and shareability (n=2, 8%) assessments. Artificial intelligence-based techniques, including natural language data extraction, data imputation, and fairness algorithms, were demonstrated to play a rising role in improving both dataset quality and performance.</p><p><strong>Conclusions: </strong>This review highlights the need for incentivizing DQ and performance assessments and their standardization. The results suggest the usefulness of artificial intelligence-based techniques for enhancing DQ and performance to unlock the full potential of EHRs to improve medical research and practice.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"12 ","pages":"e58130"},"PeriodicalIF":3.1,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11559435/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142592338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ethan E Abbott, Donald Apakama, Lynne D Richardson, Lili Chan, Girish N Nadkarni
Background: Social determinants of health (SDOH) are critical drivers of health disparities and patient outcomes. However, accessing and collecting patient-level SDOH data can be operationally challenging in the emergency department (ED) clinical setting, requiring innovative approaches.
Objective: This scoping review examines the potential of artificial intelligence (AI) and data science for the modeling, extraction, and incorporation of SDOH data specifically within EDs, further identifying areas for advancement and investigation.
Methods: We conducted a standardized search for studies published between 2015 and 2022, across Medline (Ovid), Embase (Ovid), CINAHL, Web of Science, and ERIC databases. We focused on identifying studies using AI or data science related to SDOH within emergency care contexts or conditions. Two specialized reviewers in emergency medicine (EM) and clinical informatics independently assessed each article, resolving discrepancies through iterative reviews and discussion. We then extracted data covering study details, methodologies, patient demographics, care settings, and principal outcomes.
Results: Of the 1047 studies screened, 26 met the inclusion criteria. Notably, 9 of the 26 (35%) studies concentrated solely on ED patients. Conditions studied spanned broad EM complaints and included sepsis, acute myocardial infarction, and asthma. The majority of studies (n=16) explored multiple SDOH domains, with homelessness/housing insecurity and neighborhood/built environment predominating. Machine learning (ML) techniques were used in 23 of 26 studies, with natural language processing (NLP) being the most common approach (n=11). Rule-based NLP (n=5), pattern matching (n=4), and deep learning (n=2) were the most commonly used NLP techniques. NLP models in the reviewed studies displayed significant predictive performance for the studied outcomes, with F1-scores ranging between 0.40 and 0.75 and specificities reaching 95.9%.
Conclusions: Although in its infancy, the convergence of AI and data science techniques, especially ML and NLP, with SDOH in EM offers transformative possibilities for better usage and integration of social data into clinical care and research. With a significant focus on the ED and notable NLP model performance, there is an imperative to standardize SDOH data collection, refine algorithms for diverse patient groups, and champion interdisciplinary synergies. These efforts aim to harness SDOH data optimally, enhancing patient care and mitigating health disparities. Our research underscores the vital need for continued investigation in this domain.
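Rule-based NLP and pattern matching, the most common techniques in the reviewed studies, can be illustrated with a toy flagger for housing insecurity. The trigger and negation patterns below are assumptions for illustration; production systems (eg, cTAKES or spaCy pipelines) use far richer lexicons and context handling.

```python
import re

# Illustrative trigger and negation cues, not a validated lexicon
HOUSING = re.compile(
    r"\b(homeless(ness)?|unhoused|housing insecur\w+|lives in (a )?shelter)\b", re.I
)
NEGATION = re.compile(r"\b(denies|no history of|not)\b[^.]{0,40}$", re.I)

def flags_housing_insecurity(note: str) -> bool:
    """Flag a note for housing insecurity unless the mention is preceded
    by a simple negation cue within the same sentence."""
    for match in HOUSING.finditer(note):
        same_sentence_prefix = note[: match.start()].split(".")[-1]
        if not NEGATION.search(same_sentence_prefix):
            return True
    return False

print(flags_housing_insecurity("Patient is currently homeless."))          # True
print(flags_housing_insecurity("Patient denies homelessness at intake."))  # False
```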
{"title":"Leveraging Artificial Intelligence and Data Science for Integration of Social Determinants of Health in Emergency Medicine: Scoping Review.","authors":"Ethan E Abbott, Donald Apakama, Lynne D Richardson, Lili Chan, Girish N Nadkarni","doi":"10.2196/57124","DOIUrl":"10.2196/57124","url":null,"abstract":"<p><strong>Background: </strong>Social determinants of health (SDOH) are critical drivers of health disparities and patient outcomes. However, accessing and collecting patient-level SDOH data can be operationally challenging in the emergency department (ED) clinical setting, requiring innovative approaches.</p><p><strong>Objective: </strong>This scoping review examines the potential of AI and data science for modeling, extraction, and incorporation of SDOH data specifically within EDs, further identifying areas for advancement and investigation.</p><p><strong>Methods: </strong>We conducted a standardized search for studies published between 2015 and 2022, across Medline (Ovid), Embase (Ovid), CINAHL, Web of Science, and ERIC databases. We focused on identifying studies using AI or data science related to SDOH within emergency care contexts or conditions. Two specialized reviewers in emergency medicine (EM) and clinical informatics independently assessed each article, resolving discrepancies through iterative reviews and discussion. We then extracted data covering study details, methodologies, patient demographics, care settings, and principal outcomes.</p><p><strong>Results: </strong>Of the 1047 studies screened, 26 met the inclusion criteria. Notably, 9 out of 26 (35%) studies were solely concentrated on ED patients. Conditions studied spanned broad EM complaints and included sepsis, acute myocardial infarction, and asthma. The majority of studies (n=16) explored multiple SDOH domains, with homelessness/housing insecurity and neighborhood/built environment predominating. Machine learning (ML) techniques were used in 23 of 26 studies, with natural language processing (NLP) being the most commonly used approach (n=11). Rule-based NLP (n=5), deep learning (n=2), and pattern matching (n=4) were the most commonly used NLP techniques. NLP models in the reviewed studies displayed significant predictive performance with outcomes, with F1-scores ranging between 0.40 and 0.75 and specificities nearing 95.9%.</p><p><strong>Conclusions: </strong>Although in its infancy, the convergence of AI and data science techniques, especially ML and NLP, with SDOH in EM offers transformative possibilities for better usage and integration of social data into clinical care and research. With a significant focus on the ED and notable NLP model performance, there is an imperative to standardize SDOH data collection, refine algorithms for diverse patient groups, and champion interdisciplinary synergies. These efforts aim to harness SDOH data optimally, enhancing patient care and mitigating health disparities. 
Our research underscores the vital need for continued investigation in this domain.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"12 ","pages":"e57124"},"PeriodicalIF":3.1,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11539921/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142549217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kyungmo Kim, Seongkeun Park, Jeongwon Min, Sumin Park, Ju Yeon Kim, Jinsu Eun, Kyuha Jung, Yoobin Elyson Park, Esther Kim, Eun Young Lee, Joonhwan Lee, Jinwook Choi
Background: The bidirectional encoder representations from transformers (BERT) model has attracted considerable attention in clinical applications, such as patient classification and disease prediction. However, current studies have typically progressed to application development without a thorough assessment of the model's comprehension of clinical context. Furthermore, limited comparative studies have been conducted on BERT models using medical documents from non-English-speaking countries. Therefore, the applicability of BERT models trained on English clinical notes to non-English contexts is yet to be confirmed. To address these gaps in the literature, this study focused on identifying the most effective BERT model for non-English clinical notes.
Objective: In this study, we evaluated the contextual understanding abilities of various BERT models applied to mixed Korean and English clinical notes. The objective of this study was to identify the BERT model that excels in understanding the context of such documents.
Methods: Using data from 164,460 patients in a South Korean tertiary hospital, we pretrained BERT-base, BERT for Biomedical Text Mining (BioBERT), Korean BERT (KoBERT), and Multilingual BERT (M-BERT) to improve their contextual comprehension capabilities and subsequently compared their performances in 7 fine-tuning tasks.
Results: The model performance varied based on the task and token usage. First, BERT-base and BioBERT excelled in tasks using classification ([CLS]) token embeddings, such as document classification. BioBERT achieved the highest F1-score of 89.32. Both BERT-base and BioBERT demonstrated their effectiveness in document pattern recognition, even with limited Korean tokens in the dictionary. Second, M-BERT exhibited a superior performance in reading comprehension tasks, achieving an F1-score of 93.77. Better results were obtained when fewer words were replaced with unknown ([UNK]) tokens. Third, M-BERT excelled in the knowledge inference task in which correct disease names were inferred from 63 candidate disease names in a document with disease names replaced with [MASK] tokens. M-BERT achieved the highest hit@10 score of 95.41.
Conclusions: This study highlighted the effectiveness of various BERT models in a multilingual clinical domain. The findings can be used as a reference in clinical and language-based applications.
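For readers unfamiliar with the [CLS]-token setup used in the document classification tasks, here is a minimal sketch with an off-the-shelf M-BERT checkpoint. The study pretrained its own models on hospital notes, which is not reproduced here; the toy logistic-regression classifier and example notes are assumptions.

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

# Public M-BERT checkpoint as a stand-in for the study's pretrained models
name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name).eval()

def cls_embeddings(texts):
    """Return the [CLS] token embedding for each note (position 0)."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    return out.last_hidden_state[:, 0].numpy()

# Toy bilingual notes and labels, purely illustrative
notes = ["상세불명의 폐렴 Pneumonia, unspecified", "급성 심근경색 Acute myocardial infarction"]
labels = [0, 1]
clf = LogisticRegression(max_iter=1000).fit(cls_embeddings(notes), labels)
```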
{"title":"Multifaceted Natural Language Processing Task-Based Evaluation of Bidirectional Encoder Representations From Transformers Models for Bilingual (Korean and English) Clinical Notes: Algorithm Development and Validation.","authors":"Kyungmo Kim, Seongkeun Park, Jeongwon Min, Sumin Park, Ju Yeon Kim, Jinsu Eun, Kyuha Jung, Yoobin Elyson Park, Esther Kim, Eun Young Lee, Joonhwan Lee, Jinwook Choi","doi":"10.2196/52897","DOIUrl":"10.2196/52897","url":null,"abstract":"<p><strong>Background: </strong>The bidirectional encoder representations from transformers (BERT) model has attracted considerable attention in clinical applications, such as patient classification and disease prediction. However, current studies have typically progressed to application development without a thorough assessment of the model's comprehension of clinical context. Furthermore, limited comparative studies have been conducted on BERT models using medical documents from non-English-speaking countries. Therefore, the applicability of BERT models trained on English clinical notes to non-English contexts is yet to be confirmed. To address these gaps in literature, this study focused on identifying the most effective BERT model for non-English clinical notes.</p><p><strong>Objective: </strong>In this study, we evaluated the contextual understanding abilities of various BERT models applied to mixed Korean and English clinical notes. The objective of this study was to identify the BERT model that excels in understanding the context of such documents.</p><p><strong>Methods: </strong>Using data from 164,460 patients in a South Korean tertiary hospital, we pretrained BERT-base, BERT for Biomedical Text Mining (BioBERT), Korean BERT (KoBERT), and Multilingual BERT (M-BERT) to improve their contextual comprehension capabilities and subsequently compared their performances in 7 fine-tuning tasks.</p><p><strong>Results: </strong>The model performance varied based on the task and token usage. First, BERT-base and BioBERT excelled in tasks using classification ([CLS]) token embeddings, such as document classification. BioBERT achieved the highest F1-score of 89.32. Both BERT-base and BioBERT demonstrated their effectiveness in document pattern recognition, even with limited Korean tokens in the dictionary. Second, M-BERT exhibited a superior performance in reading comprehension tasks, achieving an F1-score of 93.77. Better results were obtained when fewer words were replaced with unknown ([UNK]) tokens. Third, M-BERT excelled in the knowledge inference task in which correct disease names were inferred from 63 candidate disease names in a document with disease names replaced with [MASK] tokens. M-BERT achieved the highest hit@10 score of 95.41.</p><p><strong>Conclusions: </strong>This study highlighted the effectiveness of various BERT models in a multilingual clinical domain. 
The findings can be used as a reference in clinical and language-based applications.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"12 ","pages":"e52897"},"PeriodicalIF":3.1,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11539635/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142549218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bruno Paiva, Marcos André Gonçalves, Leonardo Chaves Dutra da Rocha, Milena Soriano Marcolino, Fernanda Cristina Barbosa Lana, Maira Viana Rego Souza-Silva, Jussara M Almeida, Polianna Delfino Pereira, Claudio Moisés Valiense de Andrade, Angélica Gomides Dos Reis Gomes, Maria Angélica Pires Ferreira, Frederico Bartolazzi, Manuela Furtado Sacioto, Ana Paula Boscato, Milton Henriques Guimarães-Júnior, Priscilla Pereira Dos Reis, Felício Roberto Costa, Alzira de Oliveira Jorge, Laryssa Reis Coelho, Marcelo Carneiro, Thaís Lorenna Souza Sales, Silvia Ferreira Araújo, Daniel Vitório Silveira, Karen Brasil Ruschel, Fernanda Caldeira Veloso Santos, Evelin Paola de Almeida Cenci, Luanna Silva Monteiro Menezes, Fernando Anschau, Maria Aparecida Camargos Bicalho, Euler Roberto Fernandes Manenti, Renan Goulart Finger, Daniela Ponce, Filipe Carrilho de Aguiar, Luiza Margoto Marques, Luís César de Castro, Giovanna Grünewald Vietta, Mariana Frizzo de Godoy, Mariana do Nascimento Vilaça, Vivian Costa Morais
Background: Proper analysis and interpretation of health care data can significantly improve patient outcomes by enhancing services and revealing the impacts of new technologies and treatments. Understanding the substantial impact of temporal shifts in these data is crucial. For example, COVID-19 vaccination initially lowered the mean age of at-risk patients and later changed the characteristics of those who died. This highlights the importance of understanding such shifts when assessing factors that affect patient outcomes.
Objective: This study aims to propose detection, initial characterization, and semantic characterization (DIS), a new methodology for analyzing changes in health outcomes and variables over time while discovering contextual changes for outcomes in large volumes of data.
Methods: The DIS methodology involves 3 steps: detection, initial characterization, and semantic characterization. Detection uses metrics such as Jensen-Shannon divergence to identify significant data drifts. Initial characterization offers a global analysis of changes in data distribution and predictive feature significance over time. Semantic characterization uses natural language processing-inspired techniques to understand the local context of these changes, helping identify factors driving changes in patient outcomes. By integrating the outcomes from these 3 steps, our results can identify specific factors (eg, interventions and modifications in health care practices) that drive changes in patient outcomes. DIS was applied to the Brazilian COVID-19 Registry and the Medical Information Mart for Intensive Care, version IV (MIMIC-IV) data sets.
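As a sketch of the detection step, the snippet below computes the Jensen-Shannon divergence between a variable's empirical distributions in two time windows. The binning scheme and window choice are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def js_divergence(window_a, window_b, bins=20):
    """JS divergence between two empirical distributions. Note that
    scipy returns the JS *distance*, the square root of the divergence."""
    lo = min(window_a.min(), window_b.min())
    hi = max(window_a.max(), window_b.max())
    p, _ = np.histogram(window_a, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(window_b, bins=bins, range=(lo, hi), density=True)
    return jensenshannon(p, q, base=2) ** 2

rng = np.random.default_rng(42)
before = rng.normal(62, 15, 5000)  # eg, patient age in an early window
after = rng.normal(55, 15, 5000)   # mean age drops, mimicking a drift
print(js_divergence(before, after))
```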
Results: Our approach allowed us to (1) identify drifts effectively, especially using metrics such as the Jensen-Shannon divergence, and (2) uncover reasons for the decline in overall mortality in both the COVID-19 and MIMIC-IV data sets, as well as changes in the cooccurrence between different diseases and this particular outcome. Factors such as vaccination during the COVID-19 pandemic and reduced iatrogenic events and cancer-related deaths in MIMIC-IV were highlighted. The methodology also pinpointed shifts in patient demographics and disease patterns, providing insights into the evolving health care landscape during the study period.
Conclusions: We developed a novel methodology combining machine learning and natural language processing techniques to detect, characterize, and understand temporal shifts in health care data. This understanding can enhance predictive algorithms, improve patient outcomes, and optimize health care resource allocation, ultimately improving the effectiveness of machine learning predictive algorithms applied to health care data. Our methodology can be applied to a variety of scenarios beyond those discussed in this paper.
{"title":"A New Natural Language Processing-Inspired Methodology (Detection, Initial Characterization, and Semantic Characterization) to Investigate Temporal Shifts (Drifts) in Health Care Data: Quantitative Study.","authors":"Bruno Paiva, Marcos André Gonçalves, Leonardo Chaves Dutra da Rocha, Milena Soriano Marcolino, Fernanda Cristina Barbosa Lana, Maira Viana Rego Souza-Silva, Jussara M Almeida, Polianna Delfino Pereira, Claudio Moisés Valiense de Andrade, Angélica Gomides Dos Reis Gomes, Maria Angélica Pires Ferreira, Frederico Bartolazzi, Manuela Furtado Sacioto, Ana Paula Boscato, Milton Henriques Guimarães-Júnior, Priscilla Pereira Dos Reis, Felício Roberto Costa, Alzira de Oliveira Jorge, Laryssa Reis Coelho, Marcelo Carneiro, Thaís Lorenna Souza Sales, Silvia Ferreira Araújo, Daniel Vitório Silveira, Karen Brasil Ruschel, Fernanda Caldeira Veloso Santos, Evelin Paola de Almeida Cenci, Luanna Silva Monteiro Menezes, Fernando Anschau, Maria Aparecida Camargos Bicalho, Euler Roberto Fernandes Manenti, Renan Goulart Finger, Daniela Ponce, Filipe Carrilho de Aguiar, Luiza Margoto Marques, Luís César de Castro, Giovanna Grünewald Vietta, Mariana Frizzo de Godoy, Mariana do Nascimento Vilaça, Vivian Costa Morais","doi":"10.2196/54246","DOIUrl":"10.2196/54246","url":null,"abstract":"<p><strong>Background: </strong>Proper analysis and interpretation of health care data can significantly improve patient outcomes by enhancing services and revealing the impacts of new technologies and treatments. Understanding the substantial impact of temporal shifts in these data is crucial. For example, COVID-19 vaccination initially lowered the mean age of at-risk patients and later changed the characteristics of those who died. This highlights the importance of understanding these shifts for assessing factors that affect patient outcomes.</p><p><strong>Objective: </strong>This study aims to propose detection, initial characterization, and semantic characterization (DIS), a new methodology for analyzing changes in health outcomes and variables over time while discovering contextual changes for outcomes in large volumes of data.</p><p><strong>Methods: </strong>The DIS methodology involves 3 steps: detection, initial characterization, and semantic characterization. Detection uses metrics such as Jensen-Shannon divergence to identify significant data drifts. Initial characterization offers a global analysis of changes in data distribution and predictive feature significance over time. Semantic characterization uses natural language processing-inspired techniques to understand the local context of these changes, helping identify factors driving changes in patient outcomes. By integrating the outcomes from these 3 steps, our results can identify specific factors (eg, interventions and modifications in health care practices) that drive changes in patient outcomes. DIS was applied to the Brazilian COVID-19 Registry and the Medical Information Mart for Intensive Care, version IV (MIMIC-IV) data sets.</p><p><strong>Results: </strong>Our approach allowed us to (1) identify drifts effectively, especially using metrics such as the Jensen-Shannon divergence, and (2) uncover reasons for the decline in overall mortality in both the COVID-19 and MIMIC-IV data sets, as well as changes in the cooccurrence between different diseases and this particular outcome. Factors such as vaccination during the COVID-19 pandemic and reduced iatrogenic events and cancer-related deaths in MIMIC-IV were highlighted. 
The methodology also pinpointed shifts in patient demographics and disease patterns, providing insights into the evolving health care landscape during the study period.</p><p><strong>Conclusions: </strong>We developed a novel methodology combining machine learning and natural language processing techniques to detect, characterize, and understand temporal shifts in health care data. This understanding can enhance predictive algorithms, improve patient outcomes, and optimize health care resource allocation, ultimately improving the effectiveness of machine learning predictive algorithms applied to health care data. Our methodology can be applied to a variety of scenarios beyond those discussed in this paper.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"12 ","pages":"e54246"},"PeriodicalIF":3.1,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11555458/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142523693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hieu Minh Nguyen, William Anderson, Shih-Hsiung Chou, Andrew McWilliams, Jing Zhao, Nicholas Pajewski, Yhenneko Taylor
Background: Assessing disease progression among patients with uncontrolled hypertension is important for identifying opportunities for intervention.
Objective: We aim to develop and validate 2 models, one to predict sustained, uncontrolled hypertension (≥2 blood pressure [BP] readings ≥140/90 mm Hg or ≥1 BP reading ≥180/120 mm Hg) and one to predict hypertensive crisis (≥1 BP reading ≥180/120 mm Hg) within 1 year of an index visit (outpatient or ambulatory encounter in which an uncontrolled BP reading was recorded).
Methods: Data from 142,897 patients with uncontrolled hypertension within Atrium Health Greater Charlotte in 2018 were used. Electronic health record-based predictors were based on the 1-year period before a patient's index visit. The dataset was randomly split (80:20) into a training set and a validation set. In total, 4 machine learning frameworks were considered: L2-regularized logistic regression, multilayer perceptron, gradient boosting machines, and random forest. Model selection was performed with 10-fold cross-validation. The final models were assessed on discrimination (C-statistic), calibration (eg, integrated calibration index), and net benefit (with decision curve analysis). Additionally, internal-external cross-validation was performed at the county level to assess performance with new populations and summarized using random-effect meta-analyses.
Results: In internal validation, the C-statistic and integrated calibration index were 0.72 (95% CI 0.71-0.72) and 0.015 (95% CI 0.012-0.020) for the sustained, uncontrolled hypertension model, and 0.81 (95% CI 0.79-0.82) and 0.009 (95% CI 0.007-0.011) for the hypertensive crisis model. The models had higher net benefit than the default policies (ie, treat-all and treat-none) across different decision thresholds. In internal-external cross-validation, the pooled performance was consistent with internal validation results; in particular, the pooled C-statistics were 0.70 (95% CI 0.69-0.71) and 0.79 (95% CI 0.78-0.81) for the sustained, uncontrolled hypertension model and hypertensive crisis model, respectively.
Conclusions: An electronic health record-based model predicted hypertensive crisis reasonably well in internal and internal-external validations. The model can potentially be used to support population health surveillance and hypertension management. Further studies are needed to improve the ability to predict sustained, uncontrolled hypertension.
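The decision curve analysis mentioned above rests on the standard net benefit formula, NB = TP/n − FP/n × pt/(1 − pt), where pt is the decision threshold. Below is a minimal sketch comparing a model against the default treat-all and treat-none policies, on toy data rather than study results.

```python
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit at decision threshold pt: NB = TP/n - FP/n * pt/(1-pt).
    The treat-none policy has NB = 0 by definition."""
    treat = y_prob >= threshold
    n = len(y_true)
    tp = np.sum(treat & (y_true == 1)) / n
    fp = np.sum(treat & (y_true == 0)) / n
    return tp - fp * threshold / (1 - threshold)

# Toy decision curve: model vs the default treat-all policy
y = np.array([0, 0, 1, 0, 1, 1, 0, 1])
p = np.array([0.1, 0.3, 0.8, 0.2, 0.6, 0.9, 0.4, 0.7])
for pt in (0.1, 0.2, 0.3):
    print(pt, net_benefit(y, p, pt), net_benefit(y, np.ones_like(p), pt))
```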
{"title":"Predictive Models for Sustained, Uncontrolled Hypertension and Hypertensive Crisis Based on Electronic Health Record Data: Algorithm Development and Validation.","authors":"Hieu Minh Nguyen, William Anderson, Shih-Hsiung Chou, Andrew McWilliams, Jing Zhao, Nicholas Pajewski, Yhenneko Taylor","doi":"10.2196/58732","DOIUrl":"10.2196/58732","url":null,"abstract":"<p><strong>Background: </strong>Assessing disease progression among patients with uncontrolled hypertension is important for identifying opportunities for intervention.</p><p><strong>Objective: </strong>We aim to develop and validate 2 models, one to predict sustained, uncontrolled hypertension (≥2 blood pressure [BP] readings ≥140/90 mm Hg or ≥1 BP reading ≥180/120 mm Hg) and one to predict hypertensive crisis (≥1 BP reading ≥180/120 mm Hg) within 1 year of an index visit (outpatient or ambulatory encounter in which an uncontrolled BP reading was recorded).</p><p><strong>Methods: </strong>Data from 142,897 patients with uncontrolled hypertension within Atrium Health Greater Charlotte in 2018 were used. Electronic health record-based predictors were based on the 1-year period before a patient's index visit. The dataset was randomly split (80:20) into a training set and a validation set. In total, 4 machine learning frameworks were considered: L2-regularized logistic regression, multilayer perceptron, gradient boosting machines, and random forest. Model selection was performed with 10-fold cross-validation. The final models were assessed on discrimination (C-statistic), calibration (eg, integrated calibration index), and net benefit (with decision curve analysis). Additionally, internal-external cross-validation was performed at the county level to assess performance with new populations and summarized using random-effect meta-analyses.</p><p><strong>Results: </strong>In internal validation, the C-statistic and integrated calibration index were 0.72 (95% CI 0.71-0.72) and 0.015 (95% CI 0.012-0.020) for the sustained, uncontrolled hypertension model, and 0.81 (95% CI 0.79-0.82) and 0.009 (95% CI 0.007-0.011) for the hypertensive crisis model. The models had higher net benefit than the default policies (ie, treat-all and treat-none) across different decision thresholds. In internal-external cross-validation, the pooled performance was consistent with internal validation results; in particular, the pooled C-statistics were 0.70 (95% CI 0.69-0.71) and 0.79 (95% CI 0.78-0.81) for the sustained, uncontrolled hypertension model and hypertensive crisis model, respectively.</p><p><strong>Conclusions: </strong>An electronic health record-based model predicted hypertensive crisis reasonably well in internal and internal-external validations. The model can potentially be used to support population health surveillance and hypertension management. 
Further studies are needed to improve the ability to predict sustained, uncontrolled hypertension.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"12 ","pages":"e58732"},"PeriodicalIF":3.1,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11533385/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142513892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tarso Augusto Duenhas Accorsi, Anderson Aires Eduardo, Carlos Guilherme Baptista, Flavio Tocci Moreira, Renata Albaladejo Morbeck, Karen Francine Köhler, Karine de Amicis Lima, Carlos Henrique Sartorato Pedrotti
Background: Integrating decision support systems into telemedicine may optimize consultation efficiency and adherence to clinical guidelines; however, the extent of such effects remains underexplored.
Objective: This study aims to evaluate the use of International Classification of Diseases (ICD)-coded prescription decision support systems (PDSSs) and the effects of these systems on consultation duration and guideline adherence during telemedicine encounters.
Methods: In this retrospective, single-center, observational study conducted from October 2021 to March 2022, adult patients who sought urgent digital care via direct-to-consumer video consultations were included. Physicians had access to current guidelines and could use an ICD-triggered PDSS (which was introduced in January 2022 after a preliminary test in the preceding month) for 26 guideline-based conditions. This study analyzed the impact of implementing automated prescription systems and compared these systems to manual prescription processes in terms of consultation duration and guideline adherence.
Results: This study included 10,485 telemedicine encounters involving 9644 patients, with 12,346 prescriptions issued by 290 physicians. Automated prescriptions accounted for 5022 (40.67%) of the prescriptions issued following system integration. Before decision support was introduced, 4497 (36.42%) prescriptions were issued manually, increasing to 7849 (63.57%) postimplementation. Physicians' average consultation time decreased significantly from 11.2 (SD 5.9) minutes to 9.5 (SD 5.5) minutes after PDSS implementation (P<.001). Of the 12,346 prescriptions, 8683 (70.34%) were aligned with disease-specific international guidelines tailored for telemedicine encounters. Primary medication adherence in accordance with existing guidelines was significantly greater in the decision support group than in the manual group (n=4697, 93.53% vs n=1389, 49.14%; P<.001).
Conclusions: Most physicians adopted the PDSS, and the results demonstrate that ICD-triggered prescription support reduced consultation times and increased guideline adherence. These systems appear valuable for enhancing the efficiency and quality of telemedicine consultations by supporting evidence-based clinical decision-making.
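Mechanically, an ICD-triggered PDSS can be as simple as a lookup from a coded diagnosis to guideline-based prescription templates. The sketch below mocks two of the 26 conditions with invented mappings; the real system's content and rules are not public in this abstract.

```python
# Illustrative ICD-10 -> guideline prescription mapping (invented content)
PDSS_RULES = {
    "J02.9": ["analgesic/antipyretic", "topical anesthetic spray"],  # acute pharyngitis
    "N39.0": ["first-line oral antibiotic", "urinary analgesic"],    # urinary tract infection
}

def suggest_prescriptions(icd_code: str) -> list:
    """Return guideline-based prescription templates for a coded diagnosis;
    an empty list means the physician prescribes manually."""
    return PDSS_RULES.get(icd_code, [])

print(suggest_prescriptions("J02.9"))
```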
{"title":"The Impact of International Classification of Disease-Triggered Prescription Support on Telemedicine: Observational Analysis of Efficiency and Guideline Adherence.","authors":"Tarso Augusto Duenhas Accorsi, Anderson Aires Eduardo, Carlos Guilherme Baptista, Flavio Tocci Moreira, Renata Albaladejo Morbeck, Karen Francine Köhler, Karine de Amicis Lima, Carlos Henrique Sartorato Pedrotti","doi":"10.2196/56681","DOIUrl":"10.2196/56681","url":null,"abstract":"<p><strong>Background: </strong>Integrating decision support systems into telemedicine may optimize consultation efficiency and adherence to clinical guidelines; however, the extent of such effects remains underexplored.</p><p><strong>Objective: </strong>This study aims to evaluate the use of ICD (International Classification of Disease)-coded prescription decision support systems (PDSSs) and the effects of these systems on consultation duration and guideline adherence during telemedicine encounters.</p><p><strong>Methods: </strong>In this retrospective, single-center, observational study conducted from October 2021 to March 2022, adult patients who sought urgent digital care via direct-to-consumer video consultations were included. Physicians had access to current guidelines and could use an ICD-triggered PDSS (which was introduced in January 2022 after a preliminary test in the preceding month) for 26 guideline-based conditions. This study analyzed the impact of implementing automated prescription systems and compared these systems to manual prescription processes in terms of consultation duration and guideline adherence.</p><p><strong>Results: </strong>This study included 10,485 telemedicine encounters involving 9644 patients, with 12,346 prescriptions issued by 290 physicians. Automated prescriptions were used in 5022 (40.67%) of the consultations following system integration. Before introducing decision support, 4497 (36.42%) prescriptions were issued, which increased to 7849 (63.57%) postimplementation. The physician's average consultation time decreased significantly to 9.5 (SD 5.5) minutes from 11.2 (SD 5.9) minutes after PDSS implementation (P<.001). Of the 12,346 prescriptions, 8683 (70.34%) were aligned with disease-specific international guidelines tailored for telemedicine encounters. Primary medication adherence in accordance with existing guidelines was significantly greater in the decision support group than in the manual group (n=4697, 93.53% vs n=1389, 49.14%; P<.001).</p><p><strong>Conclusions: </strong>Most of the physicians adopted the PDSS, and the results demonstrated the use of the ICD-code system in reducing consultation times and increasing guideline adherence. These systems appear to be valuable for enhancing the efficiency and quality of telemedicine consultations by supporting evidence-based clinical decision-making.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"12 ","pages":"e56681"},"PeriodicalIF":3.1,"publicationDate":"2024-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11549578/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142513894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
I S van Maurik, H J Doodeman, B W Veeger-Nuijens, R P M Möhringer, D R Sudiono, W Jongbloed, E van Soelen
Before deploying a clinical prediction model (CPM) in clinical practice, its performance needs to be demonstrated in the population of intended use; this is also called "targeted validation." Many CPMs developed in tertiary settings may be most useful in secondary care, where the patient case mix is broad and practitioners need to triage patients efficiently. However, because structured or rich datasets of sufficient quality from secondary care with which to assess the performance of a CPM are scarce, a validation gap exists that hampers the implementation of CPMs in secondary care settings. In this viewpoint, we highlight the importance of targeted validation and the use of CPMs in secondary care settings and discuss the potential and challenges of using electronic health record (EHR) data to overcome the existing validation gap. The introduction of software applications for text mining of EHRs allows the generation of structured "big" datasets, but the imperfection of EHRs as a research database requires careful validation of data quality. When using EHR data for the development and validation of CPMs, in addition to widely accepted checklists, we propose considering three additional practical steps: (1) involve a local EHR expert (clinician or nurse) in the data extraction process, (2) perform validity checks on the generated datasets, and (3) provide metadata on how variables were constructed from EHRs. These steps help generate EHR datasets that are statistically powerful, of sufficient quality, and replicable, enabling targeted development and validation of CPMs in secondary care settings. This approach can fill a major gap in prediction modeling research and appropriately advance CPMs into clinical practice.
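Steps 2 and 3 of the proposed workflow lend themselves to lightweight tooling. Here is a minimal sketch of validity checks plus construction metadata; the variable names, construction rules, and thresholds are invented for illustration.

```python
import pandas as pd

# Step 3: record how each variable was built from the EHR (illustrative)
METADATA = {
    "dm_type2": "1 if ICD-10 E11.* ever coded OR HbA1c >= 48 mmol/mol, else 0",
    "egfr_last": "most recent eGFR (mL/min/1.73m2) before the index date",
}

def validity_checks(df: pd.DataFrame) -> None:
    """Step 2: simple validity checks on the generated dataset; fail early
    rather than model on silently broken extractions."""
    assert df["dm_type2"].isin([0, 1]).all(), "dm_type2 must be binary"
    assert df["egfr_last"].between(0, 200).all(), "implausible eGFR values"
    assert df.index.is_unique, "duplicate patient rows"

cohort = pd.DataFrame({"dm_type2": [0, 1], "egfr_last": [85.0, 52.0]})
validity_checks(cohort)  # passes silently on clean data
```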
{"title":"Targeted Development and Validation of Clinical Prediction Models in Secondary Care Settings: Opportunities and Challenges for Electronic Health Record Data.","authors":"I S van Maurik, H J Doodeman, B W Veeger-Nuijens, R P M Möhringer, D R Sudiono, W Jongbloed, E van Soelen","doi":"10.2196/57035","DOIUrl":"https://doi.org/10.2196/57035","url":null,"abstract":"<p><strong>Unlabelled: </strong>Before deploying a clinical prediction model (CPM) in clinical practice, its performance needs to be demonstrated in the population of intended use. This is also called \"targeted validation.\" Many CPMs developed in tertiary settings may be most useful in secondary care, where the patient case mix is broad and practitioners need to triage patients efficiently. However, since structured or rich datasets of sufficient quality from secondary to assess the performance of a CPM are scarce, a validation gap exists that hampers the implementation of CPMs in secondary care settings. In this viewpoint, we highlight the importance of targeted validation and the use of CPMs in secondary care settings and discuss the potential and challenges of using electronic health record (EHR) data to overcome the existing validation gap. The introduction of software applications for text mining of EHRs allows the generation of structured \"big\" datasets, but the imperfection of EHRs as a research database requires careful validation of data quality. When using EHR data for the development and validation of CPMs, in addition to widely accepted checklists, we propose considering three additional practical steps: (1) involve a local EHR expert (clinician or nurse) in the data extraction process, (2) perform validity checks on the generated datasets, and (3) provide metadata on how variables were constructed from EHRs. These steps help to generate EHR datasets that are statistically powerful, of sufficient quality and replicable, and enable targeted development and validation of CPMs in secondary care settings. This approach can fill a major gap in prediction modeling research and appropriately advance CPMs into clinical practice.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"12 ","pages":"e57035"},"PeriodicalIF":3.1,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142513893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}