Cheng-Hao Hsu, Ching-Li Hsu, Chih-Hsiang Tsou, Kuo-Fang Hsu, Hung-Yu Yang
We used the free artificial intelligence (AI) tool Google NotebookLM, powered by the large language model Gemini 2.0, to construct a medical decision-making aid for diagnosing and managing airway diseases, and we subsequently evaluated its functionality and performance in a clinical workflow. After feeding the tool relevant published clinical guidelines for these diseases, we assessed the feasibility of the system in terms of its behavior, ability, and potential, and we created simulated cases and used the system to solve the associated medical problems. The test and simulation questions were designed by a pulmonologist, and the appropriateness (focusing on accuracy and completeness) of the AI responses was judged independently by 3 pulmonologists. The system was then deployed in an emergency department (ED) setting, where it was tested by medical staff (n=20) to assess how it affected the process of clinical consultation. Test opinions were collected through a questionnaire. Most (56/84, 67%) of the specialists' ratings of the AI responses were above average. Interrater reliability was moderate for accuracy (intraclass correlation coefficient=0.612; P<.001) and good for completeness (intraclass correlation coefficient=0.773; P<.001). When deployed in the ED, the system provided reasonable answers and enhanced staff literacy about these diseases. The reduction in time spent in consultation did not reach statistical significance across all participants (Kolmogorov-Smirnov D=0.223; P=.24) but was favorable when only physicians' responses were analyzed. We concluded that this system is customizable, cost-efficient, and accessible, without any computer coding experience, to clinicians and allied health care professionals treating airway diseases.
It provides convincing guideline-based recommendations, increases the staff's medical literacy, and potentially saves physicians' time spent on consultation. This system warrants further evaluation in other medical disciplines and health care environments.
Improving Clinical Decision-Making in Treating Airway Diseases With an Expert System Built Upon the Free AI Tool Google NotebookLM. JMIR Medical Informatics. 2026 Jan 29:e78567. doi:10.2196/78567
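The consultation-time comparison above relies on the two-sample Kolmogorov-Smirnov D statistic. A minimal pure-Python sketch follows; the study's statistical software is not reported, and the sample values are illustrative, not study data.

```python
# Two-sample Kolmogorov-Smirnov D: the maximum vertical distance between
# the two empirical cumulative distribution functions (CDFs).

def ks_d(sample_a, sample_b):
    """Return the K-S D statistic for two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = sum(v <= x for v in a) / len(a)
        cdf_b = sum(v <= x for v in b) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Hypothetical consultation times (minutes) without vs with the system
before = [12, 15, 9, 20, 18, 14]
after = [8, 11, 10, 13, 9, 12]
print(round(ks_d(before, after), 3))  # → 0.667
```

In the study, D=0.223 across all participants did not reach significance; a D this small indicates the two time distributions were fairly similar.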
Chung Chun Lee, Grace Juyun Kim, Suhyun Kim, Jee Young Hong, Won Min Hwang, Jong-Yeup Kim, Kye Hwa Lee, Kwangsoo Kim, Mingyu Kang, Ju Han Kim, Suehyun Lee
Background: A rapidly aging population has led to an increase in the number of patients with chronic diseases and polypharmacy. Although the appropriate number of drugs for older patients has been investigated, studies on polypharmacy criteria in older inpatients across multiple institutions are scarce.
Objective: The aim of this study was to examine the patterns of polypharmacy and determine the criteria for the number of drugs defining polypharmacy in the geriatric inpatient population.
Methods: Electronic health records of 4 medical institutions for patients aged 65 years and older hospitalized between January 1, 2012, and December 31, 2020, were analyzed for the study. The maximum number of drugs prescribed was obtained for each patient and, along with a literature review, was used to determine the appropriate polypharmacy level for our population.
Results: We suggest a 4-level polypharmacy categorization system consisting of nonpolypharmacy, polypharmacy, major polypharmacy, and excessive polypharmacy, based on a review of international guidelines and the polypharmacy literature. Applying this system to our study population showed that the major polypharmacy category (10-19 concurrent drugs) was a more appropriate threshold for hospitalized patients than the traditional threshold of 5 or more concurrent drugs, a finding supported by this population's tendency toward higher disease and drug counts. Frequently prescribed therapeutic subgroups in this category were antibacterials for systemic use, anesthetics, and cardiac therapy.
Conclusions: This study proposes a polypharmacy categorization system for older inpatients, which differs from the common definition of the concomitant prescription of 5 or more drugs. The older population tends to have severe conditions including those requiring major surgeries; therefore, a drug count corresponding to the definition of major polypharmacy is appropriate.
Multi-Institutional Drug Use Patterns in Hospitalized Older Patients: Retrospective Cross-Sectional Study. JMIR Medical Informatics. 2026 Jan 29;14:e78353. doi:10.2196/78353
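The 4-level categorization reads naturally as a drug-count lookup. In this sketch, only the 10-19 band for major polypharmacy is taken from the abstract; the remaining cutoffs (0-4, 5-9, ≥20) are assumptions inferred from the traditional 5-drug threshold, not the paper's published definitions.

```python
def polypharmacy_category(drug_count: int) -> str:
    """Map a maximum concurrent drug count to a polypharmacy level."""
    if drug_count < 0:
        raise ValueError("drug count must be non-negative")
    if drug_count <= 4:    # assumed band (below the traditional threshold)
        return "nonpolypharmacy"
    if drug_count <= 9:    # assumed band (traditional 5+ drug definition)
        return "polypharmacy"
    if drug_count <= 19:   # 10-19 concurrent drugs, per the abstract
        return "major polypharmacy"
    return "excessive polypharmacy"  # assumed band (>=20)

print(polypharmacy_category(12))  # → major polypharmacy
```

Under this scheme, an inpatient on 12 concurrent drugs falls into the category the authors found most appropriate for hospitalized older adults.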
Multi-Evidence Clinical Reasoning With Retrieval-Augmented Generation for Emergency Triage: Retrospective Evaluation Study
Hang Sheung Wong, Tsz Kwan Wong
Background: Emergency triage accuracy is critical but varies with clinician experience, cognitive load, and case complexity. Mis-triage can delay care for high-risk patients and exacerbate crowding through unnecessary prioritization. Large language models (LLMs) show promise as triage decision-support tools but are vulnerable to hallucinations. Retrieval-augmented generation (RAG) may improve reliability by grounding LLM reasoning in authoritative guidelines and real clinical cases.
Objective: This study aimed to evaluate whether a dual-source RAG system that integrates guideline- and case-based evidence improves emergency triage performance versus a baseline LLM and to assess how closely its urgency assignments align with expert consensus and outcome-defined clinical severity.
Methods: We developed a dual-source RAG system, Multi-Evidence Clinical Reasoning RAG (MECR-RAG), that retrieves sections from the Hong Kong Accident and Emergency Triage Guidelines (HKAETG) and cases from a database of 3000 emergency department triage encounters. In a retrospective single-center evaluation, MECR-RAG and a prompt-only baseline LLM (both DeepSeek-V3) were tested on 236 routine triage encounters to predict 5-level triage categories. Expert consensus reference labels were assigned by blinded senior triage nurses. Primary outcomes were quadratic weighted kappa (QWK) and accuracy versus consensus labels. Secondary analyses examined performance within 3 operationally and clinically relevant triage bands: immediate (categories 1 and 2), urgent (category 3), and nonurgent (categories 4 and 5). In 226 encounters with follow-up, we also assigned outcome-based severity tiers (R1-R3) using a published 3-level urgency reference standard and defined a disposition-safety composite.
Results: MECR-RAG achieved a mean QWK of 0.902 (SD 0.0021; 95% CI 0.901-0.904) and accuracy of 0.802 (SD 0.0082; 95% CI 0.795-0.808), outperforming the baseline LLM (QWK 0.801, SD 0.004; accuracy 0.542, SD 0.0073; both P<.001) and demonstrating expert-comparable agreement with triage nurses (interrater QWK 0.887). In the 3-group analysis, MECR-RAG reduced overtriage from 68/236 (28.8%) with the baseline LLM to 30/236 (12.7%) and maintained low undertriage (3/236, 1.3% vs 4/236, 1.7%), with the largest gains in the diagnostically ambiguous yet operationally important categories 3 and 4. In a secondary outcome-based analysis defining high-severity courses as R1+R2, MECR-RAG detected high-risk patients more sensitively than initial nurse triage (124/130, 95.4% vs 117/130, 90.0%; P=.02) while maintaining nurse-level specificity. MECR-RAG yielded the lowest weighted harm index (13.7, 19.5, and 20.3 per 100 patients for MECR-RAG, nurses, and the baseline LLM, respectively).
Conclusions: A dual-source RAG triage system that combines guideline-based rules with case-based reasoning achieved exp
Multi-Evidence Clinical Reasoning With Retrieval-Augmented Generation for Emergency Triage: Retrospective Evaluation Study. Hang Sheung Wong, Tsz Kwan Wong. JMIR Medical Informatics. 2026 Jan 26;14:e82026. doi:10.2196/82026
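The primary outcome above, quadratic weighted kappa, penalizes disagreements by the squared distance between ordinal triage categories, so a 1-to-5 error costs far more than a 3-to-4 error. A self-contained sketch (not the authors' code) for labels 1..n_levels:

```python
def quadratic_weighted_kappa(rater_a, rater_b, n_levels=5):
    """QWK for two sequences of ordinal labels in 1..n_levels."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed confusion matrix
    obs = [[0.0] * n_levels for _ in range(n_levels)]
    for a, b in zip(rater_a, rater_b):
        obs[a - 1][b - 1] += 1
    # Marginal histograms -> expected matrix under independence
    hist_a = [sum(row) for row in obs]
    hist_b = [sum(obs[i][j] for i in range(n_levels)) for j in range(n_levels)]
    num = den = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            w = ((i - j) ** 2) / ((n_levels - 1) ** 2)  # quadratic weight
            expected = hist_a[i] * hist_b[j] / n
            num += w * obs[i][j]
            den += w * expected
    return 1.0 - num / den

print(quadratic_weighted_kappa([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # → 1.0
```

Perfect agreement yields 1.0; a QWK of 0.902, as reported for MECR-RAG, indicates near-expert ordinal agreement with the consensus labels.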
Comprehensive Pediatric Health Risk Stratification Using an AI-Driven Framework in Children Aged 2 to 8 Years: Design and Validation Study
Zhihe Mao, Jundan Chen
Background: Early life health risks can shape long-term morbidity trajectories, yet prevailing pediatric risk assessment paradigms are often fragmented and insufficiently capable of integrating heterogeneous data streams into actionable, individualized profiles.
Objective: This study aimed to design, implement, and validate an artificial intelligence-driven framework that fuses multimodal pediatric data and leverages advanced natural language processing and ensemble learning to improve early, accurate stratification of key pediatric health risks.
Methods: A retrospective dataset of over 40,000 pediatric participants aged 2-8 years was used to train and evaluate the framework. Data were split into training, validation, and test sets (70%, 15%, and 15%, respectively) with a temporally mindful partitioning strategy to approximate prospective evaluation. Baseline comparators included traditional statistical and machine learning models, and the statistical significance of area under the receiver operating characteristic curve (AUC-ROC) differences was assessed using the DeLong test.
Results: The proposed Bidirectional Encoder Representations From Transformers-based model achieved an AUC-ROC of 0.85 (95% CI 0.82-0.88), sensitivity of 0.78, specificity of 0.80, and F1-score of 0.75 on the test set, outperforming multiple baseline models. In an additional manual comparison evaluation, automated and expert assessments aligned with 78% accuracy (78/100), and most discrepancies arose in "equivalent" cases.
Conclusions: This study provides a validated, artificial intelligence-driven, multimodal pediatric health risk stratification framework that translates heterogeneous child health data into clinically actionable risk profiles, demonstrating strong discriminative performance and meaningful agreement with expert assessment. The framework supports proactive, individualized pediatric care and offers a scalable foundation for further validation across broader populations and longitudinal follow-up.
Comprehensive Pediatric Health Risk Stratification Using an AI-Driven Framework in Children Aged 2 to 8 Years: Design and Validation Study. JMIR Medical Informatics. 2026 Jan 26;14:e80163. doi:10.2196/80163. PMCID: PMC12834199
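The "temporally mindful" 70/15/15 partition described in the Methods can be approximated by ordering records on encounter date so that the test set is the most recent slice, which mimics prospective evaluation. Field names below are hypothetical; the paper does not publish its data schema.

```python
def temporal_split(records, date_key="visit_date",
                   fractions=(0.70, 0.15, 0.15)):
    """Chronological train/validation/test split: oldest records train,
    newest records test, approximating a prospective evaluation."""
    ordered = sorted(records, key=lambda r: r[date_key])
    n = len(ordered)
    n_train = int(n * fractions[0])
    n_val = int(n * fractions[1])
    train = ordered[:n_train]
    val = ordered[n_train:n_train + n_val]
    test = ordered[n_train + n_val:]
    return train, val, test

# Ten hypothetical encounters, one per month
rows = [{"visit_date": f"2020-{m:02d}-01"} for m in range(1, 11)]
train, val, test = temporal_split(rows)
print(len(train), len(val), len(test))  # → 7 1 2
```

Unlike a random split, every training date precedes every test date, so the model is never evaluated on encounters older than those it was trained on.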
Ngoc-Anh Nguyen, Grace Lee, Brendan Holderread, Terrie Holman, Sarah Pletcher, Roberta Schwartz
Background: Frequent vital sign (VS) monitoring is central to inpatient safety but is traditionally performed manually every 4 hours, a century-old practice that can miss early clinical deterioration, disrupt patient sleep, and impose a heavy documentation burden on nursing staff. Continuous VS monitoring (CVSM) using wearable remote patient monitoring devices enables near real-time, high-frequency VS measurement while reducing manual workload and preserving patient rest.
Objective: This implementation report describes the large-scale implementation of CVSM across an 8-hospital health system. The initiative aimed to (1) enhance earlier detection of patient health deterioration through continuous, algorithm-driven monitoring; (2) improve nursing workflow efficiency by reducing reliance on manual VS checks; and (3) minimize nighttime disruptions to support patient rest and recovery.
Methods: The program was designed for system-wide scalability and executed from 2022 to 2024 using a 4-phase framework: strategic program design, program planning, go-live preparation, and implementation and optimization. A Food and Drug Administration-cleared wearable device (BioButton) continuously measured heart rate, respiratory rate, and skin temperature, with data integrated into Epic and monitored 24×7 through a centralized virtual operations center. Rollout followed a staggered playbook across approximately 2700 adult non-intensive care unit beds and was supported by leadership engagement, supply chain readiness, staff training, and phased superuser-led adoption.
Implementation (results): All 8 hospitals achieved full deployment between April 2023 and February 2024, with more than 95% device use rates and 100% nursing staff training completion. A standardized escalation workflow filtered approximately 50% of alerts at the virtual operations center review stage, substantially reducing frontline alert burden. Operational refinements included revised heart rate and respiratory rate alert thresholds and the removal of temperature as a single alert trigger. Several units extended overnight manual VS intervals from every 4 hours to every 6 to 8 hours, with staff estimating approximately 4 hours saved per nursing shift. Patient care assistants redirected time toward patient mobility and personal care needs, while staff reported growing confidence in device performance over time.
Conclusions: This initiative represents the first system-wide deployment of CVSM across a diverse, multihospital health system. Success was enabled by early strategic alignment, phased rollout, robust IT and monitoring infrastructure, and iterative optimization. The program demonstrates the feasibility of embedding CVSM into routine inpatient care to improve efficiency and patient experience. Transferable strategies, including phased rollouts, centralized monitoring, and structure
Scaling Wireless Continuous Vital Sign Monitoring Across an 8-Hospital Health System: Digital Health Implementation Report. JMIR Medical Informatics. 2026 Jan 26:e78216. doi:10.2196/78216
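The revised escalation rules, in which heart rate or respiratory rate breaches escalate but skin temperature alone does not, can be sketched as a simple predicate. The numeric thresholds below are hypothetical placeholders, not the health system's actual revised values.

```python
def should_escalate(hr, rr, temp_c,
                    hr_range=(40, 130), rr_range=(8, 30)):
    """Return True if a reading warrants escalation past the virtual
    operations center review. Heart rate (hr, beats/min) or respiratory
    rate (rr, breaths/min) outside its range escalates; skin temperature
    (temp_c) is contextual only and never escalates on its own, per the
    report's removal of temperature as a single alert trigger."""
    hr_alert = not (hr_range[0] <= hr <= hr_range[1])
    rr_alert = not (rr_range[0] <= rr <= rr_range[1])
    return hr_alert or rr_alert

print(should_escalate(hr=150, rr=18, temp_c=37.0))  # → True
print(should_escalate(hr=80, rr=16, temp_c=39.5))   # → False (temp alone)
```

Filtering like this at a centralized review stage is what allowed roughly half of raw device alerts to be absorbed before reaching frontline nurses.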
Two-Minute Deep Learning-Powered Brain Quantitative Mapping: Accelerating Clinical Imaging With Synthetic Magnetic Resonance Imaging
Yawen Liu, Hongxia Yin, Zuofeng Zheng, Wenjuan Liu, Tingting Zhang, Linkun Cai, Haijun Niu, Han Lv, Zhenghan Yang, Zhenchang Wang, Pengling Ren
Background: Quantitative magnetic resonance imaging (MRI) is an advanced technique that can map the physical properties (T1, T2, and proton density [PD]) of different tissues, offering crucial insights for disease diagnosis. Nonetheless, the practical application of this technology is constrained by several factors, most notably the protracted scanning duration.
Objective: This study aimed to explore whether deep learning (DL)-based superresolution reconstruction of ultrafast whole brain synthetic MRI can produce quantitative T1/T2/PD maps that closely approximate those from routine clinical scans, while substantially shortening scan time and preserving diagnostic image quality.
Methods: A total of 151 healthy adults and 7 individuals with different pathologies were prospectively enrolled. Each individual was examined twice on a 3.0T scanner using routine and fast synthetic MRI protocols. The routine scans (acquisition matrix: 320×256) were interpolated to 512×512 for clinical display and served as reference images. The fast scans (acquisition matrix: 192×128) were preprocessed to 256×256 and used as inputs to a superresolution generative adversarial network (SRGAN), which reconstructed them to the same 512×512 interpolated resolution as the reference. For each quantitative map, 120 (75.95%) healthy individuals' images were used for training, and 38 (24.05%) individuals' images (healthy individuals: n=31, 19.62%; patients: n=7, 4.43%) were used for testing. Agreement was assessed with a paired t test, two 1-sided tests, Bland-Altman analysis, and coefficients of variation.
Results: DL-reconstructed and reference T1/T2/PD values were strongly correlated (T1: R²=0.98; T2: R²=0.97; PD: R²=0.99). The slopes of the linear regression were near 1.0 for both T1 (0.9418) and PD (0.9946), whereas agreement for T2 was moderate, with a slope of 0.8057. The average biases of the T1, T2, and PD values were small (0.93%, -0.85%, and 0.31%, respectively). The intra- and intergroup coefficients of variation remained below 5% for most brain regions, especially for PD values, and quantitative accuracy for lesions was preserved after DL reconstruction. Quantitative and qualitative analyses of image quality also indicated that SRGAN markedly suppressed noise and artifacts in the fast acquisitions, restoring structural fidelity (structural similarity index measure) and signal fidelity (peak signal-to-noise ratio) close to the level of routine scans while substantially improving perceptual naturalness over the fast scans (as measured by the naturalness image quality evaluator), although not yet matching that of routine imaging.
Conclusions: SRGAN superresolution applied to ultrafast synthetic MRI yields whole brain T1, T2, and PD maps that show strong correlation with routine synthetic MRI w
Two-Minute Deep Learning-Powered Brain Quantitative Mapping: Accelerating Clinical Imaging With Synthetic Magnetic Resonance Imaging. Yawen Liu, Hongxia Yin, Zuofeng Zheng, Wenjuan Liu, Tingting Zhang, Linkun Cai, Haijun Niu, Han Lv, Zhenghan Yang, Zhenchang Wang, Pengling Ren. JMIR Medical Informatics. 2026;14:e79389. doi:10.2196/79389
Jia-Qian Yao, Wen-Wen Zhou, Zhi-Fei Chai, Fei Ren, Tong-Yi Huang, Tian-Tian Zhen, Hui-Juan Shi, Xiao-Yan Xie, Ze Zhao, Ming Xu
Background: Given the highly heterogeneous biology of breast cancer, a more effective noninvasive diagnostic tool that unravels microscopic histopathology patterns is urgently needed.
Objective: This study aims to identify cancerous regions in ultrasound images of breast cancer via a convolutional neural network based on registered grayscale ultrasound images and readily accessible biopsy whole slide images (WSIs).
Methods: This single-center study prospectively and consecutively included participants who underwent ultrasound-guided core needle biopsy for Breast Imaging Reporting and Data System category 4 or 5 breast lesions, with pathologically confirmed breast cancer, from July 2022 to February 2023. Basic information, ultrasound image data, biopsy tissue specimens, and corresponding WSIs were collected. After the core needle biopsy procedures, the stained breast tissue specimens were sliced and coregistered with an ultrasound image of the needle tract. Convolutional neural network models for identifying breast cancer cells in ultrasound images were developed using the FCN-101 and DeepLabV3 networks. Image-level predictive performance was evaluated and compared quantitatively by pixel accuracy, Dice similarity coefficient, and recall. Pixel-level classification was illustrated through confusion matrices. Cancerous regions in the testing dataset were further visualized in ultrasound images. Potential clinical applications were qualitatively assessed by comparing the automatic segmentation results with the actual pathological tissue distributions.
Results: A total of 105 participants with 386 ultrasound images of breast cancer were included, with 270 (70%), 78 (20.2%), and 38 (9.8%) images in the training, validation, and test datasets, respectively. Both models performed well in predicting the cancerous regions in the biopsy area, whereas the FCN-101 model was superior to the DeepLabV3 model in terms of pixel accuracy (86.91% vs 69.55%; P=.002) and Dice similarity coefficient (77.47% vs 69.90%; P<.001). The two models yielded recall values of 54.64% and 58.46%, with no significant difference between them (P=.80). Furthermore, the FCN-101 model had an advantage in predicting cancerous regions, while the DeepLabV3 model achieved more accurate predictive pixels in normal tissue (both P<.05). Visualization of cancerous regions on grayscale ultrasound images demonstrated high consistency with those identified on WSIs.
Conclusions: The technique for spatial registration of breast WSIs and ultrasound images of a needle tract was established. Breast cancer regions were accurately identified and localized on a pixel level in high-frequency ultrasound images via an advanced convolutional neural network with histopathologic WSI as the reference standard.
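The pixel-level metrics reported above (pixel accuracy, Dice similarity coefficient, and recall) can be illustrated on toy binary masks; the masks and sizes below are invented for illustration, not taken from the study:

```python
import numpy as np

# Toy binary masks standing in for a predicted cancer region and the
# WSI-derived reference segmentation.
pred = np.zeros((64, 64), dtype=bool)
ref = np.zeros((64, 64), dtype=bool)
pred[10:40, 10:40] = True   # 30x30 predicted region
ref[15:45, 15:45] = True    # 30x30 reference region, offset by 5 px

# Confusion-matrix counts at the pixel level.
tp = np.logical_and(pred, ref).sum()
fp = np.logical_and(pred, ~ref).sum()
fn = np.logical_and(~pred, ref).sum()
tn = np.logical_and(~pred, ~ref).sum()

pixel_accuracy = (tp + tn) / pred.size      # fraction of pixels correct
dice = 2 * tp / (2 * tp + fp + fn)          # Dice similarity coefficient
recall = tp / (tp + fn)                     # sensitivity for cancer pixels
```

Note that pixel accuracy is inflated by the large true-negative background, which is why Dice and recall are the more informative segmentation metrics here.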
Identification and Localization of Breast Tumor Components via a Convolutional Neural Network Based on High-Frequency Ultrasound Combined With Histopathologic Registration: Prospective Study. JMIR Medical Informatics. 2026;14:e81181. doi:10.2196/81181
Chihung Lin, Alfred P Yoon, Chien-Wei Wang, Tung Chao, Kevin C Chung, Chang-Fu Kuo
Background: Deep learning models have shown strong potential for automated fracture detection in medical images. However, their robustness under varying image quality remains uncertain, particularly for small and subtle fractures, such as scaphoid fractures. Understanding how different types of image perturbations affect model performance is crucial for ensuring reliable deployment in clinical practice.
Objective: This study aimed to evaluate the robustness of a deep learning model trained to detect scaphoid fractures in radiographs when exposed to various image perturbations. We sought to identify which perturbations most strongly impact performance and to explore strategies to mitigate performance degradation.
Methods: Radiographic datasets were systematically modified by applying Gaussian noise, blurring, JPEG compression, contrast-limited adaptive histogram equalization, resizing, and geometric offsets. Model accuracy was evaluated across different perturbation types and levels. Image quality was quantified using peak signal-to-noise ratio and structural similarity index measure to assess correlations between degradation and model performance.
Results: Model accuracy declined with increasing perturbation severity, but the extent varied across perturbation types. Gaussian blur caused the most substantial performance drop, whereas contrast-limited adaptive histogram equalization increased the false-negative rate. The model demonstrated higher resilience to color perturbations than to grayscale degradations. A strong linear correlation was found between image quality (measured by peak signal-to-noise ratio and structural similarity index measure) and accuracy, indicating that better image quality led to improved detection. Geometric offsets and pixel value rescaling had minimal influence, whereas resolution was the dominant factor affecting performance.
Conclusions: The findings indicate that image quality, especially resolution and blurring, substantially influences the robustness of deep learning-based fracture detection models. Ensuring adequate image resolution and quality control can enhance diagnostic reliability. These results provide valuable insights for designing more accurate and resilient medical imaging models under real-world variability.
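A minimal sketch of the perturbation-severity sweep described above, assuming a synthetic image and a pure-NumPy peak signal-to-noise ratio rather than the study's radiographs and pipeline:

```python
import numpy as np

rng = np.random.default_rng(42)

def psnr(ref, test, data_range=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    return 10.0 * np.log10(data_range**2 / mse)

# Synthetic "radiograph": a smooth gradient standing in for real pixel data.
img = np.tile(np.linspace(0, 255, 256), (256, 1))

# Apply Gaussian noise of increasing severity; PSNR falls monotonically,
# mirroring the degradation-versus-accuracy relationship reported above.
psnrs = [psnr(img, img + rng.normal(0, sigma, img.shape))
         for sigma in (5, 15, 45)]
```

For additive Gaussian noise the expected PSNR is roughly 20·log10(255/σ), so each tripling of σ costs about 9.5 dB, which is why severity sweeps of this kind produce near-linear quality-accuracy plots.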
Effects of Image Degradation on Deep Neural Network Classification of Scaphoid Fracture Radiographs: Comparison Study of Different Noise Types. JMIR Medical Informatics. 2026;14:e65596. doi:10.2196/65596
Mohammad Yaseliani, Je-Won Hong, Jiang Bian, Larisa Cavallari, Julio D Duarte, Danielle Nelson, Wei-Hsuan Lo-Ciganic, Khoa Anh Nguyen, Md Mahmudul Hasan
Background: Opioids are a widely prescribed class of medication for pain management. However, their efficacy and adverse effects vary among patients due to the complex interplay between biological and clinical factors. Pharmacogenetic testing can be used to match opioid therapy to a patient's genetic profile, improving pain relief and reducing the risk of adverse effects. Despite its potential, uptake of pharmacogenetic testing remains low due to a range of barriers at the patient, health care provider, infrastructure, and financial levels. Since testing typically involves a shared decision between the provider and patient, predicting the likelihood that a patient will undergo pharmacogenetic testing, and understanding the factors influencing that decision, can help optimize resource use and improve outcomes in pain management.
Objective: This study aimed to develop machine learning (ML) models that identify patients' likelihood of pharmacogenetic testing uptake based on their demographics, clinical variables, medication use, and social determinants of health.
Methods: We used electronic health record data from a single-center health care system to identify patients prescribed opioids. We extracted patients' demographics, clinical variables, medication use, and social determinants of health, and developed and validated ML models, including a neural network, logistic regression, random forest, extreme gradient boosting (XGB), naïve Bayes, and support vector machines, to predict pharmacogenetic testing uptake based on procedure codes. We performed 5-fold cross-validation and created an ensemble probability-based classifier from the best-performing ML models. Various performance metrics, uptake stratification analysis, and feature importance analysis were used to evaluate the models.
Results: The ensemble model combining the XGB and support vector machine-radial basis function classifiers had the highest C-statistic at 79.61%, followed by XGB (78.94%) and the neural network (78.05%). While XGB was the best-performing single model, the ensemble model achieved high accuracy (32,699/48,528, 67.38%), recall (537/702, 76.50%), specificity (32,162/47,826, 67.25%), and negative predictive value (32,162/32,327, 99.49%). The uptake stratification analysis indicated that the ensemble model effectively distinguished across uptake probability deciles: patients in the higher strata were more likely to undergo pharmacogenetic testing in the real world (320/4853, 6.59% in the highest decile vs 6/4853, 0.12% in the lowest). Furthermore, Shapley Additive Explanations value analysis using the XGB model identified age, hypertension, and household income as the most influential factors for predicting pharmacogenetic testing uptake.
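The probability-averaging ensemble and C-statistic evaluation can be sketched in a few lines; the labels and base-model probabilities below are synthetic stand-ins for the study's XGB and SVM-RBF outputs, not its actual data:

```python
import numpy as np

def c_statistic(y_true, scores):
    """AUC / C-statistic via the rank-sum (Mann-Whitney) formulation."""
    order = np.argsort(scores)
    ranks = np.empty_like(order, dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = y_true == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=1000)   # 1 = patient underwent testing

# Hypothetical per-patient uptake probabilities from two base models
# (stand-ins for the XGB and SVM-RBF classifiers).
p_xgb = np.clip(0.5 + 0.25 * (y - 0.5) + rng.normal(0, 0.15, 1000), 0, 1)
p_svm = np.clip(0.5 + 0.20 * (y - 0.5) + rng.normal(0, 0.15, 1000), 0, 1)

# Probability-averaging ensemble: mean of the base-model probabilities.
p_ens = (p_xgb + p_svm) / 2
auc = c_statistic(y, p_ens)
```

Averaging probabilities reduces the independent noise of each base model, which is why an ensemble of this form can edge out its best single member, as reported above.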
Machine Learning Prediction of Pharmacogenetic Testing Uptake Among Opioid-Prescribed Patients Using Electronic Health Records: Retrospective Cohort Study. JMIR Medical Informatics. 2026;14:e81048. doi:10.2196/81048
Rico Paridaens, Steve Van den Bulck, Michel De Jonghe, Benjamin Fauquert, Liesbeth Meel, Willem Raat, Bert Vaes
Background: When used correctly, electronic medical records (EMRs) can support clinical decision-making, provide information for research, facilitate coordination of care, reduce medical errors, and generate patient health summaries. Studies have reported large differences in the quality of EMR data.
Objective: Our study aimed to develop an evidence-based set of electronically extractable quality indicators (QIs), approved by expert consensus, to assess the appropriate use of EMRs by general practitioners (GPs) from a medical perspective.
Methods: The RAND-modified Delphi method was used in this study. The TRIP and MEDLINE databases were searched, and a selection of recommendations was filtered using the specific, measurable, assignable, realistic, and time-bound principles. The panel comprised 12 GPs and 6 EMR developers. The selected recommendations were transformed into QIs as percentages.
Results: A combined list of 20 indicators and 30 recommendations was created from 9 guidelines and 4 review articles. After the consensus round, all 20 (100%) indicators and 20 (67%) of the recommendations were approved by the panel. These 20 recommendations were transformed into QIs, yielding 40 QIs in total. The largest share (16/40, 40%) of the QIs evaluated the completeness and adequacy of the problem list.
Conclusions: This study provided a set of 40 EMR-extractable QIs for the correct use of EMRs in primary care. These QIs can be used to map the completeness of EMRs by setting up an audit and feedback system, and to develop specific (computer-based) training for GPs.
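An electronically extractable QI expressed as a percentage reduces to a numerator/denominator over EMR records; the record structure and the indicator below are hypothetical illustrations, not drawn from the approved set:

```python
# A minimal sketch of computing one EMR quality indicator as a percentage,
# e.g. "share of patients with at least one coded problem-list entry".
records = [
    {"patient_id": 1, "problem_list_coded": True},
    {"patient_id": 2, "problem_list_coded": False},
    {"patient_id": 3, "problem_list_coded": True},
    {"patient_id": 4, "problem_list_coded": True},
]

numerator = sum(r["problem_list_coded"] for r in records)
qi_percent = 100.0 * numerator / len(records)
print(f"problem-list QI: {qi_percent:.1f}%")
```

An audit-and-feedback system of the kind proposed above would recompute such percentages periodically per practice and return them to GPs as a completeness dashboard.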
Development of Quality Indicators for the Correct Use of Electronic Medical Records in Primary Care: Modified Delphi Study. JMIR Medical Informatics. 2026;14:e80057. doi:10.2196/80057