JMIR Medical Informatics: Latest Articles

Comprehensive Pediatric Health Risk Stratification Using an AI-Driven Framework in Children Aged 2 to 8 Years: Design and Validation Study.
IF 3.8; Zone 3 (Medicine); Q2 MEDICAL INFORMATICS; Pub Date: 2026-01-26; DOI: 10.2196/80163
Zhihe Mao, Jundan Chen

Background: Early life health risks can shape long-term morbidity trajectories, yet prevailing pediatric risk assessment paradigms are often fragmented and insufficiently capable of integrating heterogeneous data streams into actionable, individualized profiles.

Objective: This study aimed to design, implement, and validate an artificial intelligence-driven framework that fuses multimodal pediatric data and leverages advanced natural language processing and ensemble learning to improve early, accurate stratification of key pediatric health risks.

Methods: A retrospective dataset of over 40,000 pediatric participants aged 2-8 years was used to train and evaluate the framework. Data were split into training, validation, and test sets (70%, 15%, and 15%, respectively) with a temporally mindful partitioning strategy to approximate prospective evaluation. Baseline comparators included traditional statistical and machine learning models, and the statistical significance of area under the receiver operating characteristic curve (AUC-ROC) differences was assessed using the DeLong test.
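
The 70%/15%/15% split with temporal ordering described in the Methods can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the partitioning rule, so the sketch simply sorts records chronologically and cuts at fixed fractions. The function name `temporal_split` and the `visit_date` field are hypothetical, and the data are synthetic.

```python
from datetime import date, timedelta


def temporal_split(records, train_frac=0.70, val_frac=0.15):
    """Split records chronologically so later records land in later sets.

    Assumption: each record is a dict with a 'visit_date' key (hypothetical
    field name); the study's actual partitioning rule is not described.
    """
    ordered = sorted(records, key=lambda r: r["visit_date"])
    n = len(ordered)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    # Whatever remains after the training and validation cuts is the test set.
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


# Toy data: 100 synthetic records, one per day (not study data).
start = date(2020, 1, 1)
records = [{"id": i, "visit_date": start + timedelta(days=i)} for i in range(100)]
train, val, test = temporal_split(records)
print(len(train), len(val), len(test))
```

Cutting by time rather than at random keeps every test record later than every training record, which is one way to approximate prospective evaluation.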

Results: The proposed Bidirectional Encoder Representations From Transformers-based model achieved an AUC-ROC of 0.85 (95% CI 0.82-0.88), sensitivity of 0.78, specificity of 0.80, and F1-score of 0.75 on the test set, outperforming multiple baseline models. In an additional manual comparison evaluation, automated and expert assessments aligned with 78% accuracy (78/100), and most discrepancies arose in "equivalent" cases.
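
For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, and F1-score follow from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not taken from the study, and `classification_metrics` is an illustrative helper name.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, and F1-score from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true-positive rate (recall)
    specificity = tn / (tn + fp)  # true-negative rate
    precision = tp / (tp + fp)    # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f1


# Hypothetical counts for illustration only (not taken from the study).
sens, spec, f1 = classification_metrics(tp=40, fp=5, tn=45, fn=10)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, F1={f1:.2f}")
```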

Conclusions: This study provides a validated, artificial intelligence-driven, multimodal pediatric health risk stratification framework that translates heterogeneous child health data into clinically actionable risk profiles, demonstrating strong discriminative performance and meaningful agreement with expert assessment. The framework supports proactive, individualized pediatric care and offers a scalable foundation for further validation across broader populations and longitudinal follow-up.

Citations: 0
Scaling Wireless Continuous Vital Sign Monitoring Across an 8-Hospital Health System: Digital Health Implementation Report.
IF 3.8; Zone 3 (Medicine); Q2 MEDICAL INFORMATICS; Pub Date: 2026-01-26; DOI: 10.2196/78216
Ngoc-Anh Nguyen, Grace Lee, Brendan Holderread, Terrie Holman, Sarah Pletcher, Roberta Schwartz

Background: Frequent vital sign (VS) monitoring is central to inpatient safety but is traditionally performed manually every 4 hours, a century-old practice that can miss early clinical deterioration, disrupt patient sleep, and impose a heavy documentation burden on nursing staff. Continuous VS monitoring (CVSM) using wearable remote patient monitoring devices enables near real-time, high-frequency VS measurement while reducing manual workload and preserving patient rest.

Objective: This implementation report describes the large-scale implementation of CVSM across an 8-hospital health system. The initiative aimed to (1) enhance earlier detection of patient health deterioration through continuous, algorithm-driven monitoring; (2) improve nursing workflow efficiency by reducing reliance on manual VS checks; and (3) minimize nighttime disruptions to support patient rest and recovery.

Methods: The program was designed for system-wide scalability and executed from 2022 to 2024 using a 4-phase framework: strategic program design, program planning, go-live preparation, and implementation and optimization. A Food and Drug Administration-cleared wearable device (BioButton) continuously measured heart rate, respiratory rate, and skin temperature, with data integrated into Epic and monitored 24×7 through a centralized virtual operations center. Rollout followed a staggered playbook across approximately 2700 adult non-intensive care unit beds and was supported by leadership engagement, supply chain readiness, staff training, and phased superuser-led adoption.

Implementation (results): All 8 hospitals achieved full deployment between April 2023 and February 2024, with more than 95% device use rates and 100% nursing staff training completion. A standardized escalation workflow filtered approximately 50% of the alerts at the virtual operations center review stage, substantially reducing frontline alert burden. Operational refinements included revised heart rate and respiratory rate alert thresholds and the removal of temperature as a single alert trigger. Several units extended overnight manual VS intervals from every 4 hours to every 6 to 8 hours, with staff estimating approximately 4 hours saved per nursing shift. Patient care assistants redirected time toward patient mobility and personal care needs, while staff reported growing confidence in device performance over time.

Conclusions: This initiative represents the first system-wide deployment of CVSM across a diverse, multihospital health system. Success was enabled by early strategic alignment, phased rollout, robust IT and monitoring infrastructure, and iterative optimization. The program demonstrates the feasibility of embedding CVSM into routine inpatient care to improve efficiency and patient experience. Transferable strategies, including phased rollouts, centralized monitoring, and structured change management, can inform other health systems pursuing digital vital sign redesign. Future work should rigorously evaluate effects on patient outcomes, cost-effectiveness, and applicability to postacute and outpatient care.

Citations: 0
Two-Minute Deep Learning-Powered Brain Quantitative Mapping: Accelerating Clinical Imaging With Synthetic Magnetic Resonance Imaging.
IF 3.8; Zone 3 (Medicine); Q2 MEDICAL INFORMATICS; Pub Date: 2026-01-23; DOI: 10.2196/79389
Yawen Liu, Hongxia Yin, Zuofeng Zheng, Wenjuan Liu, Tingting Zhang, Linkun Cai, Haijun Niu, Han Lv, Zhenghan Yang, Zhenchang Wang, Pengling Ren

Background: Quantitative magnetic resonance imaging (MRI) is an advanced technique that can map the physical properties (T1, T2, and proton density [PD]) of different tissues, offering crucial insights for disease diagnosis. Nonetheless, its practical application is constrained by several factors, most notably the protracted scanning duration.

Objective: This study aimed to explore whether deep learning (DL)-based superresolution reconstruction of ultrafast whole brain synthetic MRI can obtain quantitative T1/T2/PD maps that closely approximate those from routine clinical scans, while substantially shortening scan time and preserving diagnostic image quality.

Methods: A total of 151 healthy adults and 7 individuals with different pathologies were prospectively enrolled. Each individual was examined twice on a 3.0T scanner using routine and fast synthetic MRI protocols. The routine scans (acquisition matrix: 320×256) were interpolated to 512 by 512 for clinical display and served as reference images. The fast scans (acquisition matrix: 192×128) were preprocessed to 256 by 256 and used as inputs to a superresolution generative adversarial network (SRGAN), which reconstructed them to the same 512 by 512 interpolated resolution as the reference. For each quantitative map, 120 (75.95%) healthy individuals' images were used for training, and 38 (24.05%) individuals' images (healthy individuals: n=31, 19.62%; patients: n=7, 4.43%) were used for testing. Agreement was assessed with a paired t test, two 1-sided tests, Bland-Altman analysis, and coefficients of variation.

Results: DL reconstructed and reference T1/T2/PD values were strongly correlated (T1: R²=0.98; T2: R²=0.97; and PD: R²=0.99). The slopes of the linear regression were near 1.0 for both T1 (0.9418) and PD (0.9946), whereas agreement for T2 was moderate, with a regression slope of 0.8057. Additionally, the average biases of T1, T2, and PD values were small (0.93%, -0.85%, and 0.31%, respectively). The intra- and intergroup coefficients of variation for most brain regions stayed below 5%, especially for PD values, and quantitative accuracy for lesions was retained after DL reconstruction. Quantitative and qualitative analyses of image quality also indicate that SRGAN markedly suppressed noise and artifacts in fast acquisitions, restoring structural fidelity (structural similarity image measure) and signal fidelity (peak signal-to-noise ratio) close to the level of routine scans while substantially improving perceptual naturalness over fast scans (as measured by the naturalness image quality evaluator), although not yet matching that of routine imaging.

Conclusions: SRGAN superresolution applied to ultrafast synthetic MRI yields whole brain T1, T2, and PD maps that show strong correlation with routine synthetic MRI while halving acquisition time and preserving diagnostic image quality. Although T1 and PD values showed near-ideal agreement and T2 values showed a moderate systematic underestimation, the approach represents a promising step toward accelerating the clinical application of quantitative brain imaging.

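
The agreement statistics named in the Methods (Bland-Altman analysis and coefficients of variation) can be sketched as below. This is a minimal illustration, assuming the conventional Bland-Altman limits of agreement (bias ± 1.96 SD of the paired differences); the abstract does not state the exact settings used. The helper names and the T1 values are hypothetical, not study data.

```python
import statistics


def bland_altman(reference, reconstructed):
    """Return (bias, lower_limit, upper_limit) for paired measurements."""
    diffs = [b - a for a, b in zip(reference, reconstructed)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Toy T1 values in ms (illustrative only, not study data).
reference = [800.0, 1200.0, 1500.0, 950.0, 1100.0]
reconstructed = [805.0, 1190.0, 1510.0, 955.0, 1095.0]
bias, lower, upper = bland_altman(reference, reconstructed)
print(f"bias={bias:.2f} ms, limits of agreement=({lower:.2f}, {upper:.2f})")
print(f"CV of reference values: {coefficient_of_variation(reference):.1f}%")
```

A bias near zero with narrow limits of agreement indicates that the reconstructed values can stand in for the reference values; a regression slope below 1, as reported for T2, would instead show up as differences that grow with the magnitude of the measurement.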
Citations: 0
Identification and Localization of Breast Tumor Components via a Convolutional Neural Network Based on High-Frequency Ultrasound Combined With Histopathologic Registration: Prospective Study.
IF 3.8; Zone 3 (Medicine); Q2 MEDICAL INFORMATICS; Pub Date: 2026-01-23; DOI: 10.2196/81181
Jia-Qian Yao, Wen-Wen Zhou, Zhi-Fei Chai, Fei Ren, Tong-Yi Huang, Tian-Tian Zhen, Hui-Juan Shi, Xiao-Yan Xie, Ze Zhao, Ming Xu

Background: Given the highly heterogeneous biology of breast cancer, a more effective noninvasive diagnostic tool that unravels microscopic histopathology patterns is urgently needed.

Objective: This study aims to identify cancerous regions in ultrasound images of breast cancer via convolutional neural network based on registered grayscale ultrasound images and readily accessible biopsy whole slide images (WSIs).

Methods: This single-center study prospectively included participants undergoing ultrasound-guided core needle biopsy procedures for Breast Imaging Reporting and Data System category 4 or 5 breast lesions for whom breast cancer was pathologically confirmed from July 2022 to February 2023 consecutively. The basic information, ultrasound image data, biopsy tissue specimens, and corresponding WSIs were collected. After core needle biopsy procedures, the stained breast tissue specimens were sliced and coregistered with an ultrasound image of a needle tract. Convolutional neural network models for identifying breast cancer cells in ultrasound images were developed using FCN-101 and DeepLabV3 networks. The image-level predictive performance was evaluated and compared quantitatively by pixel accuracy, Dice similarity coefficient, and recall. Pixel-level classification was illustrated through confusion matrices. The cancerous region in the testing dataset was further visualized in ultrasound images. Potential clinical applications were qualitatively assessed by comparing the automatic segmentation results and the actual pathological tissue distributions.
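
The three evaluation metrics named above (pixel accuracy, Dice similarity coefficient, and recall) can be computed from a predicted and a reference binary mask as in the sketch below. It uses the standard definitions, assumes a two-class mask where 1 marks a cancerous pixel, and runs on a tiny made-up mask; the function name `segmentation_metrics` is illustrative and the study's own evaluation code is not described in the abstract.

```python
def segmentation_metrics(pred, truth):
    """Compute pixel accuracy, Dice coefficient, and recall for binary masks.

    pred and truth are equally sized 2D lists of 0/1 (1 = cancerous pixel).
    """
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    total = tp + fp + fn + tn
    pixel_accuracy = (tp + tn) / total
    # Dice is defined as 1.0 here when both masks are empty.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return pixel_accuracy, dice, recall


# Tiny made-up masks for illustration (not study data).
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
pred = [[0, 1, 1, 1],
        [0, 0, 1, 0],
        [0, 0, 0, 0]]
accuracy, dice, recall = segmentation_metrics(pred, truth)
print(f"pixel accuracy={accuracy:.3f}, Dice={dice:.3f}, recall={recall:.3f}")
```

Pixel accuracy counts background pixels as successes, so it can stay high even when recall is low; that is why the three metrics are reported together.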

Results: A total of 105 participants with 386 ultrasound images of breast cancer were included, with 270 (70%), 78 (20.2%), and 38 (9.8%) images in the training, validation, and test datasets, respectively. Both models performed well in predicting the cancerous regions in the biopsy area, whereas the FCN-101 model was superior to the DeepLabV3 model in terms of pixel accuracy (86.91% vs 69.55%; P=.002) and Dice similarity coefficient (77.47% vs 69.90%; P<.001). The two models yielded recall values of 54.64% and 58.46%, with no significant difference between them (P=.80). Furthermore, the FCN-101 model had an advantage in predicting cancerous regions, while the DeepLabV3 model achieved more accurate predictive pixels in normal tissue (both P<.05). Visualization of cancerous regions on grayscale ultrasound images demonstrated high consistency with those identified on WSIs.

Conclusions: The technique for spatial registration of breast WSIs and ultrasound images of a needle tract was established. Breast cancer regions were accurately identified and localized on a pixel level in high-frequency ultrasound images via an advanced convolutional neural network with histopathologic WSI as the reference standard.

Citations: 0
Effects of Image Degradation on Deep Neural Network Classification of Scaphoid Fracture Radiographs: Comparison Study of Different Noise Types.
IF 3.8; Zone 3 (Medicine); Q2 MEDICAL INFORMATICS; Pub Date: 2026-01-22; DOI: 10.2196/65596
Chihung Lin, Alfred P Yoon, Chien-Wei Wang, Tung Chao, Kevin C Chung, Chang-Fu Kuo

Background: Deep learning models have shown strong potential for automated fracture detection in medical images. However, their robustness under varying image quality remains uncertain, particularly for small and subtle fractures, such as scaphoid fractures. Understanding how different types of image perturbations affect model performance is crucial for ensuring reliable deployment in clinical practice.

Objective: This study aimed to evaluate the robustness of a deep learning model trained to detect scaphoid fractures in radiographs when exposed to various image perturbations. We sought to identify which perturbations most strongly impact performance and to explore strategies to mitigate performance degradation.

Methods: Radiographic datasets were systematically modified by applying Gaussian noise, blurring, JPEG compression, contrast-limited adaptive histogram equalization, resizing, and geometric offsets. Model accuracy was evaluated across different perturbation types and levels. Image quality was quantified using peak signal-to-noise ratio and structural similarity index measure to assess correlations between degradation and model performance.
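
As a rough illustration of one perturbation and one quality measure mentioned above, the sketch below adds zero-mean Gaussian noise to a synthetic 8-bit grayscale image and reports the peak signal-to-noise ratio, PSNR = 10·log10(MAX² / MSE). The noise levels, image, and helper names are assumptions made for the example; the abstract does not give the study's perturbation parameters.

```python
import math
import random


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to an 8-bit grayscale image, clipped to [0, 255]."""
    return [
        [min(255, max(0, round(px + rng.gauss(0.0, sigma)))) for px in row]
        for row in image
    ]


def psnr(original, degraded, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    n = sum(len(row) for row in original)
    mse = sum(
        (a - b) ** 2
        for row_a, row_b in zip(original, degraded)
        for a, b in zip(row_a, row_b)
    ) / n
    return math.inf if mse == 0 else 10.0 * math.log10(max_value ** 2 / mse)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Synthetic 32x32 gradient image (a stand-in, not a radiograph).
image = [[(x * 8 + y * 4) % 256 for x in range(32)] for y in range(32)]
for sigma in (5, 20, 50):  # hypothetical noise levels
    noisy = add_gaussian_noise(image, sigma, rng)
    print(f"sigma={sigma:>2}: PSNR={psnr(image, noisy):.1f} dB")
```

Evaluating a classifier on each perturbed copy and plotting accuracy against PSNR would give the kind of quality-versus-performance relationship the study examines.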

Results: Model accuracy declined with increasing perturbation severity, but the extent varied across perturbation types. Gaussian blur caused the most substantial performance drop, whereas contrast-limited adaptive histogram equalization increased the false-negative rate. The model demonstrated higher resilience to color perturbations than to grayscale degradations. A strong linear correlation was found between image quality, as measured by peak signal-to-noise ratio and structural similarity index measure, and accuracy, suggesting that better image quality led to improved detection. Geometric offsets and pixel value rescaling had minimal influence, whereas resolution was the dominant factor affecting performance.

Conclusions: The findings indicate that image quality, especially resolution and blurring, substantially influences the robustness of deep learning-based fracture detection models. Ensuring adequate image resolution and quality control can enhance diagnostic reliability. These results provide valuable insights for designing more accurate and resilient medical imaging models under real-world variability.
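
The Methods above describe perturbing images with Gaussian noise and quantifying quality with peak signal-to-noise ratio. The following is a minimal sketch of that idea, not the authors' pipeline: the synthetic gradient image, the noise levels, and the function names `add_gaussian_noise` and `psnr` are all illustrative assumptions, and only the Python standard library is used.

```python
import math
import random


def add_gaussian_noise(image, sigma, seed=0):
    """Return a copy of `image` (rows of 0-255 ints) with zero-mean
    Gaussian noise of standard deviation `sigma`, clipped to [0, 255]."""
    rng = random.Random(seed)
    return [
        [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
        for row in image
    ]


def psnr(reference, degraded, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    n = sum(len(row) for row in reference)
    mse = sum(
        (a - b) ** 2
        for ref_row, deg_row in zip(reference, degraded)
        for a, b in zip(ref_row, deg_row)
    ) / n
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


# Synthetic 32x32 gradient standing in for a radiograph (not real data).
image = [[(x * 8 + y * 4) % 256 for x in range(32)] for y in range(32)]

# Hypothetical noise levels; PSNR should fall as sigma grows.
psnr_by_sigma = {
    sigma: psnr(image, add_gaussian_noise(image, sigma))
    for sigma in (5, 20, 50)
}
for sigma, value in psnr_by_sigma.items():
    print(f"sigma={sigma:>2}  PSNR={value:.1f} dB")
```

Pairing each PSNR value with the classifier's accuracy on the correspondingly degraded images would give the kind of quality-versus-accuracy relationship the Results report; the structural similarity index measure would need a separate implementation.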

JMIR Medical Informatics, volume 14, e65596. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12826633/pdf/
Citations: 0
Machine Learning Prediction of Pharmacogenetic Testing Uptake Among Opioid-Prescribed Patients Using Electronic Health Records: Retrospective Cohort Study.
IF 3.8 Zone 3 Medicine Q2 MEDICAL INFORMATICS Pub Date: 2026-01-21 DOI: 10.2196/81048
Mohammad Yaseliani, Je-Won Hong, Jiang Bian, Larisa Cavallari, Julio D Duarte, Danielle Nelson, Wei-Hsuan Lo-Ciganic, Khoa Anh Nguyen, Md Mahmudul Hasan

Background: Opioids are a widely prescribed class of medication for pain management. However, their efficacy and adverse effects vary among patients because of the complex interplay between biological and clinical factors. Pharmacogenetic testing can be used to match patients' genetic profiles to individualize opioid therapy, improving pain relief and reducing the risk of adverse effects. Despite its potential, pharmacogenetic testing uptake (use of pharmacogenetic testing) remains low due to a range of barriers at the patient, health care provider, infrastructure, and financial levels. Since testing typically involves a shared decision between the provider and patient, predicting the likelihood of a patient undergoing pharmacogenetic testing and understanding the factors influencing that decision can help optimize resource use and improve outcomes in pain management.

Objective: This study aimed to develop machine learning (ML) models to identify patients' likelihood of pharmacogenetic testing uptake based on their demographics, clinical variables, medication use, and social determinants of health.

Methods: We used electronic health record data from a single-center health care system to identify patients prescribed opioids. We extracted patients' demographics, clinical variables, medication use, and social determinants of health, and developed and validated ML models, including a neural network, logistic regression, random forest, extreme gradient boosting (XGB), naïve Bayes, and support vector machines, for pharmacogenetic testing uptake prediction based on procedure codes. We performed 5-fold cross-validation and created an ensemble probability-based classifier using the best-performing ML models for pharmacogenetic testing uptake prediction. Various performance metrics, uptake stratification analysis, and feature importance analysis were used to evaluate the performance of the models.

Results: The ensemble model using XGB and support vector machine-radial basis function classifiers had the highest C-statistic (79.61%), followed by XGB (78.94%) and the neural network (78.05%). While XGB was the best-performing individual model, the ensemble model achieved a high accuracy (32,699/48,528, 67.38%), recall (537/702, 76.50%), specificity (32,162/47,826, 67.25%), and negative predictive value (32,162/32,327, 99.49%). The uptake stratification analysis using the ensemble model indicated that it can effectively distinguish across uptake probability deciles, where those in the higher strata are more likely to undergo pharmacogenetic testing in the real world (320/4853, 6.59% in the highest decile compared to 6/4853, 0.12% in the lowest). Furthermore, Shapley Additive Explanations value analysis using the XGB model indicated age, hypertension, and household income as the most influential factors for pharmacogenetic testing uptake prediction.

Conclusions: The proposed ensemble model demonstrated high performance in predicting pharmacogenetic testing uptake among patients treated with opioids for pain. The model can serve as a decision support tool to help clinicians determine a patient's likelihood of undergoing pharmacogenetic testing and to guide appropriate decision-making.
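
The accuracy, recall, specificity, and negative predictive value quoted in the Results can be checked against one another from the fractions given. The sketch below derives the four confusion-matrix counts from those fractions (the derivation and the helper name `confusion_metrics` are mine, not the authors') and recomputes each percentage.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),
    }


# Counts implied by the reported fractions:
# recall 537/702 gives TP and FN; specificity 32,162/47,826 gives TN and FP.
tp, fn = 537, 702 - 537
tn, fp = 32162, 47826 - 32162

metrics = confusion_metrics(tp, fn, tn, fp)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

The recomputed values agree with the reported 67.38%, 76.50%, 67.25%, and 99.49%, which confirms that the four figures describe a single confusion matrix over 48,528 patients.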
JMIR Medical Informatics, volume 14, e81048. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12822862/pdf/
Citations: 0
Development of Quality Indicators for the Correct Use of Electronic Medical Records in Primary Care: Modified Delphi Study.
IF 3.8 Zone 3 Medicine Q2 MEDICAL INFORMATICS Pub Date: 2026-01-19 DOI: 10.2196/80057
Rico Paridaens, Steve Van den Bulck, Michel De Jonghe, Benjamin Fauquert, Liesbeth Meel, Willem Raat, Bert Vaes

Background: When used correctly, electronic medical records (EMRs) can support clinical decision-making, provide information for research, facilitate coordination of care, reduce medical errors, and generate patient health summaries. Studies have reported large differences in the quality of EMR data.

Objective: Our study aimed to develop an evidence-based set of electronically extractable quality indicators (QIs) approved by expert consensus to assess the good use of EMRs by general practitioners (GPs) from a medical perspective.

Methods: The RAND-modified Delphi method was used in this study. The TRIP and MEDLINE databases were searched, and a selection of recommendations was filtered using the specific, measurable, assignable, realistic, and time-bound principles. The panel comprised 12 GPs and 6 EMR developers. The selected recommendations were transformed into QIs as percentages.

Results: A combined list of 20 indicators and 30 recommendations was created from 9 guidelines and 4 review articles. After the consensus round, 20 (100%) indicators and 20 (67%) recommendations were approved by the panel. All 20 recommendations were transformed into QIs. Most (16, 40%) QIs evaluated the completeness and adequacy of the problem list.

Conclusions: This study provided a set of 40 EMR-extractable QIs for the correct use of EMRs in primary care. These QIs can be used to map the completeness of EMRs by setting up an audit and feedback system, and to develop specific (computer-based) training for GPs.

JMIR Medical Informatics, volume 14, e80057. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12865340/pdf/
Citations: 0
Development of Venous Thromboembolism Risk Prediction Models Based on Whole Blood Gene Expression Profiling Using 20 Machine Learning Algorithms: Comprehensive Analysis Study.
IF 3.8 Zone 3 Medicine Q2 MEDICAL INFORMATICS Pub Date: 2026-01-16 DOI: 10.2196/75565
Yedong Huang, Xiaoyun Chen, Guannan Bai, Yajun Zhao, Dapeng Kuang, Lin Zhang, Wei Lu

Background: There is a lack of venous thromboembolism (VTE) risk prediction models based on gene expression information.

Objective: This study aimed to construct a VTE prediction model based on whole blood gene expression profiling, by performing a comprehensive analysis of 20 machine learning (ML) algorithms.

Methods: Two transcriptome datasets containing patients with VTE and healthy controls were obtained by searching the Gene Expression Omnibus database and used as the training and validation sets, respectively. Feature selection for model construction was performed on the training set using the least absolute shrinkage and selection operator and random forest, followed by the selection of the intersection of the chosen features. Subsequently, recursive feature elimination was applied to further refine the selected features. The selected features underwent model construction using 20 ML algorithms. The performance of the models was evaluated using various methods such as receiver operating characteristic and confusion matrix. The validation set was used for external model validation.

Results: The final results demonstrated that all algorithm models, except for k-nearest neighbor, exhibited good performance in VTE prediction. External validation data indicated that 9 algorithm models had an area under the curve greater than 0.75. The confusion matrix analysis revealed that the algorithm models maintained high specificity in the external validation cohort.

Conclusions: This study used 20 ML algorithms to construct VTE prediction models based on whole blood gene expression information, with 9 of these models demonstrating good diagnostic performance in external validation cohorts. The above models, when used in conjunction with D-dimer, may provide more valuable references for VTE diagnosis.
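
The abstract above compares models by area under the receiver operating characteristic curve, with 0.75 as the reported cutoff in external validation. As a rough illustration of what that number measures, the sketch below computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control. The labels and scores are invented for the example and the function name `roc_auc` is an assumption, not something taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive has the higher score; tied pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Invented example: 1 = VTE case, 0 = healthy control.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1, 0.7]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")  # 15 of 16 case-control pairs are ranked correctly
```

The pairwise form is quadratic in sample size, which is fine for small validation cohorts; a rank-based formulation gives the same value more efficiently on large ones.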

JMIR Medical Informatics, volume 14, e75565. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12810949/pdf/
Citations: 0
Mild Cognitive Impairment Detection System Based on Unstructured Spontaneous Speech: Longitudinal Dual-Modal Framework.
IF 3.8 Zone 3 Medicine Q2 MEDICAL INFORMATICS Pub Date: 2026-01-15 DOI: 10.2196/80883
Yu-Shan Liao, Thiri Wai, Ting-Yun Liao, Ho-Ling Chang, Yu-Ling Chang, Li-Chen Fu

Background: In recent years, the incidence of cognitive diseases has risen as populations have aged. Among these diseases, Alzheimer disease constitutes a substantial proportion, placing a high cost burden on health care systems. To enable early treatment and slow patient deterioration, it is crucial to diagnose mild cognitive impairment (MCI), a transitional stage.

Objective: In this study, we use autobiographical memory (AM) test speech data to establish a dual-modal longitudinal cognitive detection system for MCI. The AM test is a psychological assessment method that evaluates the cognitive status of subjects as they freely narrate important life experiences.

Methods: Identifying hidden disease-related information in unstructured, spontaneous speech is more difficult than in structured speech. To improve this process, we use both speech and text data, which provide more clues about a person's cognitive state. In addition, to track how cognition changes over time in spontaneous speech, we introduce an aging trajectory module. This module uses local and global alignment loss functions to better learn time-related features by aligning cognitive changes across different time points.

Results: In our experiments on the Chinese dataset, the longitudinal model incorporating the aging trajectory module achieved areas under the receiver operating characteristic curve of 0.85 and 0.89 on 2 datasets, respectively, showing significant improvement over cross-sectional, single time point models. We also conducted ablation studies to verify the necessity of the proposed aging trajectory module. To confirm that the model is not limited to AM test data, we used part of the model to evaluate performance on the ADReSSo dataset, a single time point, semistructured dataset used for validation, with results showing an accuracy exceeding 0.88.

Conclusions: This study presents a noninvasive and scalable approach for early MCI detection by leveraging AM speech data across multiple time points. Through dual-modal analysis and the introduction of an aging trajectory module, our system effectively captures cognitive decline trends over time. Experimental results demonstrate the method's robustness and generalizability, highlighting its potential for real-world, long-term cognitive monitoring.

JMIR Medical Informatics, volume 14, e80883. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12807404/pdf/
Citations: 0
Prompting and Fine-Tuning Large Language Models for Parkinson Disease Diagnosis: Comparative Evaluation Study Using the PPMI Structured Dataset.
IF 3.8 Zone 3 Medicine Q2 MEDICAL INFORMATICS Pub Date: 2026-01-15 DOI: 10.2196/77561
Hyun-Ji Shin, Young Jin Jeong, Sungmin Jun, Do-Young Kang
<p><strong>Background: </strong>Parkinson disease (PD) presents diagnostic challenges due to its heterogeneous motor and nonmotor manifestations. Traditional machine learning (ML) approaches have been evaluated on structured clinical variables. However, the diagnostic utility of large language models (LLMs) using natural language representations of structured clinical data remains underexplored.</p><p><strong>Objective: </strong>This study aimed to evaluate the diagnostic classification performance of multiple LLMs using natural language prompts derived from structured clinical data and to compare their performance with traditional ML baselines.</p><p><strong>Methods: </strong>We reformatted structured clinical variables from the Parkinson's Progression Markers Initiative (PPMI) dataset into natural language prompts and used them as inputs for several LLMs. Variables with high multicollinearity were removed, and the top 10 features were selected using Shapley additive explanations (SHAP)-based feature ranking. LLM performance was examined across few-shot prompting, dual-output prompting that additionally generated post hoc explanatory text as an exploratory component, and supervised fine-tuning. Logistic regression (LR) and support vector machine (SVM) classifiers served as ML baselines. Model performance was evaluated using F<sub>1</sub>-scores on both the test set and a temporally independent validation set (temporal validation set) of limited size, and repeated output generation was carried out to assess stability.</p><p><strong>Results: </strong>On the test set of 122 participants, LR and SVM trained on the 10 SHAP-selected clinical variables each achieved a macro-averaged F<sub>1</sub>-score of 0.960 (accuracy 0.975). LLMs receiving natural language prompts derived from the same variables reached comparable performance, with the best few-shot configurations achieving macro-averaged F<sub>1</sub>-scores of 0.987 (accuracy 0.992). 
In the temporal validation set of 31 participants, LR maintained a macro-averaged F<sub>1</sub>-score of 0.903, whereas SVM showed substantial performance degradation. In contrast, multiple LLMs sustained high diagnostic performance, reaching macro-averaged F<sub>1</sub>-scores up to 0.968 and high recall for PD. Repeated output generation across LLM conditions produced generally stable predictions, with rare variability observed across runs. Under dual-output prompting, diagnostic performance showed a reduction relative to few-shot prompting while remaining generally stable. Supervised fine-tuning of lightweight models improved stability and enabled GPT-4o-mini to achieve a macro-averaged F<sub>1</sub>-score of 0.987 on the test set, with uniformly correct predictions observed in the small temporal validation set, which should be interpreted cautiously given the limited sample size and exploratory nature of the evaluation.</p><p><strong>Conclusions: </strong>This study provides an exploratory benchmark of how modern
Conclusions (from the Chinese-language version of the abstract): This study provides an exploratory benchmark of how modern LLMs process structured clinical variables presented in natural language form. Although several models achieved diagnostic performance comparable to LR on both the test and temporal validation datasets, their outputs were sensitive to prompt format, model choice, and class distribution. Occasional variability across repeated output generations reflects the stochastic nature of LLMs, and lightweight models required supervised fine-tuning to generalize stably. These findings highlight both the capabilities and the limitations of current LLMs in handling tabular clinical information and underscore the need for cautious application and further research.
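The abstract reports macro-averaged F1-scores alongside accuracy. As a minimal sketch of how the two metrics relate, the following computes both on a small two-class example. The labels are hypothetical toy data (not PPMI records), and the helper names `macro_f1` and `accuracy` are illustrative, not taken from the study.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (every class counts equally)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels: 8 "PD" cases and 2 "control" cases,
# with one control misclassified as PD.
y_true = ["PD"] * 8 + ["control"] * 2
y_pred = ["PD"] * 8 + ["PD", "control"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # accuracy = 0.900
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")  # macro F1 = 0.804
```

Because macro averaging weights the minority class as heavily as the majority class, a single error in the smaller class lowers macro F1 more than it lowers accuracy. This is one reason the two figures can differ on imbalanced sets.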
Hyun-Ji Shin, Young Jin Jeong, Sungmin Jun, Do-Young Kang. Prompting and Fine-Tuning Large Language Models for Parkinson Disease Diagnosis: Comparative Evaluation Study Using the PPMI Structured Dataset. JMIR Medical Informatics. 2026;14:e77561. Published 2026-01-15. DOI: 10.2196/77561. Impact factor: 3.8. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12856398/pdf/
Citations: 0