
Latest Publications in JMIR Medical Informatics

Unsupervised Characterization of Temporal Dataset Shifts as an Early Indicator of AI Performance Variations: Evaluation Study Using the Medical Information Mart for Intensive Care-IV Dataset.
IF 3.8, CAS Tier 3 (Medicine), Q2 MEDICAL INFORMATICS, Pub Date: 2025-12-03, DOI: 10.2196/78309
David Fernández-Narro, Pablo Ferri, Alba Gutiérrez-Sacristán, Juan M García-Gómez, Carlos Sáez

Background: Reusing long-term data from electronic health records is essential for training reliable and effective health artificial intelligence (AI). However, intrinsic changes in health data distributions over time, known as dataset shifts (including concept, covariate, and prior shifts), can compromise model performance, leading to model obsolescence and inaccurate decisions.

Objective: In this study, we investigate whether unsupervised, model-agnostic characterization of temporal dataset shifts using data distribution analyses through Information Geometric Temporal (IGT) projections is an early indicator of potential AI performance variations before model development.

Methods: Using the real-world Medical Information Mart for Intensive Care-IV (MIMIC-IV) electronic health record database, encompassing data from over 40,000 patients from 2008 to 2019, we characterized its inherent dataset shift patterns through an unsupervised approach using IGT projections and data temporal heatmaps. We annually trained and evaluated a set of random forest and gradient boosting models to predict in-hospital mortality. To assess the impact of shifts on model performance, we tested the association between the temporal clusters found in the IGT projections and in the intertime embedding of model performances using the Fisher exact test.
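As a rough illustration of the association check described above, the sketch below cross-tabulates yearly data batches by IGT-projection cluster and by model-performance cluster in a hypothetical 2x2 contingency table and applies the Fisher exact test with scipy. The cluster counts are invented for illustration and are not taken from the study.

```python
# Hypothetical sketch: testing whether IGT-projection clusters and model-performance
# clusters group the yearly batches in the same way. Counts below are made up.
from scipy.stats import fisher_exact

# Rows: IGT cluster (e.g., pre- vs post-ICD-10 era); columns: performance cluster 1 vs 2
contingency = [
    [6, 1],   # years placed in IGT cluster 1
    [1, 4],   # years placed in IGT cluster 2
]
odds_ratio, p_value = fisher_exact(contingency, alternative="two-sided")
print(f"odds ratio: {odds_ratio:.2f}, P value: {p_value:.3f}")
```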

Results: Our results demonstrate a significant relationship between the unsupervised temporal shift patterns, specifically covariate and concept shifts, identified using the IGT projection method and the performance of the random forest and gradient boosting models (P<.05). We identified 2 primary temporal clusters that correspond to the periods before and after ICD-10 (International Statistical Classification of Diseases, Tenth Revision) implementation. The transition from ICD-9 (International Classification of Diseases, Ninth Revision) to ICD-10 was a major source of dataset shift, associated with a performance degradation.

Conclusions: Unsupervised, model-agnostic characterization of temporal shifts via IGT projections can serve as a proactive monitoring tool to anticipate performance shifts in clinical AI models. By incorporating early shift detection into the development pipeline, we can enhance decision-making during the training and maintenance of these models. This approach paves the way for more robust, trustworthy, and self-adapting AI systems in health care.

Citations: 0
Medical Feature Extraction From Clinical Examination Notes: Development and Evaluation of a Two-Phase Large Language Model Framework.
IF 3.8, CAS Tier 3 (Medicine), Q2 MEDICAL INFORMATICS, Pub Date: 2025-12-03, DOI: 10.2196/78432
Manal Abumelha, Abdullah Al-Malaise Al-Ghamdi, Ayman Fayoumi, Mahmoud Ragab
Background: Medical feature extraction from clinical text is challenging because of limited data availability, variability in medical terminology, and the critical need for trustworthy outputs. Large language models (LLMs) offer promising capabilities but face critical challenges with hallucination.

Objective: This study aims to develop a robust framework for medical feature extraction that enhances accuracy by minimizing the risk of hallucination, even with limited training data.

Methods: We developed a two-phase training approach. Phase 1 used instruction fine-tuning to teach feature extraction. Phase 2 introduced confidence-regularization fine-tuning with loss functions penalizing overconfident incorrect predictions, which were captured using bidirectional matching targeting hallucinated and missing features. The model was trained on the full set of 700 patient notes and on a few-shot subset of 100 patient notes. We evaluated the framework on the United States Medical Licensing Examination Step-2 Clinical Skills dataset, testing on a public split of 200 patient notes and a private split of 1839 patient notes. Performance was assessed using precision, recall, and F1-scores, with error analysis conducted on predicted features from the private test set.

Results: The framework achieved an F1-score of 0.968-0.983 on the full dataset of 700 patient notes and 0.960-0.973 with a few-shot subset of 100 of 700 patient notes (14.2%), outperforming INCITE (intelligent clinical text evaluator; F1=0.883) and DeBERTa (decoding-enhanced bidirectional encoder representations from transformers with disentangled attention; F1=0.958). It reduced hallucinations by 89.9% (from 3081 to 311 features) and missing features by 88.9% (from 6376 to 708) on the private dataset compared with the baseline LLM with few-shot in-context learning. Calibration evaluation on few-shot training (100 patient notes) showed that the expected calibration error increased from 0.060 to 0.147, whereas the Brier score improved from 0.087 to 0.036. Notably, the average model confidence remained stable at 0.84 (SD 0.003) despite F1 improvements from 0.819 to 0.986.

Conclusions: Our two-phase LLM framework successfully addresses critical challenges in automated medical feature extraction, achieving state-of-the-art performance while reducing hallucinated and missing features. The framework's ability to achieve high performance with minimal training data (F1=0.960-0.973 with 100 samples) demonstrates strong generalization capabilities essential for resource-constrained settings in medical education. While traditional calibration metrics show misalignment, the practical benefits of confidence injection led to reduced errors, and inference-time filtering provided reliable outputs suitable for automated clinical assessment applications.

Clinical trial: Not applicable. This study did not involve a clinical trial or prospective registration of human participants; only retrospective, fully deidentified data were used.
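The bidirectional matching used to count hallucinated and missing features is not detailed in the abstract; the sketch below shows one plausible reading, in which predicted features are set-matched against gold features so that unmatched predictions count as hallucinations and unmatched gold features count as missing. The feature strings and the case-insensitive exact-match rule are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: scoring extracted features with a simple bidirectional match.
def evaluate_extraction(predicted: list[str], gold: list[str]) -> dict:
    pred = {p.strip().lower() for p in predicted}
    truth = {g.strip().lower() for g in gold}

    matched = pred & truth
    hallucinated = pred - truth      # predicted but absent from the note's gold features
    missing = truth - pred           # gold features the model failed to extract

    precision = len(matched) / len(pred) if pred else 0.0
    recall = len(matched) / len(truth) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "n_hallucinated": len(hallucinated),
        "n_missing": len(missing),
    }

# Example with made-up features from a single patient note
print(evaluate_extraction(
    predicted=["chest pain", "shortness of breath", "fever"],
    gold=["chest pain", "shortness of breath", "nausea"],
))
```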
Citations: 0
Identifying Key Variances in Clinical Pathways Associated With Prolonged Hospital Stays Using Machine Learning and ePath Real-World Data: Model Development and Validation Study.
IF 3.8, CAS Tier 3 (Medicine), Q2 MEDICAL INFORMATICS, Pub Date: 2025-12-01, DOI: 10.2196/71617
Saori Tou, Koutarou Matsumoto, Asato Hashinokuchi, Fumihiko Kinoshita, Yasunobu Nohara, Takanori Yamashita, Yoshifumi Wakata, Tomoyoshi Takenaka, Hidehisa Soejima, Tomoharu Yoshizumi, Naoki Nakashima, Masahiro Kamouchi

Background: Prolonged hospital stays can lead to inefficiencies in health care delivery and unnecessary consumption of medical resources.

Objective: This study aimed to identify key clinical variances associated with prolonged length of stay (PLOS) in clinical pathways using a machine learning model trained on real-world data from the ePath system.

Methods: We analyzed data from 480 patients with lung cancer (age: mean 68.3, SD 11.2 years; n=263, 54.8% men) who underwent video-assisted thoracoscopic surgery at a university hospital between 2019 and 2023. PLOS was defined as a hospital stay exceeding 9 days after video-assisted thoracoscopic surgery. The variables collected between admission and 4 days after surgery were examined, and those that showed a significant association with PLOS in univariate analyses (P<.01) were selected as predictors. Predictive models were developed using sparse linear regression methods (Lasso, ridge, and elastic net) and decision tree ensembles (random forest and extreme gradient boosting). The data were divided into derivation (earlier study period) and testing (later period) cohorts for temporal validation. The model performance was assessed using the area under the receiver operating characteristic curve, Brier score, and calibration plots. Counterfactual analysis was used to identify key clinical factors influencing PLOS.
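For readers who want a concrete picture of the modeling step, the following sketch mirrors the general workflow on synthetic data: a temporal derivation/test split, an L2-penalized logistic model standing in for the paper's ridge model, and evaluation with the area under the receiver operating characteristic curve and the Brier score. All variable names and the simulated outcome are assumptions for illustration only.

```python
# Minimal sketch (assumptions: synthetic data, an L2-penalized logistic model standing in
# for the study's ridge model, and a simple temporal split by admission year).
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, brier_score_loss

rng = np.random.default_rng(0)
n = 480
df = pd.DataFrame({
    "admission_year": rng.integers(2019, 2024, n),
    "age": rng.normal(68.3, 11.2, n),
    "postop_fever": rng.integers(0, 2, n),
    "abnormal_breath_sounds": rng.integers(0, 2, n),
})
# Outcome: prolonged length of stay (>9 days after surgery), simulated here
logit = -2 + 0.03 * (df["age"] - 68) + 1.2 * df["postop_fever"] + 1.0 * df["abnormal_breath_sounds"]
df["plos"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Temporal validation: earlier period for derivation, later period for testing
train = df[df["admission_year"] <= 2021]
test = df[df["admission_year"] > 2021]
features = ["age", "postop_fever", "abnormal_breath_sounds"]

model = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
model.fit(train[features], train["plos"])

prob = model.predict_proba(test[features])[:, 1]
print("AUROC:", roc_auc_score(test["plos"], prob))
print("Brier:", brier_score_loss(test["plos"], prob))
```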

Results: A 3D heatmap illustrated the temporal relationships between clinical factors and PLOS based on patient demographics, comorbidities, functional status, surgical details, care processes, medications, and variances recorded from admission to 4 days after surgery. Among the 5 algorithms evaluated, the ridge regression model demonstrated the best performance in terms of both discrimination and calibration. Specifically, it achieved area under the receiver operating characteristic curve values of 0.84 and 0.82 and Brier scores of 0.16 and 0.17 in the derivation and test cohorts, respectively. In the final model, a range of variables, including blood tests, care, patient background, procedures, and clinical variances, were associated with PLOS. Among these, particular emphasis was placed on clinical variances. Counterfactual analysis using the ridge regression model identified 6 key variables strongly linked to PLOS. In order of impact, these were abnormal respiratory sounds, postoperative fever, arrhythmia, impaired ambulation, complications after drain removal, and pulmonary air leaks.

Conclusions: A machine learning-based model using ePath data effectively identified critical variances in the clinical pathways associated with PLOS. This automated tool may enhance clinical decision-making and improve patient management.

Citations: 0
Enabling Just-in-Time Clinical Oncology Analysis With Large Language Models: Feasibility and Validation Study Using Unstructured Synthetic Data.
IF 3.8, CAS Tier 3 (Medicine), Q2 MEDICAL INFORMATICS, Pub Date: 2025-12-01, DOI: 10.2196/78332
Peter May, Julian Greß, Christoph Seidel, Sebastian Sommer, Markus K Schuler, Sina Nokodian, Florian Schröder, Johannes Jung
Background: Traditional cancer registries, limited by labor-intensive manual data abstraction and rigid, predefined schemas, often hinder timely and comprehensive oncology research. While large language models (LLMs) have shown promise in automating data extraction, their potential to perform direct, just-in-time (JIT) analysis on unstructured clinical narratives, potentially bypassing intermediate structured databases for many analytical tasks, remains largely unexplored.

Objective: This study aimed to evaluate whether a state-of-the-art LLM (Gemini 2.5 Pro) can enable a JIT clinical oncology analysis paradigm by assessing its ability to (1) perform high-fidelity multiparameter data extraction, (2) answer complex clinical queries directly from raw text, (3) automate multistep survival analyses including executable code generation, and (4) generate novel, clinically plausible hypotheses from free-text documentation.

Methods: A synthetic dataset of 240 unstructured clinical letters from patients with stage IV non-small cell lung cancer (NSCLC), embedding 14 predefined variables, was used. Gemini 2.5 Pro was evaluated on four core JIT capabilities. Performance was measured using the following metrics: extraction accuracy (compared to human extraction of n=40 letters and across the full n=240 dataset); numerical deviation for direct question answering (n=40 to 240 letters, 5 questions); log-rank P value and Harrell concordance index for LLM-generated versus ground-truth Kaplan-Meier survival analyses (n=160 letters, overall survival and progression-free survival); and correct justification, novelty, and a qualitative evaluation of LLM-generated hypotheses (n=80 and n=160 letters).

Results: For multiparameter extraction from 40 letters, the LLM achieved >99% average accuracy, comparable to human extraction, but in significantly less time (LLM: 3.7 min vs human: 133.8 min). Across the full 240-letter dataset, LLM multiparameter extraction maintained >98% accuracy for most variables. The LLM answered multiconditional clinical queries directly from raw text with a relative deviation rarely exceeding 1.5%, even with up to 240 letters. Crucially, it autonomously performed end-to-end survival analysis, generating text-to-R-code that produced Kaplan-Meier curves statistically indistinguishable from ground truth. Consistent performance was demonstrated on a small validation cohort of 80 synthetic acute myeloid leukemia reports. Stress testing on data with simulated imperfections revealed a key role for a human-in-the-loop to resolve AI-flagged ambiguities. Furthermore, the LLM generated several correctly justified, biologically plausible, and potentially novel hypotheses from datasets of up to 80 letters.

Conclusions: This feasibility study demonstrated that a frontier LLM (Gemini 2.5 Pro) can successfully perform high-fidelity data extraction, multiconditional querying, and automated survival analysis directly from unstructured text. These results provide a foundational proof of concept for a JIT clinical analysis approach. However, the findings are limited to synthetic patients, and rigorous validation on real-world clinical data is an essential next step before clinical application can be considered.
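The study validated LLM-generated R code against ground-truth Kaplan-Meier analyses; the sketch below illustrates the kind of analysis being checked (Kaplan-Meier curves per group plus a log-rank test), implemented here in Python with lifelines on synthetic data rather than in R, and with invented group labels and survival times.

```python
# Illustrative sketch of the survival analysis being compared against ground truth:
# Kaplan-Meier estimates per group and a log-rank test. Data are synthetic.
import numpy as np
import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(42)
n = 160
df = pd.DataFrame({
    "group": rng.integers(0, 2, n),                  # e.g., two hypothetical regimens
    "time_months": rng.exponential(12, n).round(1),  # overall survival time
    "event": rng.integers(0, 2, n),                  # 1 = death observed, 0 = censored
})

kmf = KaplanMeierFitter()
for g, label in [(0, "regimen A"), (1, "regimen B")]:
    sub = df[df["group"] == g]
    kmf.fit(sub["time_months"], event_observed=sub["event"], label=label)
    print(label, "median OS (months):", kmf.median_survival_time_)

res = logrank_test(
    df.loc[df["group"] == 0, "time_months"],
    df.loc[df["group"] == 1, "time_months"],
    event_observed_A=df.loc[df["group"] == 0, "event"],
    event_observed_B=df.loc[df["group"] == 1, "event"],
)
print("log-rank P value:", res.p_value)
```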
Citations: 0
Gaps and Pathways to Success in Global Health Informatics Academic Collaborations: Reflecting on Current Practices.
IF 3.8, CAS Tier 3 (Medicine), Q2 MEDICAL INFORMATICS, Pub Date: 2025-11-28, DOI: 10.2196/67326
Elizabeth A Campbell, Felix Holl, Oliver J Bear Don't Walk Iv, Badisa Mosesane, Andrew S Kanter, Hamish Fraser, Amanda L Joseph, Judy Wawira Gichoya, Kabelo Leonard Mauco, Sansanee Craig

Academic global health informatics (GHI) projects are impactful collaborations between institutions in high-income and low- and middle-income countries (LMICs) and play a crucial role in enhancing health care services and access in LMICs using eHealth practices. Researchers across all involved organizations bring unique expertise to these collaborations. However, these projects often face significant obstacles, including cultural and linguistic barriers, resource limitations, and sustainability issues. The lack of representation from LMIC researchers in knowledge generation and the high costs of open-access publications further complicate efforts to ensure inclusive, accessible, and collaborative scholarship. This viewpoint describes present gaps in the literature on academic GHI collaborations and describes a path forward for future research directions and successful research community development. Key recommendations include centering community-based participatory research, developing post-growth solutions, and creating sustainable funding models. Addressing these challenges is essential for fostering effective, scalable, and equitable GHI interventions that improve global health outcomes.

Citations: 0
Predictive Performance of Radiomics-Based Machine Learning for Colorectal Cancer Recurrence Risk: Systematic Review and Meta-Analysis.
IF 3.8, CAS Tier 3 (Medicine), Q2 MEDICAL INFORMATICS, Pub Date: 2025-11-28, DOI: 10.2196/78644
Yuan Sun, Bo Li, Chuanlan Ju, Liming Hu, Huiyi Sun, Jing An, Tae-Hun Kim, Zhijun Bu, Zeyang Shi, Jianping Liu, Zhaolan Liu
Background: Predicting colorectal cancer (CRC) recurrence risk remains a challenge in clinical practice. Owing to the widespread use of radiomics in CRC diagnosis and treatment, some researchers recently explored the effectiveness of radiomics-based models in forecasting CRC recurrence risk. Nonetheless, the lack of systematic evidence of the efficacy of such models has hampered their clinical adoption.

Objective: This study aimed to explore the value of radiomics in predicting CRC recurrence, providing a scholarly rationale for developing more specific interventions.

Methods: Overall, 4 databases (Embase, PubMed, the Cochrane Library, and Web of Science) were searched for relevant articles from inception to January 1, 2025. We included studies that developed or validated radiomics-based machine learning models for predicting CRC recurrence using computed tomography or magnetic resonance imaging and provided discriminative performance metrics (c-index). Nonoriginal articles, studies that did not develop a model, and those lacking clear outcome measures were excluded. The quality of the included original studies was assessed using the Radiomics Quality Score. A bivariate mixed-effects model was used to conduct a meta-analysis in which the c-index values with 95% CIs were pooled. Subgroup analyses were conducted separately on the validation and training sets.

Results: This meta-analysis included 17 original studies involving 4600 patients with CRC. The quality of the identified studies was low (mean Radiomics Quality Score 13.23/36, SD 2.56), with limitations in prospective design and biological validation. In the validation set, the c-index values based on clinical features, radiomics features, and radiomics features combined with clinical features were 0.73 (95% CI 0.68-0.79), 0.80 (95% CI 0.75-0.85), and 0.83 (95% CI 0.79-0.87), respectively. In the internal validation set, the corresponding c-index values were 0.70 (95% CI 0.61-0.79), 0.83 (95% CI 0.78-0.88), and 0.83 (95% CI 0.78-0.88), respectively. Finally, in the external validation set, the corresponding c-index values were 0.76 (95% CI 0.70-0.83), 0.75 (95% CI 0.66-0.83), and 0.83 (95% CI 0.78-0.88), respectively.

Conclusions: Radiomics-based machine learning models, especially those integrating radiomics and clinical features, showed promising predictive performance for CRC recurrence risk. However, this study has several limitations, such as moderate study quality, limited sample size, and high heterogeneity in modeling approaches. These findings suggest the potential clinical value of integrated models in risk stratification and their potential to enhance personalized treatment, but further high-quality prospective studies are required.
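As a simplified stand-in for the pooling step (the study itself used a bivariate mixed-effects model), the sketch below pools study-level c-index values with a DerSimonian-Laird random-effects estimator; the c-index values and standard errors are invented for illustration.

```python
# Simplified stand-in for the meta-analytic pooling of c-indexes: a DerSimonian-Laird
# random-effects pool with an inverse-variance weighted estimate and 95% CI.
import numpy as np

c_index = np.array([0.78, 0.83, 0.80, 0.86, 0.75])  # made-up study-level c-indexes
se = np.array([0.04, 0.03, 0.05, 0.04, 0.06])       # made-up standard errors

w_fixed = 1 / se**2
c_fixed = np.sum(w_fixed * c_index) / np.sum(w_fixed)

# Between-study heterogeneity (DerSimonian-Laird tau^2)
q = np.sum(w_fixed * (c_index - c_fixed) ** 2)
df = len(c_index) - 1
c_const = np.sum(w_fixed) - np.sum(w_fixed**2) / np.sum(w_fixed)
tau2 = max(0.0, (q - df) / c_const)

w_rand = 1 / (se**2 + tau2)
pooled = np.sum(w_rand * c_index) / np.sum(w_rand)
pooled_se = np.sqrt(1 / np.sum(w_rand))
print(f"pooled c-index: {pooled:.2f} "
      f"(95% CI {pooled - 1.96 * pooled_se:.2f}-{pooled + 1.96 * pooled_se:.2f})")
```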
Citations: 0
Patient Attitudes Toward Ambient Voice Technology: Preimplementation Patient Survey in an Academic Medical Center.
IF 3.8, CAS Tier 3 (Medicine), Q2 MEDICAL INFORMATICS, Pub Date: 2025-11-27, DOI: 10.2196/77901
Gary Leiserowitz, Jeff Mansfield, Scott MacDonald, Melissa Jost
Background: Many institutions are in various stages of deploying an artificial intelligence (AI) scribe system for clinic electronic health record (EHR) documentation. In anticipation of the University of California, Davis Health's deployment of an AI scribe program, we surveyed current patients about their perceptions of this technology to inform a patient-centered implementation.

Objective: We assessed patient perceptions of current clinician EHR documentation practices before implementation of the AI scribe program, as well as preconceptions regarding the AI scribe's introduction.

Methods: We conducted a descriptive preimplementation survey as a quality improvement study. A convenience sample of 9171 patients (aged ≥18 years) who had a clinic visit within the previous year was recruited via an email postvisit survey. Patient-identified demographics (age, gender, and race and ethnicity) were collected. The survey included rating scales on questions related to patient perceptions of the AI scribe program, plus open-ended comments. Data were collated to analyze patient perceptions of including AI scribe technology in a clinician visit.

Results: In total, 1893 patients completed the survey (20% response rate), with partial responses from another 549. Sixty-three percent (n=1205) of the respondents were female, and most were 51 years and older (87%, n=1649). Most patients identified themselves as White (69%, n=1312), multirace (8%, n=154), Latinx (7%, n=130), and Black (2%, n=42). The respondents were not representative of the overall clinic population and skewed more toward being female, aged 50 years and older, and White. Patients reacted to the current EHR documentation system, with 71% (n=1349) feeling heard or sometimes heard, but 23% (n=416) expressed frustration that their physician focused too much on typing into the computer. When asked about their anticipated response to the use of an AI scribe, 48% (n=904) were favorable, 33% (n=630) were neutral, and 19% (n=359) were unfavorable. Younger patients (ages 18-30 years) expressed more skepticism than those aged 51 years and older. Further, 42% (655/1567) of positive comments indicated this technology could improve human interaction during visits. Comments supported that the use of an AI scribe would enhance the patient experience by allowing the clinician to focus on the patient. However, when asked about concerns regarding the AI scribe, 39% (515/1330) and 15% (203/1330) of comments expressed concerns about documentation accuracy and privacy, respectively. Providing previsit patient education and obtaining permission were viewed as very important.

Conclusions: This patient survey showed that respondents are generally open to the use of an AI scribe program for EHR documentation to allow the clinician to focus on the patient during the actual encounter rather than on the computer. Providing patient education and obtaining consent before AI use are important components of earning patient trust. Given the low response rate and the nonrepresentative sample, caution in interpreting the results is appropriate.
Citations: 0
Risk Prediction of Major Adverse Cardiovascular Events Within One Year After Percutaneous Coronary Intervention in Patients With Acute Coronary Syndrome: Machine Learning-Based Time-to-Event Analysis.
IF 3.8, CAS Tier 3 (Medicine), Q2 MEDICAL INFORMATICS, Pub Date: 2025-11-27, DOI: 10.2196/81778
Hong-Jae Choi, Changhee Lee, Hack-Lyoung Kim, Youn-Jung Son

Background: Patients with acute coronary syndrome (ACS) who undergo percutaneous coronary intervention (PCI) remain at high risk for major adverse cardiovascular events (MACE). Conventional risk scores may not capture dynamic or nonlinear changes in postdischarge MACE risk, whereas machine learning (ML) approaches can improve predictive performance. However, few ML models have incorporated time-to-event analysis to reflect changes in MACE risk over time.

Objective: This study aimed to develop a time-to-event ML model for predicting MACE after PCI in patients with ACS and to identify the risk factors with time-varying contributions.

Methods: We analyzed electronic health records of 3159 patients with ACS who underwent PCI at a tertiary hospital in South Korea between 2008 and 2020. Six time-to-event ML models were developed using 54 variables. Model performance was evaluated using the time-dependent concordance index and Brier score. Variable importance was assessed using permutation importance and visualized with partial dependence plots to identify variables contributing to MACE risk over time.
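As a minimal illustration of a time-to-event workflow of this kind, the sketch below fits a Cox proportional hazards model with lifelines on synthetic data and reports Harrell's concordance index on a held-out split. The paper's six ML models, 54 variables, and time-dependent concordance index are not reproduced here; all feature names and effect sizes are assumptions.

```python
# Minimal time-to-event sketch (assumptions: synthetic data, a Cox proportional hazards
# model as a stand-in for the study's models, and Harrell's c-index rather than the
# time-dependent concordance index reported in the paper).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

rng = np.random.default_rng(7)
n = 3159
df = pd.DataFrame({
    "age": rng.normal(65, 10, n),
    "contrast_volume_ml": rng.normal(250, 60, n),
    "adherence_score": rng.normal(25, 8, n),
})
risk = (0.02 * (df["age"] - 65)
        + 0.004 * (df["contrast_volume_ml"] - 250)
        + 0.03 * (df["adherence_score"] - 25))
df["time_days"] = rng.exponential(365 * np.exp(-risk))       # time to MACE or censoring
df["mace"] = (rng.uniform(size=n) < 0.2).astype(int)         # 1 = MACE observed

train, test = df.iloc[: int(0.7 * n)], df.iloc[int(0.7 * n):]

cph = CoxPHFitter()
cph.fit(train, duration_col="time_days", event_col="mace")

# Higher partial hazard means higher risk, so negate it for the concordance index
c = concordance_index(test["time_days"], -cph.predict_partial_hazard(test), test["mace"])
print("c-index on held-out set:", round(c, 3))
```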

Results: During a median follow-up of 3.8 years, 626 (19.8%) patients experienced MACE. The best-performing model achieved a time-dependent concordance index of 0.743 at day 30 and 0.616 at 1 year. Time-dependent Brier scores increased and remained stable across all ML models. Key predictors included contrast volume, age, medication adherence, coronary artery disease severity, and glomerular filtration rate. Contrast volume ≥300 mL, age ≥60 years, and medication adherence score ≥30 were associated with early postdischarge risk, whereas coronary artery disease severity and glomerular filtration rate became more influential beyond 60 days.

Conclusions: The proposed time-to-event ML model effectively captured dynamic risk patterns after PCI and identified key predictors with time-varying effects. These findings may support individualized postdischarge management and early intervention strategies to prevent MACE in high-risk patients.

Citations: 0
Unsupervised Coverage Sampling to Enhance Clinical Chart Review Coverage for Computable Phenotype Development: Simulation and Empirical Study.
IF 3.8, CAS Tier 3 (Medicine), Q2 MEDICAL INFORMATICS. Pub Date: 2025-11-27. DOI: 10.2196/72068
Zigui Wang, Jillian H Hurst, Chuan Hong, Benjamin Alan Goldstein

Background: Developing computable phenotypes (CP) based on electronic health records (EHR) data requires "gold-standard" labels for the outcome of interest. To generate these labels, clinicians typically chart-review a subset of patient charts. Charts to be reviewed are most often randomly sampled from the larger set of patients of interest. However, random sampling may fail to capture the diversity of the patient population, particularly if smaller subpopulations exist among those with the condition of interest. This can lead to poorly performing and biased CPs.

Objective: This study aimed to propose an unsupervised sampling approach designed to better capture a diverse patient cohort and improve the information coverage of chart review samples.

Methods: Our coverage sampling method starts by clustering the patient population of interest. We then perform stratified sampling from each cluster to ensure even representation in the chart review sample. We introduce a novel metric, nearest neighbor distance, to evaluate the coverage of the generated sample. To evaluate our method, we first conducted a simulation study to model and compare the performance of random sampling versus our proposed coverage sampling, varying the size and number of subpopulations within the larger cohort. Finally, we applied our approach to a real-world data set to develop a CP for hospitalization due to COVID-19. We evaluated the different sampling strategies based on information coverage as well as the performance of the learned CP, using the area under the receiver operating characteristic curve.
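A minimal sketch of the cluster-then-stratify idea and the nearest neighbor distance coverage metric described above. The specific choices here (k-means with 10 clusters, equal allocation per cluster, Euclidean distance, a synthetic feature matrix) are assumptions for illustration, not the authors' exact implementation.

# Hedged sketch: cluster the cohort, sample evenly from each cluster, and score
# coverage by the mean distance from every patient to the nearest sampled chart.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def coverage_sample(X, n_charts=200, n_clusters=10, seed=0):
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    per_cluster = n_charts // n_clusters
    picks = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        picks.append(rng.choice(members, size=min(per_cluster, members.size), replace=False))
    return np.concatenate(picks)

def nearest_neighbor_distance(X, sampled_idx):
    # Lower values = the review sample covers the cohort more evenly.
    nn = NearestNeighbors(n_neighbors=1).fit(X[sampled_idx])
    distances, _ = nn.kneighbors(X)
    return distances.mean()

# Synthetic "EHR feature" matrix, purely illustrative.
X = np.random.default_rng(1).normal(size=(5000, 20))
cov_idx = coverage_sample(X)
rand_idx = np.random.default_rng(1).choice(len(X), size=len(cov_idx), replace=False)
print("coverage sampling NN distance:", round(nearest_neighbor_distance(X, cov_idx), 3))
print("random sampling NN distance:  ", round(nearest_neighbor_distance(X, rand_idx), 3))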

Results: Our simulation studies show that the unsupervised coverage sampling approach provides broader coverage of patient populations compared to random sampling. When there are no underlying subpopulations, both random and coverage sampling perform equally well for CP development. When there are subgroups, coverage sampling achieves gains in the area under the receiver operating characteristic curve of approximately 0.03-0.05 over random sampling. In the real-world application, the approach also outperformed random sampling, generating both a more representative sample and an improvement of 0.02 (95% CI -0.08 to 0.04) in the area under the receiver operating characteristic curve.

Conclusions: The proposed coverage sampling method is an easy-to-implement approach that produces a chart review sample that is more representative of the source population. This allows one to learn a CP that has better performance both for subpopulations and the overall cohort. Studies that aim to develop CPs should consider alternative strategies other than randomly sampling patient charts.

{"title":"Unsupervised Coverage Sampling to Enhance Clinical Chart Review Coverage for Computable Phenotype Development: Simulation and Empirical Study.","authors":"Zigui Wang, Jillian H Hurst, Chuan Hong, Benjamin Alan Goldstein","doi":"10.2196/72068","DOIUrl":"10.2196/72068","url":null,"abstract":"<p><strong>Background: </strong>Developing computable phenotypes (CP) based on electronic health records (EHR) data requires \"gold-standard\" labels for the outcome of interest. To generate these labels, clinicians typically chart-review a subset of patient charts. Charts to be reviewed are most often randomly sampled from the larger set of patients of interest. However, random sampling may fail to capture the diversity of the patient population, particularly if smaller subpopulations exist among those with the condition of interest. This can lead to poorly performing and biased CPs.</p><p><strong>Objective: </strong>This study aimed to propose an unsupervised sampling approach designed to better capture a diverse patient cohort and improve the information coverage of chart review samples.</p><p><strong>Methods: </strong>Our coverage sampling method starts by clustering by the patient population of interest. We then perform a stratified sampling from each cluster to ensure even representation for the chart review sample. We introduce a novel metric, nearest neighbor distance, to evaluate the coverage of the generated sample. To evaluate our method, we first conducted a simulation study to model and compare the performance of random versus our proposed coverage sampling. We varied the size and number of subpopulations within the larger cohort. Finally, we apply our approach to a real-world data set to develop a CP for hospitalization due to COVID-19. We evaluate the different sampling strategies based on the information coverage as well as the performance of the learned CP, using the area under the receiver operator characteristic curve.</p><p><strong>Results: </strong>Our simulation studies show that the unsupervised coverage sampling approach provides broader coverage of patient populations compared to random sampling. When there are no underlying subpopulations, both random and coverage perform equally well for CP development. When there are subgroups, coverage sampling achieves area under the receiver operating characteristic curve gains of approximately 0.03-0.05 over random sampling. In the real-world application, the approach also outperformed random sampling, generating both a more representative sample and an area under the receiver operating characteristic curve improvement of 0.02 (95% CI -0.08 to 0.04).</p><p><strong>Conclusions: </strong>The proposed coverage sampling method is an easy-to-implement approach that produces a chart review sample that is more representative of the source population. This allows one to learn a CP that has better performance both for subpopulations and the overall cohort. 
Studies that aim to develop CPs should consider alternative strategies other than randomly sampling patient charts.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e72068"},"PeriodicalIF":3.8,"publicationDate":"2025-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12661603/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145643472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Artificial Intelligence-Based Computerized Digit Vigilance Test in Community-Dwelling Older Adults: Development and Validation Study.
IF 3.8, CAS Tier 3 (Medicine), Q2 MEDICAL INFORMATICS. Pub Date: 2025-11-26. DOI: 10.2196/73038
Gong-Hong Lin, Dorothy Bai, Yi-Jing Huang, Shih-Chieh Lee, Mai Thi Thuy Vu, Tsu-Hsien Chiu

Background: The Computerized Digit Vigilance Test (CDVT) is a well-established measure of sustained attention. However, the CDVT only measures the total reaction time and response accuracy and fails to capture other crucial attentional features such as the eye blink rate, yawns, head movements, and eye movements. Omitting such features may yield an incomplete picture of sustained attention.

Objective: This study aimed to develop an artificial intelligence (AI)-based Computerized Digit Vigilance Test (AI-CDVT) for older adults.

Methods: Participants were assessed by the CDVT with video recordings capturing their head and face. The Montreal Cognitive Assessment (MoCA), Stroop Color Word Test (SCW), and Color Trails Test (CTT) were also administered. The AI-CDVT was developed in three steps: (1) retrieving attentional features using OpenFace AI software (CMU MultiComp Lab), (2) establishing an AI-based scoring model with the Extreme Gradient Boosting regressor, and (3) assessing the AI-CDVT's validity by Pearson r values and test-retest reliability by intraclass correlation coefficients (ICCs).
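To make the scoring step above concrete, here is a small, hedged sketch of an XGBoost regression model trained on already-extracted OpenFace features. The file name features.csv, the participant_id and target columns, and all hyperparameters are hypothetical placeholders and do not describe the authors' pipeline.

# Hedged sketch: XGBoost scoring model on OpenFace-derived features.
# "features.csv", "participant_id", and "target" are hypothetical placeholders.
import pandas as pd
from scipy.stats import pearsonr
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

df = pd.read_csv("features.csv")                   # hypothetical per-participant feature table
X = df.drop(columns=["participant_id", "target"])  # blink rate, yawns, head/eye movement features, ...
y = df["target"]                                   # reference attention score

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X_tr, y_tr)

pred = model.predict(X_te)
r, p = pearsonr(y_te, pred)
print(f"Pearson r between predicted and reference scores: {r:.2f} (P = {p:.3f})")

# Test-retest reliability (ICC) would be assessed separately on repeated sessions,
# e.g. with pingouin's intraclass_corr on a long-format table of two visits per participant.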

Results: In total, 153 participants were included. Pearson r values of the AI-CDVT were -0.42 with the MoCA, -0.31 with the SCW, and 0.46-0.61 with the CTT. The ICC of the AI-CDVT was 0.78.

Conclusions: We developed an AI-CDVT, which leveraged AI to extract attentional features from video recordings and integrated them to generate a comprehensive attention score. Our findings demonstrated good validity and test-retest reliability for the AI-CDVT, suggesting its potential as a reliable and valid tool for assessing sustained attention in older adults.

{"title":"Artificial Intelligence-Based Computerized Digit Vigilance Test in Community-Dwelling Older Adults: Development and Validation Study.","authors":"Gong-Hong Lin, Dorothy Bai, Yi-Jing Huang, Shih-Chieh Lee, Mai Thi Thuy Vu, Tsu-Hsien Chiu","doi":"10.2196/73038","DOIUrl":"10.2196/73038","url":null,"abstract":"<p><strong>Background: </strong>The Computerized Digit Vigilance Test (CDVT) is a well-established measure of sustained attention. However, the CDVT only measures the total reaction time and response accuracy and fails to capture other crucial attentional features such as the eye blink rate, yawns, head movements, and eye movements. Omitting such features might provide an incomplete representative picture of sustained attention.</p><p><strong>Objective: </strong>This study aimed to develop an artificial intelligence (AI)-based Computerized Digit Vigilance Test (AI-CDVT) for older adults.</p><p><strong>Methods: </strong>Participants were assessed by the CDVT with video recordings capturing their head and face. The Montreal Cognitive Assessment (MoCA), Stroop Color Word Test (SCW), and Color Trails Test (CTT) were also administered. The AI-CDVT was developed in three steps: (1) retrieving attentional features using OpenFace AI software (CMU MultiComp Lab), (2) establishing an AI-based scoring model with the Extreme Gradient Boosting regressor, and (3) assessing the AI-CDVT's validity by Pearson r values and test-retest reliability by intraclass correlation coefficients (ICCs).</p><p><strong>Results: </strong>In total, 153 participants were included. Pearson r values of the AI-CDVT with the MoCA were -0.42, -0.31 with the SCW, and 0.46-0.61 with the CTT. The ICC of the AI-CDVT was 0.78.</p><p><strong>Conclusions: </strong>We developed an AI-CDVT, which leveraged AI to extract attentional features from video recordings and integrated them to generate a comprehensive attention score. Our findings demonstrated good validity and test-retest reliability for the AI-CDVT, suggesting its potential as a reliable and valid tool for assessing sustained attention in older adults.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e73038"},"PeriodicalIF":3.8,"publicationDate":"2025-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12670460/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145656612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0