Background: Prolonged hospital stays can lead to inefficiencies in health care delivery and unnecessary consumption of medical resources.
Objective: This study aimed to identify key clinical variances associated with prolonged length of stay (PLOS) in clinical pathways using a machine learning model trained on real-world data from the ePath system.
Methods: We analyzed data from 480 patients with lung cancer (age: mean 68.3, SD 11.2 years; n=263, 54.8% men) who underwent video-assisted thoracoscopic surgery at a university hospital between 2019 and 2023. PLOS was defined as a hospital stay exceeding 9 days after surgery. Variables collected between admission and 4 days after surgery were examined, and those showing a significant association with PLOS in univariate analyses (P<.01) were selected as predictors. Predictive models were developed using penalized linear regression methods (lasso, ridge, and elastic net) and decision tree ensembles (random forest and extreme gradient boosting). The data were divided into derivation (earlier study period) and test (later period) cohorts for temporal validation. Model performance was assessed using the area under the receiver operating characteristic curve, the Brier score, and calibration plots. Counterfactual analysis was used to identify key clinical factors influencing PLOS.
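The modeling pipeline described above can be compressed into a few lines. Below is a minimal sketch, assuming synthetic stand-in data and scikit-learn: a ridge (L2-penalized) logistic regression fit on an earlier-period derivation cohort and evaluated on a later-period test cohort with AUROC and the Brier score. All variable names and dimensions are hypothetical, not taken from the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, brier_score_loss

rng = np.random.default_rng(0)
n, p = 480, 20
X = rng.normal(size=(n, p))                          # hypothetical predictor matrix
y = (X[:, 0] + rng.normal(size=n) > 0).astype(int)   # hypothetical PLOS labels

# Temporal validation: earlier admissions form the derivation cohort,
# later admissions form the test cohort.
split = int(n * 0.7)
X_dev, y_dev = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

# Ridge (L2-penalized) logistic regression, one of the five algorithms named.
model = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X_dev, y_dev)

for name, Xs, ys in [("derivation", X_dev, y_dev), ("test", X_test, y_test)]:
    p_hat = model.predict_proba(Xs)[:, 1]
    print(f"{name}: AUROC={roc_auc_score(ys, p_hat):.2f}, "
          f"Brier={brier_score_loss(ys, p_hat):.2f}")
```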
Results: A 3D heatmap illustrated the temporal relationships between clinical factors and PLOS based on patient demographics, comorbidities, functional status, surgical details, care processes, medications, and variances recorded from admission to 4 days after surgery. Among the 5 algorithms evaluated, the ridge regression model demonstrated the best performance in terms of both discrimination and calibration. Specifically, it achieved area under the receiver operating characteristic curve values of 0.84 and 0.82 and Brier scores of 0.16 and 0.17 in the derivation and test cohorts, respectively. In the final model, a range of variables, including blood tests, care, patient background, procedures, and clinical variances, were associated with PLOS. Among these, particular emphasis was placed on clinical variances. Counterfactual analysis using the ridge regression model identified 6 key variables strongly linked to PLOS. In order of impact, these were abnormal respiratory sounds, postoperative fever, arrhythmia, impaired ambulation, complications after drain removal, and pulmonary air leaks.
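Continuing the sketch above, a minimal version of the counterfactual analysis the abstract describes is to set one candidate variance to its reference level for every patient and measure the mean change in predicted PLOS probability; ranking features by that change approximates the impact ordering reported. The column indices below are hypothetical.

```python
import numpy as np

def counterfactual_impact(model, X, col):
    """Mean change in predicted PLOS risk when feature `col` is set to 0."""
    X_cf = X.copy()
    X_cf[:, col] = 0.0
    return float(np.mean(model.predict_proba(X)[:, 1]
                         - model.predict_proba(X_cf)[:, 1]))

# Rank the first six (hypothetical) variance columns by absolute impact.
impacts = {c: counterfactual_impact(model, X_test, c) for c in range(6)}
for col, delta in sorted(impacts.items(), key=lambda kv: -abs(kv[1])):
    print(f"feature {col}: mean risk change {delta:+.4f}")
```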
Conclusions: A machine learning-based model using ePath data effectively identified critical variances in the clinical pathways associated with PLOS. This automated tool may enhance clinical decision-making and improve patient management.
{"title":"Identifying Key Variances in Clinical Pathways Associated With Prolonged Hospital Stays Using Machine Learning and ePath Real-World Data: Model Development and Validation Study.","authors":"Saori Tou, Koutarou Matsumoto, Asato Hashinokuchi, Fumihiko Kinoshita, Yasunobu Nohara, Takanori Yamashita, Yoshifumi Wakata, Tomoyoshi Takenaka, Hidehisa Soejima, Tomoharu Yoshizumi, Naoki Nakashima, Masahiro Kamouchi","doi":"10.2196/71617","DOIUrl":"10.2196/71617","url":null,"abstract":"<p><strong>Background: </strong>Prolonged hospital stays can lead to inefficiencies in health care delivery and unnecessary consumption of medical resources.</p><p><strong>Objective: </strong>This study aimed to identify key clinical variances associated with prolonged length of stay (PLOS) in clinical pathways using a machine learning model trained on real-world data from the ePath system.</p><p><strong>Methods: </strong>We analyzed data from 480 patients with lung cancer (age: mean 68.3, SD 11.2 years; n=263, 54.8% men) who underwent video-assisted thoracoscopic surgery at a university hospital between 2019 and 2023. PLOS was defined as a hospital stay exceeding 9 days after video-assisted thoracoscopic surgery. The variables collected between admission and 4 days after surgery were examined, and those that showed a significant association with PLOS in univariate analyses (P<.01) were selected as predictors. Predictive models were developed using sparse linear regression methods (Lasso, ridge, and elastic net) and decision tree ensembles (random forest and extreme gradient boosting). The data were divided into derivation (earlier study period) and testing (later period) cohorts for temporal validation. The model performance was assessed using the area under the receiver operating characteristic curve, Brier score, and calibration plots. Counterfactual analysis was used to identify key clinical factors influencing PLOS.</p><p><strong>Results: </strong>A 3D heatmap illustrated the temporal relationships between clinical factors and PLOS based on patient demographics, comorbidities, functional status, surgical details, care processes, medications, and variances recorded from admission to 4 days after surgery. Among the 5 algorithms evaluated, the ridge regression model demonstrated the best performance in terms of both discrimination and calibration. Specifically, it achieved area under the receiver operating characteristic curve values of 0.84 and 0.82 and Brier scores of 0.16 and 0.17 in the derivation and test cohorts, respectively. In the final model, a range of variables, including blood tests, care, patient background, procedures, and clinical variances, were associated with PLOS. Among these, particular emphasis was placed on clinical variances. Counterfactual analysis using the ridge regression model identified 6 key variables strongly linked to PLOS. In order of impact, these were abnormal respiratory sounds, postoperative fever, arrhythmia, impaired ambulation, complications after drain removal, and pulmonary air leaks.</p><p><strong>Conclusions: </strong>A machine learning-based model using ePath data effectively identified critical variances in the clinical pathways associated with PLOS. 
This automated tool may enhance clinical decision-making and improve patient management.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e71617"},"PeriodicalIF":3.8,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12706448/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145656123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Peter May, Julian Greß, Christoph Seidel, Sebastian Sommer, Markus K Schuler, Sina Nokodian, Florian Schröder, Johannes Jung
Background: Traditional cancer registries, limited by labor-intensive manual data abstraction and rigid, predefined schemas, often hinder timely and comprehensive oncology research. While large language models (LLMs) have shown promise in automating data extraction, their potential to perform direct, just-in-time (JIT) analysis on unstructured clinical narratives (potentially bypassing intermediate structured databases for many analytical tasks) remains largely unexplored.
Objective: This study aimed to evaluate whether a state-of-the-art LLM (Gemini 2.5 Pro) can enable a JIT clinical oncology analysis paradigm by assessing its ability to (1) perform high-fidelity multiparameter data extraction, (2) answer complex clinical queries directly from raw text, (3) automate multistep survival analyses, including executable code generation, and (4) generate novel, clinically plausible hypotheses from free-text documentation.
Methods: A synthetic dataset of 240 unstructured clinical letters from patients with stage IV non-small cell lung cancer (NSCLC), embedding 14 predefined variables, was used. Gemini 2.5 Pro was evaluated on four core JIT capabilities. Performance was measured using the following metrics: extraction accuracy (compared with human extraction for n=40 letters and across the full n=240 dataset); numerical deviation for direct question answering (n=40 to 240 letters, 5 questions); log-rank P value and Harrell concordance index for LLM-generated versus ground-truth Kaplan-Meier survival analyses (n=160 letters, overall survival and progression-free survival); and correct justification, novelty, and a qualitative evaluation of LLM-generated hypotheses (n=80 and n=160 letters).
Results: For multiparameter extraction from 40 letters, the LLM achieved >99% average accuracy, comparable to human extraction, but in significantly less time (LLM: 3.7 min vs human: 133.8 min). Across the full 240-letter dataset, LLM multiparameter extraction maintained >98% accuracy for most variables. The LLM answered multiconditional clinical queries directly from raw text with a relative deviation rarely exceeding 1.5%, even with up to 240 letters. Crucially, it autonomously performed end-to-end survival analysis, generating text-to-R code that produced Kaplan-Meier curves statistically indistinguishable from ground truth. Consistent performance was demonstrated on a small validation cohort of 80 synthetic acute myeloid leukemia reports. Stress testing on data with simulated imperfections revealed a key role for a human in the loop to resolve AI-flagged ambiguities. Furthermore, the LLM generated several correctly justified, biologically plausible, and potentially novel hypotheses from datasets of up to 80 letters.
Conclusions: This feasibility study demonstrated that a frontier LLM (Gemini 2.5 Pro) can successfully perform high-fidelity data extraction, multiconditional querying, and automated survival analysis directly from unstructured text. These results provide a foundational proof of concept for the JIT clinical analysis approach. However, the findings are limited to synthetic patient data, and rigorous validation on real-world clinical data is an essential next step before clinical application can be considered.
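For readers unfamiliar with the evaluation metrics used for the survival analysis capability (log-rank P value and Harrell concordance index), the sketch below shows how an LLM-extracted survival dataset could be compared against ground truth using the lifelines library. The data, noise model, and cohort size are hypothetical and illustrate only the mechanics, not the study's actual pipeline.

```python
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test
from lifelines.utils import concordance_index

rng = np.random.default_rng(0)

# Hypothetical ground-truth follow-up times (months) and event indicators.
t_true = rng.exponential(12, 160)
e_true = rng.integers(0, 2, 160)

# Simulated "LLM-extracted" copy with small transcription noise.
t_llm = np.clip(t_true + rng.normal(0, 0.1, 160), 0.01, None)

# Kaplan-Meier fits for both versions of the data.
km_true = KaplanMeierFitter().fit(t_true, e_true, label="ground truth")
km_llm = KaplanMeierFitter().fit(t_llm, e_true, label="LLM-extracted")
print(km_true.median_survival_time_, km_llm.median_survival_time_)

# Log-rank test: a large P value means the two curves are indistinguishable.
res = logrank_test(t_true, t_llm, event_observed_A=e_true, event_observed_B=e_true)
print(f"log-rank P = {res.p_value:.3f}")

# Harrell C-index: agreement in the ordering of event times.
print(f"C-index = {concordance_index(t_true, t_llm, e_true):.3f}")
```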
{"title":"Enabling Just-in-Time Clinical Oncology Analysis With Large Language Models: Feasibility and Validation Study Using Unstructured Synthetic Data.","authors":"Peter May, Julian Greß, Christoph Seidel, Sebastian Sommer, Markus K Schuler, Sina Nokodian, Florian Schröder, Johannes Jung","doi":"10.2196/78332","DOIUrl":"10.2196/78332","url":null,"abstract":"<p><strong>Background: </strong>Traditional cancer registries, limited by labor-intensive manual data abstraction and rigid, predefined schemas, often hinder timely and comprehensive oncology research. While large language models (LLMs) have shown promise in automating data extraction, their potential to perform direct, just-in-time (JIT) analysis on unstructured clinical narratives-potentially bypassing intermediate structured databases for many analytical tasks-remains largely unexplored.</p><p><strong>Objective: </strong>This study aimed to evaluate whether a state-of-the-art LLM (Gemini 2.5 Pro) can enable a JIT clinical oncology analysis paradigm by assessing its ability to (1) perform high-fidelity multiparameter data extraction, (2) answer complex clinical queries directly from raw text, (3) automate multistep survival analyses including executable code generation, and (4) generate novel, clinically plausible hypotheses from free-text documentation.</p><p><strong>Methods: </strong>A synthetic dataset of 240 unstructured clinical letters from patients with stage IV non-small cell lung cancer (NSCLC), embedding 14 predefined variables, was used. Gemini 2.5 Pro was evaluated on four core JIT capabilities. Performance was measured by using the following metrics: extraction accuracy (compared to human extraction of n=40 letters and across the full n=240 dataset); numerical deviation for direct question answering (n=40 to 240 letters, 5 questions); log-rank P value and Harrell concordance index for LLM-generated versus ground-truth Kaplan-Meier survival analyses (n=160 letters, overall survival and progression-free survival); and correct justification, novelty, and a qualitative evaluation of LLM-generated hypotheses (n=80 and n=160 letters).</p><p><strong>Results: </strong>For multiparameter extraction from 40 letters, the LLM achieved >99% average accuracy, comparable to human extraction, but in significantly less time (LLM: 3.7 min vs human: 133.8 min). Across the full 240-letter dataset, LLM multiparameter extraction maintained >98% accuracy for most variables. The LLM answered multiconditional clinical queries directly from raw text with a relative deviation rarely exceeding 1.5%, even with up to 240 letters. Crucially, it autonomously performed end-to-end survival analysis, generating text-to-R-code that produced Kaplan-Meier curves statistically indistinguishable from ground truth. Consistent performance was demonstrated on a small validation cohort of 80 synthetic acute myeloid leukemia reports. Stress testing on data with simulated imperfections revealed a key role of a human-in-the-loop to resolve AI-flagged ambiguities. 
Furthermore, the LLM generated several correctly justified, biologically plausible, and potentially novel hypotheses from datasets up to 80 letters.</p><p><strong>Conclusions: </strong>This feasibility study demonstrated that a frontier LLM (Gemini 2.5 Pro) can successfully perform high-fidelity data extraction, multic","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e78332"},"PeriodicalIF":3.8,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12670046/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145656660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Elizabeth A Campbell, Felix Holl, Oliver J Bear Don't Walk Iv, Badisa Mosesane, Andrew S Kanter, Hamish Fraser, Amanda L Joseph, Judy Wawira Gichoya, Kabelo Leonard Mauco, Sansanee Craig
Academic global health informatics (GHI) projects are impactful collaborations between institutions in high-income and low- and middle-income countries (LMICs) and play a crucial role in enhancing health care services and access in LMICs using eHealth practices. Researchers across all involved organizations bring unique expertise to these collaborations. However, these projects often face significant obstacles, including cultural and linguistic barriers, resource limitations, and sustainability issues. The lack of representation of LMIC researchers in knowledge generation and the high costs of open-access publication further complicate efforts to ensure inclusive, accessible, and collaborative scholarship. This viewpoint describes current gaps in the literature on academic GHI collaborations and outlines a path forward for future research directions and successful research community development. Key recommendations include centering community-based participatory research, developing post-growth solutions, and creating sustainable funding models. Addressing these challenges is essential for fostering effective, scalable, and equitable GHI interventions that improve global health outcomes.
{"title":"Gaps and Pathways to Success in Global Health Informatics Academic Collaborations: Reflecting on Current Practices.","authors":"Elizabeth A Campbell, Felix Holl, Oliver J Bear Don't Walk Iv, Badisa Mosesane, Andrew S Kanter, Hamish Fraser, Amanda L Joseph, Judy Wawira Gichoya, Kabelo Leonard Mauco, Sansanee Craig","doi":"10.2196/67326","DOIUrl":"10.2196/67326","url":null,"abstract":"<p><strong>Unlabelled: </strong>Academic global health informatics (GHI) projects are impactful collaborations between institutions in high-income and low- and middle-income countries (LMICs) and play a crucial role in enhancing health care services and access in LMICs using eHealth practices. Researchers across all involved organizations bring unique expertise to these collaborations. However, these projects often face significant obstacles, including cultural and linguistic barriers, resource limitations, and sustainability issues. The lack of representation from LMIC researchers in knowledge generation and the high costs of open-access publications further complicate efforts to ensure inclusive, accessible, and collaborative scholarship. This viewpoint describes present gaps in the literature on academic GHI collaborations and describes a path forward for future research directions and successful research community development. Key recommendations include centering community-based participatory research, developing post-growth solutions, and creating sustainable funding models. Addressing these challenges is essential for fostering effective, scalable, and equitable GHI interventions that improve global health outcomes.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e67326"},"PeriodicalIF":3.8,"publicationDate":"2025-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12669914/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145656132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuan Sun, Bo Li, Chuanlan Ju, Liming Hu, Huiyi Sun, Jing An, Tae-Hun Kim, Zhijun Bu, Zeyang Shi, Jianping Liu, Zhaolan Liu
Background: Predicting colorectal cancer (CRC) recurrence risk remains a challenge in clinical practice. Owing to the widespread use of radiomics in CRC diagnosis and treatment, some researchers have recently explored the effectiveness of radiomics-based models in forecasting CRC recurrence risk. Nonetheless, the lack of systematic evidence on the efficacy of such models has hampered their clinical adoption.
Objective: This study aimed to explore the value of radiomics in predicting CRC recurrence, providing a scholarly rationale for developing more specific interventions.
Methods: Overall, 4 databases (Embase, PubMed, the Cochrane Library, and Web of Science) were searched for relevant articles from inception to January 1, 2025. We included studies that developed or validated radiomics-based machine learning models for predicting CRC recurrence using computed tomography or magnetic resonance imaging and that provided discriminative performance metrics (c-index). Nonoriginal articles, studies that did not develop a model, and those lacking clear outcome measures were excluded. The quality of the included original studies was assessed using the Radiomics Quality Score. A bivariate mixed-effects model was used to conduct a meta-analysis in which the c-index values with 95% CIs were pooled. Subgroup analyses were conducted separately on the validation and training sets.
Results: This meta-analysis included 17 original studies involving 4600 patients with CRC. The quality of the identified studies was low (mean Radiomics Quality Score 13.23/36, SD 2.56), with limitations in prospective design and biological validation. In the validation set, the c-index values based on clinical features, radiomics features, and radiomics features combined with clinical features were 0.73 (95% CI 0.68-0.79), 0.80 (95% CI 0.75-0.85), and 0.83 (95% CI 0.79-0.87), respectively. In the internal validation set, the corresponding c-index values were 0.70 (95% CI 0.61-0.79), 0.83 (95% CI 0.78-0.88), and 0.83 (95% CI 0.78-0.88). Finally, in the external validation set, they were 0.76 (95% CI 0.70-0.83), 0.75 (95% CI 0.66-0.83), and 0.83 (95% CI 0.78-0.88).
Conclusions: Radiomics-based machine learning models, especially those integrating radiomics and clinical features, showed promising predictive performance for CRC recurrence risk. However, this study has several limitations, such as moderate study quality, limited sample size, and high heterogeneity in modeling approaches. These findings suggest the potential clinical value of integrated models in risk stratification and their potential to enhance personalized treatment, although further high-quality prospective studies are required.
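As a worked illustration of the pooling step, the sketch below applies a simplified univariate DerSimonian-Laird random-effects pooling to hypothetical per-study c-index estimates. This is a deliberate stand-in for the bivariate mixed-effects model the review actually used; the estimates and standard errors are invented.

```python
import numpy as np

# Hypothetical per-study c-index estimates and standard errors.
c = np.array([0.80, 0.83, 0.76, 0.85, 0.79])
se = np.array([0.04, 0.05, 0.06, 0.03, 0.05])

# DerSimonian-Laird random-effects pooling (simplified; not the paper's
# bivariate mixed-effects model).
w = 1 / se**2
c_fixed = np.sum(w * c) / np.sum(w)
q = np.sum(w * (c - c_fixed) ** 2)                 # Cochran's Q
df = len(c) - 1
tau2 = max(0.0, (q - df) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

w_re = 1 / (se**2 + tau2)                          # random-effects weights
c_pooled = np.sum(w_re * c) / np.sum(w_re)
se_pooled = np.sqrt(1 / np.sum(w_re))
print(f"pooled c-index {c_pooled:.2f} "
      f"(95% CI {c_pooled - 1.96*se_pooled:.2f}-{c_pooled + 1.96*se_pooled:.2f})")
```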
{"title":"Predictive Performance of Radiomics-Based Machine Learning for Colorectal Cancer Recurrence Risk: Systematic Review and Meta-Analysis.","authors":"Yuan Sun, Bo Li, Chuanlan Ju, Liming Hu, Huiyi Sun, Jing An, Tae-Hun Kim, Zhijun Bu, Zeyang Shi, Jianping Liu, Zhaolan Liu","doi":"10.2196/78644","DOIUrl":"10.2196/78644","url":null,"abstract":"<p><strong>Background: </strong>Predicting colorectal cancer (CRC) recurrence risk remains a challenge in clinical practice. Owing to the widespread use of radiomics in CRC diagnosis and treatment, some researchers recently explored the effectiveness of radiomics-based models in forecasting CRC recurrence risk. Nonetheless, the lack of systematic evidence of the efficacy of such models has hampered their clinical adoption.</p><p><strong>Objective: </strong>This study aimed to explore the value of radiomics in predicting CRC recurrence, providing a scholarly rationale for developing more specific interventions.</p><p><strong>Methods: </strong>Overall, 4 databases (Embase, PubMed, the Cochrane Library, and Web of Science) were searched for relevant articles from inception to January 1, 2025. We included studies that developed or validated radiomics-based machine learning models for predicting CRC recurrence using computed tomography or magnetic resonance imaging and provided discriminative performance metrics (c-index). Nonoriginal articles, studies that did not develop a model, and those lacking clear outcome measures were excluded from the study. The quality of the included original studies was assessed using the Radiomics Quality Score. A bivariate mixed-effects model was used to conduct a meta-analysis in which the c-index values with 95% CI were pooled. For the meta-analysis, subgroup analyses were conducted separately on the validation and training sets.</p><p><strong>Results: </strong>This meta-analysis included 17 original studies involving 4600 patients with CRC. The quality of the identified studies was low (mean Radiomics Quality Score 13.23/36, SD 2.56), with limitations in prospective design and biological validation. In the validation set, the c-index values based on clinical features, radiomics features, and radiomics features combined with clinical features were 0.73 (95% CI 0.68-0.79), 0.80 (95% CI 0.75-0.85), and 0.83 (95% CI 0.79-0.87), respectively. In the internal validation set, the c-index values based on clinical features, radiomics features, and radiomics features+clinical features were 0.70 (95% CI 0.61-0.79), 0.83 (95% CI 0.78-0.88), and 0.83 (95% CI 0.78-0.88), respectively. Finally, in the external validation set, the c-index values based on clinical features, radiomics features, and radiomics features combined with clinical features were 0.76 (95% CI 0.70-0.83), 0.75 (95% CI 0.66-0.83), and 0.83 (95% CI 0.78-0.88), respectively.</p><p><strong>Conclusions: </strong>Radiomics-based machine learning models, especially those integrating radiomics and clinical features, showed promising predictive performance for CRC recurrence risk. However, this study has several limitations, such as moderate study quality, limited sample size, and high heterogeneity in modeling approaches. 
These findings suggest the potential clinical value of integrated models in risk stratification and their potential to enhance personalized treatment,","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e78644"},"PeriodicalIF":3.8,"publicationDate":"2025-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12669921/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145656075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gary Leiserowitz, Jeff Mansfield, Scott MacDonald, Melissa Jost
Background: Many institutions are in various stages of deploying an artificial intelligence (AI) scribe system for clinic electronic health record (EHR) documentation. In anticipation of University of California, Davis Health's deployment of an AI scribe program, we surveyed current patients about their perceptions of this technology to inform a patient-centered implementation.
Objective: We assessed patient perceptions of current clinician EHR documentation practices before implementation of the AI scribe program, as well as preconceptions regarding the AI scribe's introduction.
Methods: We conducted a descriptive preimplementation survey as a quality improvement study. A convenience sample of 9171 patients (aged ≥18 years) who had a clinic visit within the previous year was recruited via an email postvisit survey. Patient-identified demographics (age, gender, and race and ethnicity) were collected. The survey included rating scales on questions related to patient perceptions of the AI scribe program, plus open-ended comments. Data were collated to analyze patient perceptions of including AI scribe technology in a clinician visit.
Results: In total, 1893 patients completed the survey (20% response rate), with partial responses from another 549. Sixty-three percent (n=1205) of the respondents were female, and most were 51 years or older (87%, n=1649). Most patients identified as White (69%, n=1312), multirace (8%, n=154), Latinx (7%, n=130), or Black (2%, n=42). The respondents were not representative of the overall clinic population, skewing more female, aged 50 years and older, and White in comparison. Regarding the current EHR documentation system, 71% (n=1349) of respondents felt heard or sometimes heard, but 23% (n=416) expressed frustration that their physician focused too much on typing into the computer. When asked about their anticipated response to the use of an AI scribe, 48% (n=904) were favorable, 33% (n=630) were neutral, and 19% (n=359) were unfavorable. Younger patients (ages 18-30 years) expressed more skepticism than those aged 51 years and older. Further, 42% (655/1567) of positive comments indicated this technology could improve human interaction during visits. Comments supported that the use of an AI scribe would enhance the patient experience by allowing the clinician to focus on the patient. However, when asked about concerns regarding the AI scribe, 39% (515/1330) and 15% (203/1330) of comments expressed concerns about documentation accuracy and privacy, respectively. Providing previsit patient education and obtaining permission were viewed as very important.
Conclusions: This patient survey showed that respondents are generally open to the use of an AI scribe program for EHR documentation to allow the clinician to focus on the patient during the actual encounter rather than the computer. Providing patient education and obtaining consent before AI use are important components of earning patient trust. Given the low response rate and nonrepresentative sample, caution in interpreting the results is appropriate.
{"title":"Patient Attitudes Toward Ambient Voice Technology: Preimplementation Patient Survey in an Academic Medical Center.","authors":"Gary Leiserowitz, Jeff Mansfield, Scott MacDonald, Melissa Jost","doi":"10.2196/77901","DOIUrl":"10.2196/77901","url":null,"abstract":"<p><strong>Background: </strong>Many institutions are in various stages of deploying an artificial intelligence (AI) scribe system for clinic electronic health record (EHR) documentation. In anticipation of the University of California, Davis Health's deployment of an AI scribe program, we surveyed current patients about their perceptions of this technology to inform a patient-centered implementation.</p><p><strong>Objective: </strong>We assessed patient perceptions about current clinician EHR documentation practices before implementation of the AI scribe program, and preconceptions regarding the AI scribe's introduction.</p><p><strong>Methods: </strong>We conducted a descriptive preimplementation survey as a quality improvement study. A convenience sample of 9171 patients (aged ≥18 years) who had a clinic visit within the previous year, was recruited via an email postvisit survey. Patient-identified demographics (age, gender, and race and ethnicity) were collected. The survey included rating scales on questions related to the patient perception of the AI scribe program, plus open-ended comments. Data were collated to analyze patient perceptions of including AI Scribe technology in a clinician visit.</p><p><strong>Results: </strong>In total, 1893 patients completed the survey (20% response rate), with partial responses from another 549. Sixty-three percent (n=1205) of the respondents were female, and most were 51 years and older (87%, n=1649). Most patients identified themselves as White (69%, n=1312), multirace (8%, n=154), Latinx (7%, n=130), and Black (2%, n=42). The respondents were not representative of the overall clinic populations and skewed more toward being female, ages 50 years and older, and White in comparison. Patients reacted to the current EHR documentation system, with 71% (n=1349) feeling heard or sometimes heard, but 23% (n=416) expressed frustrations that their physician focused too much on typing into the computer. When asked about their anticipated response to the use of an AI scribe, 48% (n=904) were favorable, 33% (n=630) were neutral, and 19% (n=359) were unfavorable. Younger patients (ages 18-30 years) expressed more skepticism than those aged 51 years and older. Further, 42% (655/1567) of positive comments received indicated this technology could improve human interaction during their visits. Comments supported that the use of an AI scribe would enhance patient experience by allowing the clinician to focus on the patient. However, when asked about concerns regarding the AI scribe, 39% (515/1330) and 15% (203/1330) of comments expressed concerns about documentation accuracy and privacy, respectively. 
Providing previsit patient education and obtaining permission were viewed as very important.</p><p><strong>Conclusions: </strong>This patient survey showed that respondents are generally open to the use of an AI scribe program for EHR documentation to allow the clinician to focus on the patient during the actual encounter ra","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e77901"},"PeriodicalIF":3.8,"publicationDate":"2025-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12699246/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145643431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hong-Jae Choi, Changhee Lee, Hack-Lyoung Kim, Youn-Jung Son
Background: Patients with acute coronary syndrome (ACS) who undergo percutaneous coronary intervention (PCI) remain at high risk for major adverse cardiovascular events (MACE). Conventional risk scores may not capture dynamic or nonlinear changes in postdischarge MACE risk, whereas machine learning (ML) approaches can improve predictive performance. However, few ML models have incorporated time-to-event analysis to reflect changes in MACE risk over time.
Objective: This study aimed to develop a time-to-event ML model for predicting MACE after PCI in patients with ACS and to identify the risk factors with time-varying contributions.
Methods: We analyzed electronic health records of 3159 patients with ACS who underwent PCI at a tertiary hospital in South Korea between 2008 and 2020. Six time-to-event ML models were developed using 54 variables. Model performance was evaluated using the time-dependent concordance index and Brier score. Variable importance was assessed using permutation importance and visualized with partial dependence plots to identify variables contributing to MACE risk over time.
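A minimal sketch of the time-to-event setup described above, using a Cox proportional hazards model from lifelines as a stand-in for the six ML models in the paper, on synthetic data loosely echoing the predictors named in the results (age, contrast volume, glomerular filtration rate). Harrell's C is computed here; the time-dependent concordance and Brier scores reported in the paper require specialized estimators (eg, scikit-survival), which are omitted.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

rng = np.random.default_rng(1)
n = 500

# Synthetic stand-ins for three of the predictors highlighted in the results.
df = pd.DataFrame({
    "age": rng.normal(65, 10, n),
    "contrast_ml": rng.normal(250, 60, n),
    "gfr": rng.normal(70, 15, n),
})

# Event times drawn from a hazard that rises with age and contrast volume.
hazard = 0.002 * np.exp(0.03 * (df["age"] - 65) + 0.005 * (df["contrast_ml"] - 250))
time = rng.exponential(1.0 / hazard)
df["event"] = (time <= 365).astype(int)        # MACE within 1 year
df["time"] = np.minimum(time, 365.0)           # administrative censoring at 1 year

cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")

# Harrell's C: patients with higher predicted hazard should fail earlier.
risk = cph.predict_partial_hazard(df)
print("C-index:", concordance_index(df["time"], -risk, df["event"]))
```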
Results: During a median follow-up of 3.8 years, 626 (19.8%) patients experienced MACE. The best-performing model achieved a time-dependent concordance index of 0.743 at day 30 and 0.616 at 1 year. Time-dependent Brier scores increased and remained stable across all ML models. Key predictors included contrast volume, age, medication adherence, coronary artery disease severity, and glomerular filtration rate. Contrast volume ≥300 mL, age ≥60 years, and medication adherence score ≥30 were associated with early postdischarge risk, whereas coronary artery disease severity and glomerular filtration rate became more influential beyond 60 days.
Conclusions: The proposed time-to-event ML model effectively captured dynamic risk patterns after PCI and identified key predictors with time-varying effects. These findings may support individualized postdischarge management and early intervention strategies to prevent MACE in high-risk patients.
{"title":"Risk Prediction of Major Adverse Cardiovascular Events Within One Year After Percutaneous Coronary Intervention in Patients With Acute Coronary Syndrome: Machine Learning-Based Time-to-Event Analysis.","authors":"Hong-Jae Choi, Changhee Lee, Hack-Lyoung Kim, Youn-Jung Son","doi":"10.2196/81778","DOIUrl":"10.2196/81778","url":null,"abstract":"<p><strong>Background: </strong>Patients with acute coronary syndrome (ACS) who undergo percutaneous coronary intervention (PCI) remain at high risk for major adverse cardiovascular events (MACE). Conventional risk scores may not capture dynamic or nonlinear changes in postdischarge MACE risk, whereas machine learning (ML) approaches can improve predictive performance. However, few ML models have incorporated time-to-event analysis to reflect changes in MACE risk over time.</p><p><strong>Objective: </strong>This study aimed to develop a time-to-event ML model for predicting MACE after PCI in patients with ACS and to identify the risk factors with time-varying contributions.</p><p><strong>Methods: </strong>We analyzed electronic health records of 3159 patients with ACS who underwent PCI at a tertiary hospital in South Korea between 2008 and 2020. Six time-to-event ML models were developed using 54 variables. Model performance was evaluated using the time-dependent concordance index and Brier score. Variable importance was assessed using permutation importance and visualized with partial dependence plots to identify variables contributing to MACE risk over time.</p><p><strong>Results: </strong>During a median follow-up of 3.8 years, 626 (19.8%) patients experienced MACE. The best-performing model achieved a time-dependent concordance index of 0.743 at day 30 and 0.616 at 1 year. Time-dependent Brier scores increased and remained stable across all ML models. Key predictors included contrast volume, age, medication adherence, coronary artery disease severity, and glomerular filtration rate. Contrast volume ≥300 mL, age ≥60 years, and medication adherence score ≥30 were associated with early postdischarge risk, whereas coronary artery disease severity and glomerular filtration rate became more influential beyond 60 days.</p><p><strong>Conclusions: </strong>The proposed time-to-event ML model effectively captured dynamic risk patterns after PCI and identified key predictors with time-varying effects. These findings may support individualized postdischarge management and early intervention strategies to prevent MACE in high-risk patients.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e81778"},"PeriodicalIF":3.8,"publicationDate":"2025-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12699253/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145643476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zigui Wang, Jillian H Hurst, Chuan Hong, Benjamin Alan Goldstein
Background: Developing computable phenotypes (CPs) based on electronic health record (EHR) data requires "gold-standard" labels for the outcome of interest. To generate these labels, clinicians typically chart-review a subset of patient charts. Charts to be reviewed are most often randomly sampled from the larger set of patients of interest. However, random sampling may fail to capture the diversity of the patient population, particularly if smaller subpopulations exist among those with the condition of interest. This can lead to poorly performing and biased CPs.
Objective: This study aimed to propose an unsupervised sampling approach designed to better capture a diverse patient cohort and improve the information coverage of chart review samples.
Methods: Our coverage sampling method starts by clustering the patient population of interest. We then perform stratified sampling from each cluster to ensure even representation in the chart review sample. We introduce a novel metric, the nearest neighbor distance, to evaluate the coverage of the generated sample. To evaluate our method, we first conducted a simulation study to model and compare the performance of random sampling versus our proposed coverage sampling, varying the size and number of subpopulations within the larger cohort. Finally, we applied our approach to a real-world dataset to develop a CP for hospitalization due to COVID-19. We evaluated the different sampling strategies based on information coverage as well as the performance of the learned CP, using the area under the receiver operating characteristic curve.
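The three steps of the method (clustering, stratified sampling, and the nearest neighbor distance coverage metric) map directly onto standard tooling. Below is a minimal sketch with scikit-learn on a hypothetical feature matrix; the cluster count and chart review budget are arbitrary choices, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 10))          # hypothetical patient feature matrix
budget = 100                             # charts we can afford to review

# 1) Cluster the cohort to expose latent subpopulations.
k = 10
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

# 2) Stratified sample: draw an even share of charts from every cluster.
per_cluster = budget // k
sample_idx = np.concatenate([
    rng.choice(np.flatnonzero(labels == c), size=per_cluster, replace=False)
    for c in range(k)
])

# 3) Coverage metric: mean distance from each cohort member to its nearest
#    sampled chart (lower means better coverage).
nn = NearestNeighbors(n_neighbors=1).fit(X[sample_idx])
dist, _ = nn.kneighbors(X)
print("mean nearest-neighbor distance:", dist.mean())
```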
Results: Our simulation studies show that the unsupervised coverage sampling approach provides broader coverage of patient populations compared to random sampling. When there are no underlying subpopulations, random and coverage sampling perform equally well for CP development. When there are subgroups, coverage sampling achieves area under the receiver operating characteristic curve gains of approximately 0.03-0.05 over random sampling. In the real-world application, the approach also outperformed random sampling, generating both a more representative sample and an area under the receiver operating characteristic curve improvement of 0.02 (95% CI -0.08 to 0.04).
Conclusions: The proposed coverage sampling method is an easy-to-implement approach that produces a chart review sample that is more representative of the source population. This allows one to learn a CP that has better performance both for subpopulations and the overall cohort. Studies that aim to develop CPs should consider alternative strategies other than randomly sampling patient charts.
{"title":"Unsupervised Coverage Sampling to Enhance Clinical Chart Review Coverage for Computable Phenotype Development: Simulation and Empirical Study.","authors":"Zigui Wang, Jillian H Hurst, Chuan Hong, Benjamin Alan Goldstein","doi":"10.2196/72068","DOIUrl":"10.2196/72068","url":null,"abstract":"<p><strong>Background: </strong>Developing computable phenotypes (CP) based on electronic health records (EHR) data requires \"gold-standard\" labels for the outcome of interest. To generate these labels, clinicians typically chart-review a subset of patient charts. Charts to be reviewed are most often randomly sampled from the larger set of patients of interest. However, random sampling may fail to capture the diversity of the patient population, particularly if smaller subpopulations exist among those with the condition of interest. This can lead to poorly performing and biased CPs.</p><p><strong>Objective: </strong>This study aimed to propose an unsupervised sampling approach designed to better capture a diverse patient cohort and improve the information coverage of chart review samples.</p><p><strong>Methods: </strong>Our coverage sampling method starts by clustering by the patient population of interest. We then perform a stratified sampling from each cluster to ensure even representation for the chart review sample. We introduce a novel metric, nearest neighbor distance, to evaluate the coverage of the generated sample. To evaluate our method, we first conducted a simulation study to model and compare the performance of random versus our proposed coverage sampling. We varied the size and number of subpopulations within the larger cohort. Finally, we apply our approach to a real-world data set to develop a CP for hospitalization due to COVID-19. We evaluate the different sampling strategies based on the information coverage as well as the performance of the learned CP, using the area under the receiver operator characteristic curve.</p><p><strong>Results: </strong>Our simulation studies show that the unsupervised coverage sampling approach provides broader coverage of patient populations compared to random sampling. When there are no underlying subpopulations, both random and coverage perform equally well for CP development. When there are subgroups, coverage sampling achieves area under the receiver operating characteristic curve gains of approximately 0.03-0.05 over random sampling. In the real-world application, the approach also outperformed random sampling, generating both a more representative sample and an area under the receiver operating characteristic curve improvement of 0.02 (95% CI -0.08 to 0.04).</p><p><strong>Conclusions: </strong>The proposed coverage sampling method is an easy-to-implement approach that produces a chart review sample that is more representative of the source population. This allows one to learn a CP that has better performance both for subpopulations and the overall cohort. 
Studies that aim to develop CPs should consider alternative strategies other than randomly sampling patient charts.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e72068"},"PeriodicalIF":3.8,"publicationDate":"2025-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12661603/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145643472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: The Computerized Digit Vigilance Test (CDVT) is a well-established measure of sustained attention. However, the CDVT only measures the total reaction time and response accuracy and fails to capture other crucial attentional features such as the eye blink rate, yawns, head movements, and eye movements. Omitting such features might provide an incomplete representative picture of sustained attention.
Objective: This study aimed to develop an artificial intelligence (AI)-based Computerized Digit Vigilance Test (AI-CDVT) for older adults.
Methods: Participants were assessed by the CDVT with video recordings capturing their head and face. The Montreal Cognitive Assessment (MoCA), Stroop Color Word Test (SCW), and Color Trails Test (CTT) were also administered. The AI-CDVT was developed in three steps: (1) retrieving attentional features using OpenFace AI software (CMU MultiComp Lab), (2) establishing an AI-based scoring model with the Extreme Gradient Boosting regressor, and (3) assessing the AI-CDVT's validity by Pearson r values and test-retest reliability by intraclass correlation coefficients (ICCs).
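A minimal sketch of step 2, the AI-based scoring model: an Extreme Gradient Boosting regressor fit on hypothetical OpenFace-derived features, with Pearson r computed on a held-out split as in the validity analysis. All features, splits, and scores are synthetic stand-ins, not the study's data.

```python
import numpy as np
from xgboost import XGBRegressor
from scipy.stats import pearsonr

rng = np.random.default_rng(3)
n = 153                                        # sample size matching the study

# Hypothetical per-participant attentional features standing in for OpenFace
# outputs (eg, blink rate, yawns, head and eye movement statistics).
X = rng.normal(size=(n, 6))
y = X @ rng.normal(size=6) + rng.normal(0, 0.5, n)  # reference attention score

model = XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.05)
model.fit(X[:100], y[:100])                    # fit on a derivation split
pred = model.predict(X[100:])

# Convergent validity against the held-out criterion scores.
r, p = pearsonr(pred, y[100:])
print(f"Pearson r = {r:.2f} (P = {p:.3f})")
```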
Results: In total, 153 participants were included. Pearson r values of the AI-CDVT were -0.42 with the MoCA, -0.31 with the SCW, and 0.46-0.61 with the CTT. The ICC of the AI-CDVT was 0.78.
Conclusions: We developed an AI-CDVT, which leveraged AI to extract attentional features from video recordings and integrated them to generate a comprehensive attention score. Our findings demonstrated good validity and test-retest reliability for the AI-CDVT, suggesting its potential as a reliable and valid tool for assessing sustained attention in older adults.
{"title":"Artificial Intelligence-Based Computerized Digit Vigilance Test in Community-Dwelling Older Adults: Development and Validation Study.","authors":"Gong-Hong Lin, Dorothy Bai, Yi-Jing Huang, Shih-Chieh Lee, Mai Thi Thuy Vu, Tsu-Hsien Chiu","doi":"10.2196/73038","DOIUrl":"10.2196/73038","url":null,"abstract":"<p><strong>Background: </strong>The Computerized Digit Vigilance Test (CDVT) is a well-established measure of sustained attention. However, the CDVT only measures the total reaction time and response accuracy and fails to capture other crucial attentional features such as the eye blink rate, yawns, head movements, and eye movements. Omitting such features might provide an incomplete representative picture of sustained attention.</p><p><strong>Objective: </strong>This study aimed to develop an artificial intelligence (AI)-based Computerized Digit Vigilance Test (AI-CDVT) for older adults.</p><p><strong>Methods: </strong>Participants were assessed by the CDVT with video recordings capturing their head and face. The Montreal Cognitive Assessment (MoCA), Stroop Color Word Test (SCW), and Color Trails Test (CTT) were also administered. The AI-CDVT was developed in three steps: (1) retrieving attentional features using OpenFace AI software (CMU MultiComp Lab), (2) establishing an AI-based scoring model with the Extreme Gradient Boosting regressor, and (3) assessing the AI-CDVT's validity by Pearson r values and test-retest reliability by intraclass correlation coefficients (ICCs).</p><p><strong>Results: </strong>In total, 153 participants were included. Pearson r values of the AI-CDVT with the MoCA were -0.42, -0.31 with the SCW, and 0.46-0.61 with the CTT. The ICC of the AI-CDVT was 0.78.</p><p><strong>Conclusions: </strong>We developed an AI-CDVT, which leveraged AI to extract attentional features from video recordings and integrated them to generate a comprehensive attention score. Our findings demonstrated good validity and test-retest reliability for the AI-CDVT, suggesting its potential as a reliable and valid tool for assessing sustained attention in older adults.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e73038"},"PeriodicalIF":3.8,"publicationDate":"2025-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12670460/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145656612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prabin Shakya, Ayush Khaneja, Kavishwar B Wagholikar
Background: Heart failure (HF) is a public health concern with a wide-reaching impact on quality of life and cost of care. One of the major challenges in HF is the high rate of unplanned readmissions and the suboptimal performance of models for predicting readmission. Hence, in this study, we implemented embedding-based approaches to generate features for improving model performance.
Objective: The objective of this study was to evaluate and compare the effectiveness of different feature embedding approaches for improving the prediction of unplanned readmissions in patients with heart failure.
Methods: We compared three embedding approaches, namely word2vec trained on terminology codes, word2vec trained on concept unique identifiers (CUIs), and BERT applied to descriptive concept text, against a one-hot encoding baseline. We compared the area under the receiver operating characteristic curve (AUROC) and F1-scores of logistic regression, extreme gradient boosting (XGBoost), and artificial neural network (ANN) models using these embedding approaches. The models were tested on a heart failure cohort (N=21,031) identified using the least restrictive phenotyping method from the MIMIC-IV dataset.
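A minimal sketch of the word2vec pathway, assuming gensim: train embeddings on per-admission code sequences, average the code vectors into a patient-level feature vector, and score readmission with a logistic regression AUROC. The codes and labels are synthetic, so the printed AUROC is uninformative; only the mechanics match the description above.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)

# Hypothetical vocabulary of terminology codes and per-admission sequences.
vocab = [f"ICD_{i}" for i in range(50)] + [f"RX_{i}" for i in range(20)]
admissions = [[str(c) for c in rng.choice(vocab, size=8, replace=False)]
              for _ in range(600)]
labels = rng.integers(0, 2, 600)  # synthetic 30-day readmission flags

# Train word2vec on the code sequences, then average code vectors per patient.
w2v = Word2Vec(admissions, vector_size=32, window=5, min_count=1, epochs=20)
X = np.array([np.mean([w2v.wv[c] for c in adm], axis=0) for adm in admissions])

# Downstream classifier; with synthetic labels the AUROC is uninformative.
clf = LogisticRegression(max_iter=1000).fit(X[:400], labels[:400])
print("AUROC:", roc_auc_score(labels[400:], clf.predict_proba(X[400:])[:, 1]))
```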
Results: We found that the embedding approaches significantly improved the performance of the prediction models. XGBoost performed best across all approaches. Word2vec embeddings trained on the dataset (AUROC 0.65) outperformed embeddings from the pre-trained BERT model applied to descriptive text (AUROC 0.59).
Conclusions: Embedding methods, particularly word2vec trained on electronic health record data, can better discriminate HF readmission cases than both one-hot encoding and pre-trained BERT embeddings of concept descriptions, making them a viable approach for automated feature generation. The observed AUROC improvement (0.65 vs 0.54) may support more effective risk stratification and targeted clinical interventions.
{"title":"Predicting 30-Days Hospital Readmission for Patients with Heart Failure Using Electronic Health Record Embeddings: Comparative Evaluation.","authors":"Prabin Shakya, Ayush Khaneja, Kavishwar B Wagholikar","doi":"10.2196/73020","DOIUrl":"10.2196/73020","url":null,"abstract":"<p><strong>Background: </strong>Heart failure (HF) is a public health concern with a wider impact on quality of life and cost of care. One of the major challenges in HF is the higher rate of unplanned readmissions and suboptimal performance of models to predict the readmissions. Hence, in this study, we implemented embeddings-based approaches to generate features for improving model performance.</p><p><strong>Objective: </strong>The objective of this study was to evaluate and compare the effectiveness of different feature embedding approaches for improving the prediction of unplanned readmissions in patients with heart failure.</p><p><strong>Methods: </strong>We compared three embedding approaches including word2vec on terminology codes and concept unique identifier (CUIs) and BERT on descriptive text of concept with baseline (one hot-encoding). We compared area under the receiver operating characteristic (AUROC) and F1-scores for the logistic regression, eXtream gradient-boosting (XGBoost) and artificial neural network (ANN) models using these embedding approaches. The model was tested on the heart failure cohort (N=21,031) identified using least restrictive phenotyping methods from MIMIC-IV dataset.</p><p><strong>Results: </strong>We found that the embedding approaches significantly improved the performance of the prediction models. The XGBoost performed better for all approaches. The word2vec embeddings (0.65) trained on the dataset outperformed embeddings from pre-trained BERT model (0.59) using descriptive text.</p><p><strong>Conclusions: </strong>Embedding methods, particularly word2vec trained on electronic health record data, can better discriminate HF readmission cases compared to both one-hot encoding and pre-trained BERT embeddings on concept descriptions making it a viable approach of automation feature selection. The observed AUROC improvement (0.65 vs 0.54) may support more effective risk stratification and targeted clinical interventions.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e73020"},"PeriodicalIF":3.8,"publicationDate":"2025-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12646029/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145607477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ke Zhang, Zhichang Zhang, Yali Liang, Wei Wang, Xia Wang
Background: Electronic health records (EHRs) contain comprehensive information regarding diagnoses, clinical procedures, and prescribed medications, which makes them a valuable resource for developing automated hypertension medication recommendation systems. Existing research in this field has used machine learning approaches that leverage demographic characteristics and basic clinical indicators, or deep learning techniques that extract patterns from EHR data, to predict optimal medications or to improve recommendation accuracy for common antihypertensive medication categories. These methodologies have significant limitations, however. They rarely characterize the synergistic relationships among heterogeneous medical entities adequately, such as the interplay between comorbid conditions, laboratory results, and specific antihypertensive agents. Moreover, given the chronic and fluctuating nature of hypertension, effective medication recommendations require dynamic adaptation to disease progression over time, yet current approaches either lack rigorous temporal modeling of EHR data or fail to integrate temporal dynamics with interentity relationships effectively, producing recommendations that are not clinically appropriate because these critical factors are neglected.
Objective: This study aims to overcome the challenges in existing methods and introduce a novel model for hypertension medication recommendation that leverages the synergy and selectivity of heterogeneous medical entities.
Methods: First, we used patient EHR data to construct both heterogeneous and homogeneous graphs. The interentity synergies were captured using a multihead graph attention mechanism to enhance entity-level representations. Next, a bidirectional temporal selection mechanism calculated selective coefficients between current and historical visit records and aggregated them to form refined visit-level representations. Finally, medication recommendation probabilities were determined based on these comprehensive patient representations.
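The sketch below shows, in PyTorch, one way the two mechanisms described in the Methods could be wired together: multihead attention over a visit's entity embeddings, selective coefficients between the current and historical visit representations, and a multi-label output head. All dimensions, the mean-pooling step, and the sigmoid head are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of entity-level attention plus temporal selection;
# shapes and pooling choices are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

d, n_heads, n_meds = 32, 4, 10

# 1) Entity level: multihead attention over one visit's medical entities
#    (diagnoses, procedures, labs) to capture interentity synergies.
entity_attn = nn.MultiheadAttention(embed_dim=d, num_heads=n_heads, batch_first=True)
visit_entities = torch.randn(1, 6, d)          # 6 entity embeddings in one visit
enhanced, _ = entity_attn(visit_entities, visit_entities, visit_entities)
visit_repr = enhanced.mean(dim=1)              # (1, d) visit-level summary

# 2) Visit level: selective coefficients between the current visit and
#    historical visits, aggregated into a refined patient representation.
history = torch.randn(1, 3, d)                 # 3 historical visit representations
scores = torch.bmm(history, visit_repr.unsqueeze(-1)).squeeze(-1)  # (1, 3)
coef = F.softmax(scores, dim=-1)               # selective coefficients
patient_repr = visit_repr + (coef.unsqueeze(-1) * history).sum(dim=1)

# 3) Medication recommendation probabilities (multi-label, one per drug).
med_head = nn.Linear(d, n_meds)
probs = torch.sigmoid(med_head(patient_repr))
print(probs.shape)                             # torch.Size([1, 10])
```

A sigmoid head is the natural choice here because a visit can warrant several antihypertensive agents at once, so each drug gets an independent probability rather than a softmax share.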
Results: Experimental evaluations on the real-world datasets Medical Information Mart for Intensive Care (MIMIC)-III v1.4 and MIMIC-IV v2.2 demonstrated that the proposed model achieved Jaccard similarity coefficients of 58.01% and 55.82%, areas under the precision-recall curve of 83.56% and 80.69%, and F1-scores of 68.95% and 64.83%, respectively, outperforming the baseline models.
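For reference, the three reported multi-label metrics can be computed as in the sketch below; the toy labels, probabilities, and 0.5 decision threshold are hypothetical and serve only to show how the scores are defined over sets of recommended medications.

```python
# Hypothetical example of the reported multi-label metrics.
import numpy as np
from sklearn.metrics import jaccard_score, f1_score, average_precision_score

# Rows = visits, columns = candidate medications (1 = prescribed/recommended).
y_true = np.array([[1, 0, 1, 1], [0, 1, 1, 0]])
y_prob = np.array([[0.9, 0.2, 0.7, 0.6], [0.1, 0.8, 0.4, 0.3]])
y_pred = (y_prob >= 0.5).astype(int)  # threshold is an illustrative choice

print(jaccard_score(y_true, y_pred, average="samples"))             # Jaccard
print(f1_score(y_true, y_pred, average="samples"))                  # F1-score
print(average_precision_score(y_true, y_prob, average="samples"))   # PR-AUC
```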
Conclusions: The findings indicate the superior efficacy of the introduced model in medication recommendation, highlighting its potential to enhance clinical decision-making in the management of hypertension. The code for the model has been released on GitHub.
{"title":"Hypertension Medication Recommendation via Synergistic and Selective Modeling of Heterogeneous Medical Entities: Development and Evaluation Study of a New Model.","authors":"Ke Zhang, Zhichang Zhang, Yali Liang, Wei Wang, Xia Wang","doi":"10.2196/74170","DOIUrl":"10.2196/74170","url":null,"abstract":"<p><strong>Background: </strong>Electronic health records (EHRs) contain comprehensive information regarding diagnoses, clinical procedures, and prescribed medications. This makes them a valuable resource for developing automated hypertension medication recommendation systems. Within this field, existing research has used machine learning approaches, leveraging demographic characteristics and basic clinical indicators, or deep learning techniques, which extract patterns from EHR data, to predict optimal medications or improve the accuracy of recommendations for common antihypertensive medication categories. However, these methodologies have significant limitations. They rarely adequately characterize the synergistic relationships among heterogeneous medical entities, such as the interplay between comorbid conditions, laboratory results, and specific antihypertensive agents. Furthermore, given the chronic and fluctuating nature of hypertension, effective medication recommendations require dynamic adaptation to disease progression over time. However, current approaches either lack rigorous temporal modeling of EHR data or fail to effectively integrate temporal dynamics with interentity relationships, resulting in the generation of recommendations that are not clinically appropriate due to the neglect of these critical factors.</p><p><strong>Objective: </strong>This study aims to overcome the challenges in existing methods and introduce a novel model for hypertension medication recommendation that leverages the synergy and selectivity of heterogeneous medical entities.</p><p><strong>Methods: </strong>First, we used patient EHR data to construct both heterogeneous and homogeneous graphs. The interentity synergies were captured using a multihead graph attention mechanism to enhance entity-level representations. Next, a bidirectional temporal selection mechanism calculated selective coefficients between current and historical visit records and aggregated them to form refined visit-level representations. Finally, medication recommendation probabilities were determined based on these comprehensive patient representations.</p><p><strong>Results: </strong>Experimental evaluations on the real-world datasets Medical Information Mart for Intensive Care (MIMIC)-III v1.4 and MIMIC-IV v2.2 demonstrated that the proposed model achieved Jaccard similarity coefficients of 58.01% and 55.82%, respectively; areas under the curve of precision-recall of 83.56% and 80.69%, respectively; and F1-scores of 68.95% and 64.83%, respectively, outperforming the baseline models.</p><p><strong>Conclusions: </strong>The findings indicate the superior efficacy of the introduced model in medication recommendation, highlighting its potential to enhance clinical decision-making in the management of hypertension. 
The code for the model has been released on GitHub.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e74170"},"PeriodicalIF":3.8,"publicationDate":"2025-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12646553/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145607501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}