Pub Date : 2026-03-07 | DOI: 10.1038/s41746-026-02464-1
Phyllis M. Thangaraj, Sumukh Vasisht Shankar, Sicong Huang, Girish N. Nadkarni, Bobak J. Mortazavi, Evangelos K. Oikonomou, Rohan Khera
Randomized clinical trials (RCTs) guide medical practice; however, their generalizability across populations varies. We developed a statistically informed Generative Adversarial Network model, RCT-Twin-GAN, that leverages relationships between covariates and outcomes to generate a digital twin of an RCT conditioned on covariate distributions from a second patient population. We reproduced the disparate treatment effects of RCTs with similar interventions: the Systolic Blood Pressure Intervention Trial (SPRINT) and the Action to Control Cardiovascular Risk in Diabetes (ACCORD) Blood Pressure Trial. To demonstrate treatment effects of each RCT conditioned on the other RCT population, we evaluated the cardiovascular event-free survival of SPRINT-Twins conditioned on the ACCORD cohort and vice versa. The digital twins demonstrated balanced treatment arms, with a mean absolute standardized mean difference (MASMD) of covariates of 0.019 (SD 0.018), and the ACCORD-conditioned covariates of the SPRINT-Twins were distributed more similarly to ACCORD than to SPRINT (MASMD 0.0082 (SD 0.016) vs. 0.46 (SD 0.20)). Notably, SPRINT-conditioned ACCORD-Twins reproduced the non-significant outcome seen in ACCORD (0.88 (0.73–1.06) vs. 0.87 (0.68–1.13)), while ACCORD-conditioned SPRINT-Twins reproduced the significant outcome seen in SPRINT (0.75 (0.64–0.89) vs. 0.79 (0.72–0.86)). Finally, we applied this approach to a real-world population in the electronic health record. RCT-Twin-GAN simulates the translation of RCT-derived treatment effects across patient populations.
Title: A novel digital twin strategy to examine the implications of randomized clinical trials for real-world populations
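The covariate-balance metric reported above, the mean absolute standardized mean difference (MASMD), is a standard quantity. The following is an illustrative sketch (not the authors' code; the variable names and toy data are invented here) of how it could be computed between two treatment arms:

```python
import numpy as np

def smd(a, b):
    """Absolute standardized mean difference of one covariate
    between two samples, using a pooled-SD denominator."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2.0)
    return abs(a.mean() - b.mean()) / pooled_sd

def masmd(cov_a, cov_b):
    """Mean absolute SMD across covariate columns
    (arrays shaped n_samples x n_covariates)."""
    cov_a, cov_b = np.atleast_2d(cov_a), np.atleast_2d(cov_b)
    return float(np.mean([smd(cov_a[:, j], cov_b[:, j])
                          for j in range(cov_a.shape[1])]))

rng = np.random.default_rng(0)
arm1 = rng.normal(0.0, 1.0, size=(500, 3))  # toy covariates, arm 1
arm2 = rng.normal(0.0, 1.0, size=(500, 3))  # toy covariates, arm 2
print(round(masmd(arm1, arm2), 3))  # small value -> well-balanced arms
```

Values near zero (such as the 0.019 reported for the digital twins) indicate that the arms are balanced on the measured covariates.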
Pub Date : 2026-03-06 | DOI: 10.1038/s41746-026-02501-z
Jan Kirchhoff, Fabian Berns, Christian Schieder, Johannes Schobel
AI-enabled diagnostic decision support systems (DDSS) could improve diagnostic accuracy and efficiency, yet adoption is often impeded by pricing approaches that rely on opaque technical usage metrics. We examined how pricing can remain clinically legible and budgetable while accounting for AI-specific technical and organizational cost drivers. We conducted semi-structured interviews with healthcare decision makers (n = 17) across hospital, outpatient, laboratory, and industry settings and performed a deductive-inductive thematic analysis. Ten themes emerged, including widespread resistance to purely usage-based pricing and strong preferences for transparency and predictability. Participants supported hybrid models combining a base fee with variable components defined in clinically meaningful units (per patient, per test, or per episode) and emphasized reimbursement alignment alongside integration, training, and support as integral value elements. Outcome-linked payment was viewed as ethically compelling but operationally difficult. We synthesize these findings into stakeholder-informed design principles and actionable recommendations for pricing models that facilitate procurement, reimbursement fit, and sustainable scaling of diagnostic AI.
Title: Pricing models for diagnostic AI based on qualitative insights from healthcare decision makers
Pub Date : 2026-03-06 | DOI: 10.1038/s41746-026-02401-2
Amir Rafati Fard, Simon C. Williams, Kieran J. Smith, Jasneet K. Dhaliwal, Tomas Ferreira, Adrito Das, Joachim Starup-Hansen, John G. Hanrahan, Chan Hee Koh, Danyal Z. Khan, Danail Stoyanov, Hani J. Marcus
This systematic review and meta-analysis examines the design of studies comparing the performance of artificial intelligence (AI) with that of healthcare professionals in the analysis of videos from surgical and interventional procedures, and quantitatively evaluates the performance of AI, unassisted healthcare professionals, and AI-assisted healthcare professionals. From the 37,956 studies identified, 146 were included, with 76 providing sufficient information for inclusion in our exploratory meta-analysis. AI had significantly greater sensitivity and comparable specificity compared to unassisted healthcare professionals at their respective peak performance levels, with a relative risk of 1.12 (95% CI 1.07–1.19, p < 0.001) and 1.04 (95% CI 0.98–1.10, p = 0.224), respectively. AI-assisted healthcare professionals had significantly greater sensitivity and specificity compared to unassisted healthcare professionals across all levels of expertise, with a relative risk of 1.18 (95% CI 1.12–1.25, p < 0.001) and 1.05 (95% CI 1.02–1.08, p < 0.001), respectively. There was no significant difference in sensitivity and specificity of AI-assisted expert healthcare professionals versus AI, with a relative risk of 0.99 (95% CI 0.95–1.04, p = 0.787) and 1.03 (95% CI 0.97–1.08, p = 0.395), respectively. Whilst most studies to date have evaluated AI head-to-head against unassisted healthcare professionals, fewer studies examined AI as an assistive tool, even though real-world integration of AI is more likely to involve assistance than autonomy.
Title: Comparing artificial intelligence and healthcare professional performance in surgical and interventional video analysis: a systematic review and meta-analysis
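The relative risks with 95% confidence intervals reported above can be obtained with the standard log-RR normal approximation. A hedged sketch follows; the event counts are hypothetical, not data from the review:

```python
import math

def relative_risk(events_a, n_a, events_b, n_b):
    """Relative risk of group A vs. group B with a 95% CI,
    using the log-RR normal approximation."""
    p_a, p_b = events_a / n_a, events_b / n_b
    rr = p_a / p_b
    # standard error of log(RR)
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lo = math.exp(math.log(rr) - 1.96 * se_log)
    hi = math.exp(math.log(rr) + 1.96 * se_log)
    return rr, lo, hi

# Hypothetical counts: 90/100 detections for AI vs. 80/100 for clinicians.
rr, lo, hi = relative_risk(90, 100, 80, 100)
print(f"RR = {rr:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

A CI that excludes 1.0 corresponds to a statistically significant difference, as with the sensitivity comparisons in the abstract.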
Pub Date : 2026-03-06 | DOI: 10.1038/s41746-026-02497-6
Lauren H Cooke, Matthias Jung, Jan M Brendel, Nora M Kerkovits, Borek Foldyna, Michael T Lu, Vineet K Raghu
Chest radiographs (CXRs) are among the most common tests in medicine; automated interpretation may reduce radiologists' workload and expand access. Deep learning multi-task and foundation models have shown strong CXR interpretation performance but are vulnerable to shortcut learning, where spurious correlations drive decision-making. We introduce RoentMod, a counterfactual image editing framework that generates realistic CXRs with user-specified and synthetic pathology while maintaining the original anatomical features. RoentMod combines an open-source medical image generator (RoentGen) with an image-to-image modification model without retraining. In reader studies of RoentMod-produced images, 93% appeared realistic, 89-99% correctly incorporated the specified finding, and all preserved native anatomy comparable to real follow-up CXRs. Using RoentMod, we demonstrate that state-of-the-art multi-task and foundation models frequently exploit off-target pathology as shortcuts, limiting their specificity. Incorporating RoentMod-generated counterfactual images during training mitigated this vulnerability, improving model discrimination across multiple pathologies by 3-19% AUC in internal validation and by 1-11% for 5 out of 6 tested pathologies in external testing. These findings establish RoentMod as a tool to probe and correct shortcut learning in medical AI. By enabling controlled counterfactual interventions, RoentMod enhances the robustness and interpretability of CXR interpretation models and provides a strategy to improve medical imaging models.
Title: RoentMod: a synthetic chest X-ray modification model to identify and correct image interpretation model shortcuts
Pub Date : 2026-03-06 | DOI: 10.1038/s41746-026-02449-0
Qiang Gao, Jingping Wu, Yingshuang Gao, Yongyong Ren, Xiaolei Wang, Guojun Zhu, Jinyi Xiang, Dongaolei An, Lei Xu, Yan Zhou, Jun Pu, Dan Mu, Lei Zhao, Hui Lu, Lian-Ming Wu
Artificial intelligence has made significant strides in predicting major adverse cardiovascular events (MACE) in patients with acute myocardial infarction (AMI) following percutaneous coronary intervention. However, most existing methods rely solely on tabular variables derived from clinical data and cardiac magnetic resonance (CMR), without fully leveraging the predictive potential of the CMR imaging modality itself. Moreover, these approaches often overlook the synergistic benefits of multimodal integration between imaging and tabular data. In addition, current models primarily focus on short-term MACE risk assessment (e.g., within 6 months or 1 year), limiting their applicability for long-term prognostication. To address these limitations, we first developed ReconSeg3D, a model that reconstructs short-axis cine CMR stacks into temporally resolved 3D bi-ventricular volumes, capturing fine-grained cardiac anatomy and dynamic motion. These bi-ventricular sequences were then integrated with 45 clinical and CMR-derived variables using spatiotemporal decomposition and cross-attention mechanisms to construct a multimodal MACE prediction model, HeartTTable. HeartTTable achieved a 5-year time-dependent AUC of 0.934 (95% CI 0.907-0.959) and a Harrell's C-index of 0.897 for predicting MACE risk, significantly outperforming models based solely on clinical and CMR-derived tabular features, and demonstrated strong capabilities in postoperative risk stratification. Our study contributes to improved long-term postoperative management for AMI patients by offering clinicians an objective, data-driven decision-support tool.
Title: 3D Spatiotemporal cardiac reconstruction for predicting MACE in acute myocardial infarction
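Harrell's C-index, quoted above, measures how well predicted risk scores rank patients by observed event times: among comparable pairs, the fraction where the higher-risk score belongs to the earlier event. An illustrative pure-Python sketch (not the study's implementation; the toy survival data are invented):

```python
from itertools import combinations

def harrell_c(times, events, scores):
    """Harrell's concordance index.
    times: follow-up times; events: 1 = event, 0 = censored;
    scores: predicted risk (higher = worse prognosis)."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times skipped in this simple sketch
        first, later = (i, j) if times[i] < times[j] else (j, i)
        # a pair is comparable only if the shorter time ends in an event
        if not events[first]:
            continue
        comparable += 1
        if scores[first] > scores[later]:
            concordant += 1
        elif scores[first] == scores[later]:
            concordant += 0.5
    return concordant / comparable

# toy data: risk scores perfectly ordered against event times
print(harrell_c([2, 4, 6, 8], [1, 1, 1, 0], [0.9, 0.7, 0.4, 0.1]))  # 1.0
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.897 indicates strong discrimination.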
Pub Date : 2026-03-06 | DOI: 10.1038/s41746-026-02461-4
Isabel Voigt, Lars Masanneck, Marc Pawlitzki, Hernan Inojosa, Sven G Meuth, Tjalf Ziemssen
This perspective introduces MS360°, a conceptual hybrid care model for the management of multiple sclerosis (MS). It integrates traditional on-site assessments with digital health technologies (DHT) to enable more continuous, personalised, and proactive disease management. Current MS care is often fragmented, limiting timely interventions and patient engagement. MS360° addresses these challenges by introducing a digital-first hybrid framework for continuous data collection through remote monitoring, wearable sensors, and telemedicine. This data can be used to dynamically steer structured patient pathways and trigger targeted on-site assessments and interventions such as neurological examinations, imaging, laboratory assessments, and standardised functional tests based on predefined thresholds and patient profiles. The interaction of multidisciplinary teams, structured care pathways and bidirectional data flow enables timely clinical decision-making, stratified patient management and early detection of disease progression. Digital tools can further enhance patient engagement and lifestyle management, promoting adherence and outcomes. New technologies, including artificial intelligence and digital twins, are being discussed as potential future extensions for precision care, workflow optimisation, and risk prediction. MS360° provides a quality-driven conceptual framework, offering a roadmap for integrating digital innovations into patient-centred MS care.
Title: MS360°: a conceptual digital-first, data-driven hybrid care framework for personalised multiple sclerosis management
Pub Date : 2026-03-06 | DOI: 10.1038/s41746-026-02405-y
Meghan Reading Turchioe, Afra Shamnath, David Slotwiner, Yihong Zhao, Deepak Saluja, Seth Goldbarg, JoonHyuk Kim, Paul Varosy, Angelo Biviano
Digital decision aids significantly improve shared decision-making outcomes, but barriers to implementation in clinical settings remain. We conducted a Hybrid Type 2 implementation-effectiveness trial of an atrial fibrillation rhythm control decision aid (clinicaltrials.gov NCT04993807; registered 08/06/2021) among 75 older adults across two sites. Guided by the RE-AIM framework, we assessed decision quality and implementation outcomes. While the decision aid was highly acceptable and broadly adopted, changes in decisional conflict and self-efficacy varied widely, with no significant average improvement across the cohort. Subgroup and qualitative analyses revealed that the decision aid was most effective when delivered to the right patient, at the right time, and in the right clinical context. Barriers included variability in health literacy, digital access, and timing of delivery relative to the clinical decision-making process. Findings underscore the challenges of deploying digital interventions within real-world workflows and highlight the importance of targeting decision support tools based on patient readiness, literacy, and care context.
Title: Evaluating a digital decision aid for atrial fibrillation rhythm control in a hybrid implementation-effectiveness trial
Pub Date : 2026-03-05 | DOI: 10.1038/s41746-026-02471-2
Amanda Dy, Sandra M. Buetow, Andrew J. Bredemeyer, Monika Lamba Saini, Fabienne Lucas, Shannon Bennett, Kim R. M. Blenman, Keith Wharton Jr., Sunil Singhal, M. E. de Baca, Kevin Schap, Matthew G. Hanna, Staci J. Kearney, Norman Zerbe, Roberto Salgado, Jithesh Veetil, Jansen N. Seheult, David S. McClintock, April Khademi, Jochen K. Lennerz
Validation is a cornerstone of reliability and trust in diagnostics, yet discipline-specific assumptions and unspoken contextual differences often lead to miscommunication, misalignment, and avoidable delays. As AI/ML becomes more integrated into healthcare, there is a growing necessity to re-examine how the term validation is used and understood. We highlight inconsistencies in the use of the term validation through an analysis of 94 themes across five domains: Communication Science (n = 12), AI/ML (n = 26), Clinical and Laboratory Practice (n = 19), Regulatory Science (n = 22), and Business (n = 15). We emphasize how persistent reliance on domain-specific implied definitions impedes interdisciplinary alignment. Rather than advocating for a single definition, we derived five consensus proposals that collectively advocate for more specific and context-aware additions to the term validation to support clarity, reliability, and compliance across disciplines. Our goal is to support clearer communication and provide useful strategies that inform the development, regulation, and use of digital health technologies.
Title: Clarifying validation terminologies in healthcare
Pub Date: 2026-03-05 DOI: 10.1038/s41746-026-02511-x
Hwan-ho Cho, Joonwon Lee, Jeonghoon Bae, Dongwhane Lee, Hyung Chan Kim, Suk Yoon Lee, Jung Hwa Seo, Woo-Keun Seo, Jin-Man Jung, Hyunjin Park, Seongho Park
We developed and externally validated a deep learning model to automatically detect new ischemic lesions on serial FLAIR MRI scans in patients with stroke. Manual interpretation of follow-up imaging is labor-intensive and variable, and silent brain infarctions (SBIs) are frequently missed despite their prognostic importance. Using 25,451 paired slices from 1055 patients across two hospitals, we trained a convolutional neural network with supervised contrastive learning to classify new lesion occurrence. The model achieved an area under the receiver operating characteristic curve of 0.89 in both internal and external validation cohorts. To evaluate clinical relevance, we further analyzed an independent asymptomatic cohort of 307 patients with a median follow-up of two years. Patients classified as SBI-positive by the model showed a significantly higher risk of subsequent symptomatic stroke than those without SBI. In multivariable Cox regression adjusted for age and major vascular risk factors, model-positive patients had a 3.8-fold increased risk of stroke recurrence. These findings indicate that AI can identify clinically meaningful SBIs that are under-recognized in routine practice and independently associated with stroke recurrence. Automated lesion detection may provide a reproducible imaging biomarker for risk stratification, supporting standardized interpretation of follow-up MRI and informing secondary stroke prevention strategies.
Title: "Automated detection of new cerebral infarctions and prognostic implications using deep learning on serial MRI"
NPJ Digital Medicine. DOI: 10.1038/s41746-026-02511-x
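The stroke-detection study above summarizes discrimination as an AUROC of 0.89. As a reminder of what that number measures, here is a minimal, dependency-free sketch of the rank-based (Mann-Whitney) AUROC estimate: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The labels and scores below are toy values, not the study's data or pipeline.

```python
def auroc(labels, scores):
    """Rank-based AUROC: P(score of a random positive > score of a random
    negative), counting ties as 1/2. O(n_pos * n_neg), fine for a sketch."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Toy check: perfectly separated scores give AUROC 1.0
print(auroc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # → 1.0
```

An AUROC of 0.89, as reported for both validation cohorts, therefore means the model ranks a randomly chosen slice pair with a new lesion above a lesion-free pair about 89% of the time.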
Pub Date: 2026-03-05 DOI: 10.1038/s41746-026-02496-7
Raphael Judkiewicz, Eran Berkowitz, Meishar Meisel, Tomer Michaeli, Joachim A. Behar
Optical Coherence Tomography (OCT) is essential in ophthalmology for cross-sectional imaging of the retina. Pretrained foundation models facilitate task-specific model development by enabling fine-tuning with limited labeled data. However, current foundation models rely on a single B-scan (usually the central slice), overlooking volumetric context. This research investigates video foundation models to capture full 3D retinal structure and improve diagnostic performance. V-JEPA, a state-of-the-art video foundation model, was benchmarked against retinal foundation models (RETFound, VisionFM) and a natural image foundation model (DINOv2). All were fine-tuned to detect Age-related Macular Degeneration or Glaucomatous Optic Neuropathy using five OCT datasets. V-JEPA consistently equaled or outperformed image-based models, achieving an average AUROC of 0.94 (0.80–0.99), versus 0.90 (0.76–0.98) for the best image model, a statistically significant improvement (p < 0.001). To our knowledge, this is the first application of transformer-based video models to volumetric OCT, highlighting their promise in 3D medical imaging.
Title: "Shifting the retinal foundation models paradigm from slices to volumes for optical coherence tomography"
NPJ Digital Medicine. DOI: 10.1038/s41746-026-02496-7
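The V-JEPA comparison above rests on a paired evaluation of two models across five OCT datasets. One simple way to test whether a mean per-dataset AUROC gain is significant in such a paired design is a sign-flip permutation test: under the null of no model difference, each dataset's delta is equally likely to be positive or negative. The sketch below uses hypothetical per-dataset deltas (the abstract reports average AUROCs of 0.94 vs 0.90, not per-dataset differences, and the authors' actual test is not specified here).

```python
import random

def paired_sign_flip_test(deltas, n_perm=10_000, seed=0):
    """Two-sided sign-flip permutation test for a paired mean difference.
    Randomly flips the sign of each paired delta and counts how often the
    permuted mean is at least as extreme as the observed mean."""
    rng = random.Random(seed)
    observed = sum(deltas) / len(deltas)
    hits = 0
    for _ in range(n_perm):
        flipped = sum(d * rng.choice((-1, 1)) for d in deltas) / len(deltas)
        if abs(flipped) >= abs(observed):
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one smoothing avoids p = 0

# Hypothetical per-dataset AUROC gains of the video model over the best
# image-based model on five datasets (illustrative values only)
deltas = [0.04, 0.03, 0.05, 0.02, 0.06]
p = paired_sign_flip_test(deltas)
```

With all five hypothetical deltas positive, only 2 of the 32 possible sign patterns are as extreme as the observed one, so the p-value lands near 2/32 ≈ 0.06 — a reminder that paired tests over only five datasets have limited power, and that the p < 0.001 reported above presumably comes from a finer-grained (e.g., per-sample) comparison.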