Pub Date: 2026-03-09 DOI: 10.1038/s41746-026-02514-8
Han Leong Goh, Vicente Sancenon, Benjamin M X Chu, Gerald C H Koh, Leroy Koh, Delia Teo, Maybelline S L Ooi, Corryne N Thng, Chia-Zhi Tan, David W L Chua, Andy W A Ta
The workforce shortages caused by aging populations demand a transition from reactive to preventive healthcare strategies. Generative Artificial Intelligence offers a promising solution through the use of agents that can generate personalised guidance. We implement a digital assistant powered by a multi-agent framework that generates and refines personalised health plans based on user interactions. A pilot study with a cohort of 20 residents and 7 clinicians revealed positive user acceptance. Both groups rated four success metrics significantly above neutral satisfaction levels (p values: <0.05). The majority of residents valued the personalisation (p value: 0.003), appreciated the level of granularity (p value: 0.0003), and did not express major concerns about the recommended plans (p value: 0.941). More than 50% of the collected feedback reflected a positive sentiment on the personalised diet (p value: 0.110), personalised exercise (p value: 0.003), and general features (p value: 6e-06). This pilot study highlights the potential of AI-driven digital assistants in supporting preventive healthcare programmes.
Title: Personalised health plan development using agentic AI in Singapore's national preventive care programme: a pilot study
Journal: NPJ Digital Medicine
Thirty percent of interval breast cancers, diagnosed between routine screening mammograms, have a poorer prognosis than screen-detected cancers. Deep learning algorithms can estimate short-term risk from negative mammograms to guide supplemental imaging or screening intervals, but comparative validation on complete national screening data is lacking. We retrospectively evaluated four risk algorithms (Mirai, iCAD, Transpara, and Google) using 112,621 negative mammograms from two UK NHS Breast Screening Programme sites with different mammography systems (Philips, GE) over one screening round (2014-2017) with five-year follow-up, including 1225 future cancers. There was a distinct ranking in discriminative ability: overall AUCs ranged from 0.65 to 0.72, and only one algorithm differed significantly between systems. For interval cancers, AUCs ranged from 0.67 to 0.77. Within the highest 4.0% of risk scores, top algorithms identified ~20% of future cancers, including ~27% of interval cancers, doubling at the 14.0% threshold. These differences highlight the need for multi-algorithm prospective trials and potential fine-tuning to improve generalisation across unseen systems.
Title: Performance of breast cancer risk prediction algorithms across mammography systems in the UK screening programme
Authors: Joshua Rothwell, Nicholas Payne, Fleur Kilburn-Toppin, Yuan Huang, Joshua Kaggie, Richard Black, Sarah Hickman, Bahman Kasmai, Arne Juette, Fiona Gilbert
Journal: NPJ Digital Medicine
Pub Date: 2026-03-08 DOI: 10.1038/s41746-026-02507-7
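The "highest 4.0% of risk scores" figure in the abstract above is a capture rate: the share of future cancers whose risk score falls inside the top fraction of all scores. The following is a minimal sketch of that calculation, not the authors' code. The function name `capture_rate` and the toy scores and labels are hypothetical, and the handling of the cut-off (rounding the number of flagged exams) is an assumption.

```python
def capture_rate(scores, labels, top_fraction):
    """Share of positive cases (label 1) found among the highest-scoring exams.

    scores: risk score per exam; labels: 1 if a cancer followed, else 0.
    top_fraction: fraction of exams to flag, e.g. 0.04 for the top 4%.
    """
    # Number of exams flagged; at least one so tiny inputs still work (assumption).
    n_flagged = max(1, round(len(scores) * top_fraction))
    # Rank exams from highest to lowest risk score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    flagged_cases = sum(label for _, label in ranked[:n_flagged])
    total_cases = sum(labels)
    return flagged_cases / total_cases if total_cases else 0.0


# Toy data only: ten exams, three of which were followed by a cancer.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]
print(capture_rate(toy_scores, toy_labels, 0.2))
```

Raising `top_fraction` flags more exams and can only keep or increase the capture rate, which is the trade-off behind the 4.0% and 14.0% thresholds quoted in the abstract.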
In colorectal cancer liver metastases, chromosomal instability (CIN) serves as a critical hallmark linked to tumor aggressiveness and poor prognosis. This study integrated single-cell RNA sequencing, weighted gene co-expression network analysis, and non-negative matrix factorization to construct a comprehensive CIN index, revealing that CIN-high tumor cells exhibit more aggressive phenotypes and reside in an immune-excluded tumor microenvironment. Cancer-associated fibroblasts (CAFs) showed enhanced communication with CIN-high tumor cells, and a key CAF-derived gene, CCDC3, was experimentally validated to promote metastasis, proliferation, and CIN in vitro and in vivo. The bio-knowledge graph analysis based on artificial intelligence further revealed the core regulation of CCDC3 in chromosomal instability and liver metastasis of colorectal cancer. Mechanistically, CCDC3 physically interacts with CXCR3 on CRC cells, activating STAT3 phosphorylation and subsequent CDT1 transcription, forming a CCDC3/CXCR3/STAT3/CDT1 signaling axis. Disruption of this axis, either by genetic knockdown or pharmacological inhibition, significantly suppressed metastatic traits, tumor growth, and liver colonization in mouse models. Clinically, high CCDC3 expression correlated with elevated CIN signatures and worse patient survival. These findings uncover a novel CAF-driven signaling pathway that promotes CIN and metastatic progression in CRC, highlighting its potential as a therapeutic target for aggressive, CIN-high colorectal cancer.
Title: Combining AI to reveal CCDC3-mediated pathways of colorectal cancer liver metastasis
Authors: Runze Huang, Qinyu Liu, Xin Jin, Xuanci Bai, Yibin Wu, Xigan He, Yixiu Wang, Ziting Jiang, Yongfa Zhang, Yi Shi, Lu Wang, Weiping Zhu
Journal: NPJ Digital Medicine
Pub Date: 2026-03-07 DOI: 10.1038/s41746-026-02457-0
Pub Date: 2026-03-07 DOI: 10.1038/s41746-026-02464-1
Phyllis M. Thangaraj, Sumukh Vasisht Shankar, Sicong Huang, Girish N. Nadkarni, Bobak J. Mortazavi, Evangelos K. Oikonomou, Rohan Khera
Randomized clinical trials (RCTs) guide medical practice; however, their generalizability across populations varies. We developed a statistically informed Generative Adversarial Network model, RCT-Twin-GAN, that leverages relationships between covariates and outcomes to generate a digital twin of an RCT conditioned on covariate distributions from a second patient population. We reproduced the disparate treatment effects of RCTs with similar interventions: the Systolic Blood Pressure Intervention Trial (SPRINT) and the Action to Control Cardiovascular Risk in Diabetes (ACCORD) Blood Pressure Trial. To demonstrate treatment effects of each RCT conditioned on the other RCT population, we evaluated the cardiovascular event-free survival of SPRINT-Twins conditioned on the ACCORD cohort and vice versa. The digital twins demonstrated balanced treatment arms, with a mean absolute standardized mean difference (MASMD) of covariates of 0.019 (SD 0.018), and the ACCORD-conditioned covariates of the SPRINT-Twins were distributed more similarly to ACCORD than to SPRINT (MASMD 0.0082, SD 0.016 vs. 0.46, SD 0.20). Notably, SPRINT-conditioned ACCORD-Twins reproduced the non-significant outcome seen in ACCORD (0.88 (0.73–1.06) vs. 0.87 (0.68–1.13)), while ACCORD-conditioned SPRINT-Twins reproduced the significant outcome seen in SPRINT (0.75 (0.64–0.89) vs. 0.79 (0.72–0.86)). Finally, we applied this approach to a real-world population in the electronic health record. RCT-Twin-GAN simulates the translation of RCT-derived treatment effects across patient populations.
Title: A novel digital twin strategy to examine the implications of randomized clinical trials for real-world populations
Journal: NPJ Digital Medicine
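The balance metric quoted in the RCT-Twin-GAN abstract, the mean absolute standardized mean difference (MASMD), can be illustrated with a short sketch. The abstract does not give the exact formula, so this assumes the common definition of the standardized mean difference (difference in means divided by the pooled standard deviation); the function names and the toy covariates are hypothetical.

```python
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """SMD for one covariate: difference in means over the pooled SD (assumed definition)."""
    pooled_sd = ((variance(group_a) + variance(group_b)) / 2) ** 0.5
    if pooled_sd == 0:
        return 0.0  # identical constant covariate in both groups
    return (mean(group_a) - mean(group_b)) / pooled_sd


def masmd(arm_a, arm_b):
    """Mean of |SMD| across covariates; each arm maps covariate name -> list of values."""
    return mean(
        abs(standardized_mean_difference(arm_a[name], arm_b[name])) for name in arm_a
    )


# Toy data only: two covariates in two treatment arms.
treated = {"age": [60, 65, 70, 75], "sbp": [130, 140, 150, 160]}
control = {"age": [61, 66, 71, 76], "sbp": [130, 140, 150, 160]}
print(round(masmd(treated, control), 4))
```

Values near zero indicate that covariates are distributed similarly in the two groups, which is how the abstract uses MASMD both for arm balance and for similarity between a twin cohort and its conditioning population.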
Pub Date: 2026-03-07 DOI: 10.1038/s41746-026-02475-y
Sicong Guo, Xinyi Zhao, Junhong Ren, Minakshmi Shaw, Jiye Wan, Jianyao Su
Accurate segmentation of anatomical structures in cardiac magnetic resonance imaging (MRI) plays an irreplaceable role in the clinical management of cardiovascular diseases, serving as a cornerstone for precise diagnosis, individualized treatment planning, and long-term prognosis assessment. Although deep learning techniques have demonstrated promising performance in achieving automatic segmentation of cardiac MRI anatomical structures, their heavy reliance on large-scale labeled datasets for model training presents notable challenges in the field of cardiac imaging, as the annotations can only be provided by medical specialists with extensive experience. Against this backdrop, this work proposes a mutual ensemble framework integrating data-level and network-level consistency for semi-supervised learning to utilize limited labeled and abundant unlabeled data. Extensive experiments demonstrate that our approach can successfully harness unlabeled data to improve performance, outperforming existing segmentation methods under the same conditions.
Title: Efficient cardiac MRI multi-structure segmentation for cardiovascular assessment with limited annotation by integrating data-level and network-level consistency
Journal: NPJ Digital Medicine
Pub Date: 2026-03-07 DOI: 10.1038/s41746-026-02519-3
Aditya Narayan, Michael Blasingame, Sam Warmuth, Gabriella Palmeri, India Halm, Ramin Bastani, Whitney Engeran-Cordova, Harold J Phillips, Leandro Mena, Nirav R Shah
This retrospective cohort study evaluated an AI-powered chatbot used for PrEP support across AIDS Healthcare Foundation clinics in the United States. Among 155,217 eligible adults, individuals who engaged with the chatbot had higher rates of PrEP initiation, follow-up attendance, and appointment adherence than non-users. Engagement was greatest among younger and racial or ethnic minority patients. Findings suggest that AI-supported communication may enhance aspects of PrEP care delivery.
Title: AI-augmented communication improves HIV PrEP initiation and persistence in populations disproportionately impacted by HIV
Journal: NPJ Digital Medicine
Pub Date: 2026-03-06 DOI: 10.1038/s41746-026-02501-z
Jan Kirchhoff, Fabian Berns, Christian Schieder, Johannes Schobel
AI-enabled diagnostic decision support systems (DDSS) could improve diagnostic accuracy and efficiency, yet adoption is often impeded by pricing approaches that rely on opaque technical usage metrics. We examined how pricing can remain clinically legible and budgetable while accounting for AI-specific technical and organizational cost drivers. We conducted semi-structured interviews with healthcare decision makers (n = 17) across hospital, outpatient, laboratory, and industry settings and conducted a deductive-inductive thematic analysis. Ten themes emerged, including widespread resistance to purely usage-based pricing and strong preferences for transparency and predictability. Participants supported hybrid models combining a base fee with variable components defined in clinically meaningful units (per patient, per test, or per episode) and emphasized reimbursement alignment alongside integration, training, and support as integral value elements. Outcome-linked payment was viewed as ethically compelling but operationally difficult. We synthesize these findings into stakeholder-informed design principles and actionable recommendations for pricing models that facilitate procurement, reimbursement fit, and sustainable scaling of diagnostic AI.
Title: Pricing models for diagnostic AI based on qualitative insights from healthcare decision makers
Journal: NPJ Digital Medicine
Pub Date: 2026-03-06 DOI: 10.1038/s41746-026-02401-2
Amir Rafati Fard, Simon C. Williams, Kieran J. Smith, Jasneet K. Dhaliwal, Tomas Ferreira, Adrito Das, Joachim Starup-Hansen, John G. Hanrahan, Chan Hee Koh, Danyal Z. Khan, Danail Stoyanov, Hani J. Marcus
This systematic review and meta-analysis examines the design of studies comparing the performance of artificial intelligence (AI) with that of healthcare professionals in the analysis of videos from surgical and interventional procedures, and quantitatively evaluates the performance of AI, unassisted healthcare professionals, and AI-assisted healthcare professionals. From the 37,956 studies identified, 146 were included, with 76 providing sufficient information for inclusion in our exploratory meta-analysis. AI had significantly greater sensitivity and comparable specificity compared to unassisted healthcare professionals at their respective peak performance levels, with a relative risk of 1.12 (95% CI 1.07–1.19, p < 0.001) and 1.04 (95% CI 0.98–1.10, p = 0.224), respectively. AI-assisted healthcare professionals had significantly greater sensitivity and specificity compared to unassisted healthcare professionals across all levels of expertise, with a relative risk of 1.18 (95% CI 1.12–1.25, p < 0.001) and 1.05 (95% CI 1.02–1.08, p < 0.001), respectively. There was no significant difference in sensitivity and specificity of AI-assisted expert healthcare professionals versus AI, with a relative risk of 0.99 (95% CI 0.95–1.04, p = 0.787) and 1.03 (95% CI 0.97–1.08, p = 0.395), respectively. Whilst most studies to date have evaluated AI head-to-head against unassisted healthcare professionals, fewer studies examined AI as an assistive tool, despite the real-world integration of AI more likely to involve assistance than autonomy.
Title: Comparing artificial intelligence and healthcare professional performance in surgical and interventional video analysis: a systematic review and meta-analysis
Journal: NPJ Digital Medicine
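The relative risks with 95% confidence intervals reported in the meta-analysis above compare proportions such as sensitivity between two groups. As a rough illustration for a single study (not the authors' pooled model, which the abstract does not describe), a relative risk and its log-scale Wald interval can be computed as follows. The function name and the counts are hypothetical.

```python
import math


def relative_risk(events_a, total_a, events_b, total_b, z=1.96):
    """Relative risk of group A versus group B with a log-scale Wald interval.

    For a sensitivity comparison, events_* are correctly detected positives
    and total_* are all true positives assessed by each group.
    """
    rr = (events_a / total_a) / (events_b / total_b)
    # Standard error of log(RR) for two independent proportions.
    se_log_rr = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Toy counts only: AI detects 90 of 100 positives, unassisted readers 80 of 100.
rr, lower, upper = relative_risk(90, 100, 80, 100)
print(f"RR = {rr:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

An interval that excludes 1 corresponds to a significant difference between the two groups, as with the 1.12 (95% CI 1.07–1.19) sensitivity result in the abstract; an interval that spans 1, such as 1.04 (95% CI 0.98–1.10), does not.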
Pub Date: 2026-03-06 DOI: 10.1038/s41746-026-02497-6
Lauren H Cooke, Matthias Jung, Jan M Brendel, Nora M Kerkovits, Borek Foldyna, Michael T Lu, Vineet K Raghu
Chest radiographs (CXRs) are among the most common tests in medicine; automated interpretation may reduce radiologists' workload and expand access. Deep learning multi-task and foundation models have shown strong CXR interpretation performance but are vulnerable to shortcut learning, where spurious correlations drive decision-making. We introduce RoentMod, a counterfactual image editing framework that generates realistic CXRs with user-specified and synthetic pathology while maintaining the original anatomical features. RoentMod combines an open-source medical image generator (RoentGen) with an image-to-image modification model without retraining. In reader studies of RoentMod-produced images, 93% appeared realistic, 89-99% correctly incorporated the specified finding, and all preserved native anatomy comparable to real follow-up CXRs. Using RoentMod, we demonstrate that state-of-the-art multi-task and foundation models frequently exploit off-target pathology as shortcuts, limiting their specificity. Incorporating RoentMod-generated counterfactual images during training mitigated this vulnerability, improving model discrimination across multiple pathologies by 3-19% AUC in internal validation and by 1-11% for 5 out of 6 tested pathologies in external testing. These findings establish RoentMod as a tool to probe and correct shortcut learning in medical AI. By enabling controlled counterfactual interventions, RoentMod enhances the robustness and interpretability of CXR interpretation models and provides a strategy to improve medical imaging models.
Title: RoentMod: a synthetic chest X-ray modification model to identify and correct image interpretation model shortcuts
Journal: NPJ Digital Medicine
Artificial intelligence has made significant strides in predicting major adverse cardiovascular events (MACE) in patients with acute myocardial infarction (AMI) following percutaneous coronary intervention. However, most existing methods rely solely on tabular variables derived from clinical data and cardiac magnetic resonance (CMR), without fully leveraging the predictive potential of the CMR imaging modality itself. Moreover, these approaches often overlook the synergistic benefits of multimodal integration between imaging and tabular data. In addition, current models primarily focus on short-term MACE risk assessment (e.g., within 6 months or 1 year), limiting their applicability for long-term prognostication. To address these limitations, we first developed ReconSeg3D, a model that reconstructs short-axis cine CMR stacks into temporally resolved 3D bi-ventricular volumes, capturing fine-grained cardiac anatomy and dynamic motion. These bi-ventricular sequences were then integrated with 45 clinical and CMR-derived variables using spatiotemporal decomposition and cross-attention mechanisms to construct HeartTTable, a multimodal MACE prediction model. HeartTTable achieved a 5-year time-dependent AUC of 0.934 (95% CI 0.907-0.959) and a Harrell's C-index of 0.897 for predicting MACE risk, significantly outperforming models based solely on clinical and CMR-derived tabular features, and demonstrated strong capabilities in postoperative risk stratification. Our study contributes to improved long-term postoperative management for AMI patients by offering clinicians an objective, data-driven decision-support tool.
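For readers unfamiliar with the Harrell's C-index reported above, the following is a minimal sketch of how such a concordance statistic can be computed from right-censored follow-up data. It is not the authors' evaluation code: the function name `harrell_c_index` and the toy inputs are hypothetical, and pairs with tied follow-up times are skipped for simplicity, which is an assumption of this sketch.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Fraction of comparable patient pairs whose predicted risks are
    ordered consistently with their observed event times.

    times  -- follow-up time per patient
    events -- 1 if the event (e.g. MACE) was observed, 0 if censored
    risks  -- predicted risk score per patient (higher = earlier event expected)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let a be the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the shorter time ended in an observed
        # event. Tied times are skipped (simplifying assumption).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # higher risk for the earlier event
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, not from the study: four patients, the third is censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.8, 0.1]
print(harrell_c_index(toy_times, toy_events, toy_risks))  # 0.8
```

In the toy data, five pairs are comparable and four of them are ranked correctly, giving 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.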
{"title":"3D Spatiotemporal cardiac reconstruction for predicting MACE in acute myocardial infarction.","authors":"Qiang Gao,Jingping Wu,Yingshuang Gao,Yongyong Ren,Xiaolei Wang,Guojun Zhu,Jinyi Xiang,Dongaolei An,Lei Xu,Yan Zhou,Jun Pu,Dan Mu,Lei Zhao,Hui Lu,Lian-Ming Wu","doi":"10.1038/s41746-026-02449-0","DOIUrl":"https://doi.org/10.1038/s41746-026-02449-0","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"19 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2026-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147368239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}