Pub Date: 2026-03-16 DOI: 10.1038/s41746-026-02551-3
Ruoqi Liu, Yuelin Bai, Xiang Yue, Ping Zhang
Electrocardiograms (ECGs) are essential, non-invasive diagnostic tools for assessing cardiac conditions. Existing methods often have limited generalizability, focus on narrow condition sets, and rely on raw physiological signals, which may be unavailable in resource-limited settings where only printed or digital ECG images are accessible. Recent advances in multimodal large language models (MLLMs) offer new opportunities, yet ECG image interpretation remains challenging due to the lack of instruction-tuning data and standardized benchmarks. To address these gaps, we introduce ECGInstruct, the first large-scale ECG image instruction-tuning dataset with over one million samples, covering diverse tasks including feature recognition, rhythm analysis, morphology assessment, and clinical report generation. We develop PULSE, a fully open-source MLLM for ECG image interpretation trained on ECGInstruct. We further curate ECGBench, a human expert-developed benchmark spanning four core ECG interpretation tasks across nine datasets, incorporating both synthesized and real-world ECG images to enable clinically realistic evaluation. Our experiments demonstrate that PULSE establishes a new state of the art, outperforming general-purpose MLLMs by 21% to 33% in average accuracy. These results highlight the potential of PULSE to improve ECG image interpretation in clinical practice. All code, data and models are available at https://aimedlab.github.io/PULSE/.
Title: Teaching multimodal LLMs to comprehend 12-lead electrocardiographic images
Journal: NPJ Digital Medicine
Pub Date: 2026-03-16 DOI: 10.1038/s41746-026-02530-8
Aikeliyaer Ainiwaer, Tom J.A.J. Konings, Kaisaierjiang Kadier, Xiang Ma, Muhammet Emin Akpulat, Frits W. Prinzen, Tammo Delhaas, Hongxing Luo
Coronary artery disease (CAD) remains a major contributor to morbidity and mortality worldwide. Heart sound analysis has been investigated as a noninvasive approach to CAD detection, although existing evidence has been inconsistent. This systematic review evaluated the diagnostic performance of heart sound analysis for identifying CAD (≥50% stenosis). A search of four databases identified 1082 records, among which 40 studies involving 13,814 participants met the inclusion criteria. Among the 21 studies using signal processing methods, all but one of the larger studies (>50 participants, n = 15) reported diagnostic accuracy below 75%. The majority of signal processing studies lacked validation on independent datasets, thereby limiting confidence in the reliability of their reported performance. In contrast, 15 of the 19 studies applying machine learning-based methods reported accuracy, sensitivity, and specificity consistently above 80%. Moreover, 15 of these 19 studies conducted independent dataset validation, indicating comparatively stronger generalizability. Studies that used the full heart sound signal as model input also tended to achieve higher sensitivity than those using only the diastolic component, suggesting that utilizing the complete waveform preserves diagnostically informative features. These findings indicate that machine learning-based heart sound analysis may have diagnostic value for CAD, and larger multicenter studies are needed to further assess its clinical applicability and robustness.
Title: Coronary artery disease diagnosis with signal processing and machine learning of heart sound signals: a systematic review
Journal: NPJ Digital Medicine
Pub Date: 2026-03-15 DOI: 10.1038/s41746-026-02533-5
Rui Santos, Delia Cabrera DeBuc, Gabor Márk Somfai
Title: Cautious optimism on foundation models in medical imaging balancing privacy and innovation
Journal: NPJ Digital Medicine
Pub Date: 2026-03-15 DOI: 10.1038/s41746-026-02537-1
Jun Liang, Mengyao Xing, Peng Xiang, Guixuan Wang, Ming Chen, Qichuan Fang, Tingting Zhou, Zhengyang Lu, XueMing Leng, Jiuke Huang, Xiaoyi Jiao, Chenghua Tian, Jianbo Lei
No large language models (LLMs) have yet been evaluated for understanding picture reports. Pure-tone audiograms, the gold standard for hearing loss assessment, are technical and often incomprehensible to patients without specialist interpretation. We conducted a blinded, multicenter evaluation of eight LLMs across diagnostic, interpretive, and recommendation tasks using 140 audiogram reports, assessed by clinicians and lay reviewers. DeepSeek-V3 achieved the highest diagnostic accuracy (severity: 67.00%; type: 54.00%), while R1 proved most suitable for general readership (FKGL: 6.41). The general public perceived significant benefits from all models in comprehension and emotional support, with Gemini 2.0 Flash/Thinking scoring higher. Challenges remain in understanding pathological mechanisms and controlling hallucinations. While current general-purpose LLMs cannot replace the diagnostic capabilities of physicians, they may serve as effective auxiliary tools for translating specialized audiogram data into structured, patient-accessible interpretations, with particular relevance for populations facing limited access to hearing-care services.
Title: A multicenter multifunctional assessment of large language models in pure-tone audiogram interpretation for patients
Journal: NPJ Digital Medicine
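The FKGL value quoted in the audiogram abstract is a Flesch-Kincaid Grade Level, a readability score in which lower values indicate text readable at a lower school grade. As a rough illustration of how such a score is computed, the sketch below applies the standard FKGL formula to pre-counted words, sentences, and syllables. The function name `fkgl` and the example counts are hypothetical and not taken from the study, which does not state how its counts were obtained.

```python
def fkgl(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts.

    FKGL = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for illustration only (not data from the study):
# a 100-word passage with 5 sentences and 150 syllables.
score = fkgl(words=100, sentences=5, syllables=150)
print(f"FKGL: {score:.2f}")
```

Counts are passed in directly because syllable counting is itself a heuristic that differs between tools, so scores from different implementations may not match exactly.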
Pub Date: 2026-03-14 DOI: 10.1038/s41746-026-02521-9
Yang Xin, Deng Yan, Luo Shuren, Luo Minyang, Lu Liuheng
Concerns that AI tools may erode diagnostic reasoning contrast with claims that AI can foster higher-order thinking. This longitudinal study followed 372 medical students across 12 months of supervised rotations using an AI-assisted diagnosis system. AI-assisted diagnosis participation, AI literacy and medical critical thinking were assessed at baseline, 6 months and 12 months. Cross-lagged panel models examined prospective associations, statistical mediation by AI literacy and moderation by prior technological experience and learning goal orientation. Higher participation was associated with increases in AI literacy and critical thinking, and AI literacy statistically mediated the participation-to-critical thinking association. Indirect effects were stronger among students with greater technological experience and mastery-oriented goals and weaker among performance-oriented peers. Findings indicate that, within supervised clinical training, engagement with AI systems is associated with critical thinking development partly through enhanced AI literacy, supporting AI tools as educational resources under faculty guidance.
Title: AI literacy mediates AI assisted diagnosis participation and critical thinking among medical students under supervision
Journal: NPJ Digital Medicine
Pub Date: 2026-03-14 DOI: 10.1038/s41746-026-02522-8
Lijuan Wu, Liyi Mai, Hongnian Wang, Jinxin Huang, Xinrong He, Xueyun Zhan, Anna Khalemsky, Vijaya Arun Kumar, James H Paxton, Dionyssios Tsilimingras, Said Hachimi-Idrissi, Shan W Liu, Gabriele Savioli, Niels K Rathlev, Karim Tazarourte, Anna Slagman, Michael Christ, Muhammad Qureshi, Hani Hariri, Shamai A Grossman, Bei Hu, Huajun Wang, Binbin He, Phillip D Levy, Brian J O'Neil, Seth Gemme, Lisa Kurland, Eddy Lang, Jinle Lin, Huiying Liang, Xin Li, Abdelouahab Bellou
Alert fatigue remains a major barrier to the effective deployment of predictive models in emergency care, particularly in the context of rare but critical outcomes such as in-hospital mortality (IHM), which often occurs in less than 5.0% of patients admitted from the emergency department (ED). Severe class imbalance leads to low positive predictive value (PPV), undermining the clinical utility of even high-performance predictive models. To address this issue, we propose AI-TEW (Artificial Intelligence-powered Tiered Early Warning), a novel two-stage early warning framework designed to reduce false alarms and improve clinical interpretability. In Stage 1, a robust machine learning model was developed and validated using data from 174,292 ED visits across three hospitals in China and the United States. The model demonstrated strong discriminative ability for IHM prediction, achieving AUROCs ranging from 0.84 (95% CI, 0.81-0.86) to 0.91 (95% CI, 0.90-0.91) in internal and external validation cohorts. In Stage 2, AI-TEW implements a tiered risk stratification strategy by optimizing decision thresholds to prioritize high-risk patients, thereby increasing PPV from baseline levels of 9.8-18.8% to 32.5-40.5% across sites, while maintaining a high negative predictive value (NPV) of over 98% for low-risk individuals. To further refine alert precision, a knowledge-based filtering layer is introduced, leveraging large language models (LLMs) to interpret patient-specific risk factors derived from the SHAP (Shapley Additive exPlanations) method. Integrating explainable AI with clinical reasoning enhances contextual understanding and reduces spurious alerts, leading to an 11.53% increase in PPV in external validation (p = 0.0092 for MedGemma). By integrating improved predictive efficiency with interpretable, knowledge-informed filtering, AI-TEW reduces alert burden while supporting timely clinical intervention, demonstrating a promising approach to mitigating the impact of class imbalance in emergency risk prediction.
Title: Artificial Intelligence-powered tiered early warning framework addressing high false alarm rates for in-hospital mortality prediction
Journal: NPJ Digital Medicine
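The AI-TEW abstract attributes low PPV to class imbalance and reports that stricter decision thresholds raise PPV while NPV stays high. The sketch below shows the underlying arithmetic via Bayes' rule. It is not the authors' code: the function names and the sensitivity and specificity values are illustrative assumptions, and only the roughly 5% prevalence echoes the abstract.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value: P(event | alert)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value: P(no event | no alert)."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


prevalence = 0.05  # rare outcome, as in the abstract's "less than 5.0%"

# Hypothetical operating points: a permissive threshold and a stricter one
# that trades sensitivity for specificity.
for label, sens, spec in [("permissive", 0.80, 0.90), ("strict", 0.50, 0.98)]:
    print(f"{label}: PPV={ppv(sens, spec, prevalence):.3f}, "
          f"NPV={npv(sens, spec, prevalence):.3f}")
```

At low prevalence the false-positive term dominates the PPV denominator, so even a small gain in specificity moves PPV substantially, while NPV remains high because most patients are true negatives.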
Pub Date: 2026-03-14 DOI: 10.1038/s41746-026-02524-6
Zheyuan Wang, Yukun Zhou, Yilan Wu, Jocelyn Hui Lin Goh, Ke Zou, Zhouyu Guan, Yibing Chen, Gabriel Dawei Yang, Ping Zhang, Changchang Yin, An Ran Ran, Miao Li Chee, Can Can Xue, Zhi da Soh, Samantha Yew, Danqi Fang, Xujia Liu, Benjamin Sommer Thinggaard, Jakob Grauslund, Haoxuan Li, Yixiao Jin, Jia Shu, Tingyao Li, Nan Jiang, Tingli Chen, Huating Li, Xiangning Wang, Qiang Wu, Charumathi Sabanayagam, Siegfried K Wagner, Carol Y Cheung, Ching-Yu Cheng, Bin Sheng, Tien Yin Wong, Pearse A Keane, Yih-Chung Tham
Foundation models (FMs) enable generalizable medical AI, but existing retinal FMs perform best on cross-sectional classification and detection and are less effective for predicting disease incidence and progression. We present RETFound Plus, a CFP-based FM trained with temporal modeling on 1,304,292 fundus photographs from 304,345 participants across multiple visits to learn progression-aware representations. Compared with RETFound, RETFound Plus improved calibration and 5-year risk prediction across systemic and ocular diseases, with larger gains for systemic outcomes (stroke, myocardial infarction, diabetes and hypertension; +4-10% c-index) than ocular outcomes (diabetic retinopathy and glaucoma; +3-7% c-index), and improved risk stratification for systemic diseases (1.2-2.1-fold higher hazard-ratio trend). Results were consistent across external multi-regional, multi-ethnic datasets from the UK, US, Singapore, Hong Kong, and Denmark.
Title: Time and person sensitive foundation model for disease prediction and risk stratification
Journal: NPJ Digital Medicine
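The RETFound Plus abstract reports its gains as c-index improvements. For readers unfamiliar with the metric, here is a minimal sketch of Harrell's concordance index for right-censored data. It is not the evaluation code used in the study, and the function name and toy inputs are hypothetical.

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[int],
                      risks: Sequence[float]) -> float:
    """Harrell's c-index.

    A pair (i, j) is comparable when subject i had the event (events[i] == 1)
    strictly before subject j's observed time. The pair is concordant when the
    model gave i the higher risk score; tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data for illustration only: follow-up years, event flags (0 = censored),
# and model risk scores.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.4, 0.1]
print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. This quadratic-time loop is fine for small examples but would be slow on large cohorts.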
Pub Date: 2026-03-14 DOI: 10.1038/s41746-026-02517-5
Bernardo G Collaco, Syed Ali Haider, Srinivasagam Prabha, Cesar A Gomez-Cabello, Ariana Genovese, Nadia G Wood, Sanjay P Bagaria, Narayanan Gopala, Cui Tao, Antonio Jorge Forte
Agentic AI represents a promising evolution of artificial intelligence in healthcare, with systems capable of operating autonomously to achieve defined clinical goals. However, the literature lacks conceptual clarity in distinguishing AI agents from Agentic AI, and few studies have rigorously explored their applications. We conducted a scoping review across five databases, identifying seven eligible studies spanning emergency medicine, oncology, radiology, and rehabilitation. The included systems demonstrated features such as autonomous operation, goal-directed behavior, action initiation, and, in some cases, multi-agent collaboration. Reported outcomes included high accuracy in cancer diagnosis, treatment planning, alert generation, coaching, and workflow optimization. Despite promising results, most studies were exploratory, limited in scope, and lacked robust clinical validation, with only one trial involving patients. These findings highlight both the potential and immaturity of Agentic AI in healthcare, underscoring the need for standardized definitions, regulatory guidance, and rigorous evaluation to ensure safe and effective integration into practice.
Title: The role of agentic artificial intelligence in healthcare: a scoping review
Journal: NPJ Digital Medicine
Pub Date: 2026-03-13 DOI: 10.1038/s41746-026-02531-7
Ashleigh Golden, Elias Aboujaoude
Millions are turning to general-purpose AI chatbots for psychological support, potentially reinforcing symptoms such as intolerance of uncertainty, "need to know" compulsions, and perfectionism. Clinical observation and emerging research suggest that chatbot features exacerbate transdiagnostic avoidance (a process integral to OCD and anxiety), perpetuating maladaptive cycles and hindering corrective learning. We propose a framework in which avoidance is reinforced through repeated chatbot interactions, and outline strategies for clinicians, users, developers, and policymakers to support healthier engagement.
Title: A transdiagnostic model for how general purpose AI chatbots can perpetuate OCD and anxiety disorders
Journal: NPJ Digital Medicine
Effective first-trimester screening for congenital heart disease (CHD) remains an unmet clinical need, hindered by technical constraints and the lack of validated diagnostic tools. While artificial intelligence (AI) offers promise, its progress is restricted by data scarcity and privacy concerns surrounding data sharing. Federated learning (FL) offers a promising paradigm for collaborative model training without exposing sensitive patient data. In this study, we establish Federated Congenital Heart Disease Learning to enable cross-hospital collaboration in early CHD diagnosis. A major challenge arises from inter-hospital heterogeneity, where variations in ultrasound devices, scanning protocols, and patient demographics lead to significant feature distribution shifts, resulting in poor model performance. To address this, we introduce federated prototypes that align both clinical concept and disease subtype representations across participating sites, effectively calibrating local updates and enhancing global consistency. Experiments conducted across four tertiary hospitals demonstrate that our method achieves a 10.3% improvement in F1 score, a 5.1% increase in sensitivity, and a 1.0% improvement in specificity over state-of-the-art federated approaches. These results highlight the effectiveness of our method in improving generalization under real-world clinical heterogeneity. Our implementation and benchmarking resources are publicly available at: https://github.com/WenkeHuang/FLCHD.
Title: Federated clinical concept and disease semantic learning for congenital heart disease diagnosis
Authors: Wenke Huang, Yangxu Liao, Wenjia Lei, Guancheng Wan, Xuankun Rong, Chi Wen, He Li, Mang Ye, Qingqing Wu, Bo Du
Pub Date: 2026-03-13 DOI: 10.1038/s41746-026-02487-8
Journal: NPJ Digital Medicine