Aminoglycoside dosing in suspected neonatal sepsis remains difficult because pharmacokinetics vary widely with marked physiological diversity, from extremely preterm to term neonates, and are further complicated by acute kidney injury, perinatal asphyxia, and concomitant interventions. We developed multiscale medical digital twins that combine a physiologically based pharmacokinetic (PBPK) model with an eco-evolutionary pharmacodynamic module capturing drug-modulated bacterial growth and resistance. Glomerular filtration rate (GFR) is continuously updated by a long short-term memory (LSTM) neural network trained on real-world data. Calibrated on 1634 neonates, the framework enables in silico optimization of full-course antibiotic therapy in real and virtual cohorts, balancing efficacy and safety while accounting for resistance-driven changes in the minimum inhibitory concentration (MIC). Nonlinear optimal control achieved bacteriostatic exposure in all digital-twin neonates, and safety was preserved in most cases at higher MICs. Model predictive control further reduced bacterial rebound late in therapy. This framework supports evolution-aware precision dosing of renally cleared antibiotics in vulnerable neonatal populations.
Title: Evolutionary digital twin framework for optimal aminoglycoside dosing in neonates with suspected sepsis.
Authors: Michela Prunella, Chiara Romano, Alessandro Borri, Nicola Altini, Maria Domenica Di Benedetto, Pieter Annaert, Karel Allegaert, Anne Smits, Vitoantonio Bevilacqua
Journal: NPJ Digital Medicine | Pub Date: 2026-03-21 | DOI: 10.1038/s41746-026-02558-w
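The sketch below illustrates the dosing loop from the digital-twin abstract above. It is deliberately small and is not the authors' PBPK/LSTM model. It rests on several assumptions:

- a one-compartment drug model whose clearance equals a fixed, hypothetical GFR (aminoglycosides are eliminated mainly by glomerular filtration);
- two bacterial subpopulations with different sensitivities, so that the population's effective MIC drifts upward as the resistant fraction grows;
- a receding-horizon controller, in the spirit of model predictive control, that picks a daily dose from a hypothetical candidate set by trading predicted bacterial burden against a trough-concentration penalty.

All names (`Params`, `step`, `simulate`, `plan_dose`, `run_course`) and all parameter values are illustrative placeholders.

```python
import math
from dataclasses import dataclass


@dataclass
class Params:
    # Hypothetical placeholder values, NOT the paper's calibrated parameters.
    volume_l: float = 0.7        # distribution volume (L)
    gfr_l_per_h: float = 0.06    # GFR (L/h); the paper updates GFR with an LSTM, here it is fixed
    growth_rate: float = 0.8     # maximal bacterial growth rate (1/h)
    capacity: float = 1e9        # shared carrying capacity (CFU/mL)
    emax: float = 2.0            # maximal kill rate (1/h)
    ec50_s: float = 1.0          # half-maximal killing concentration, susceptible cells (mg/L)
    ec50_r: float = 8.0          # same for resistant cells (mg/L)
    mutation: float = 1e-6       # fraction of susceptible growth yielding resistant cells


def step(state, p, dt, dose_mg=0.0):
    """One explicit-Euler step of (concentration, susceptible, resistant); dose is a bolus."""
    conc, s, r = state
    conc += dose_mg / p.volume_l
    room = 1.0 - (s + r) / p.capacity              # logistic competition for resources
    kill_s = p.emax * conc / (p.ec50_s + conc)     # Emax kill, susceptible cells
    kill_r = p.emax * conc / (p.ec50_r + conc)     # Emax kill, resistant cells
    growth_s = p.growth_rate * s * room
    ds = growth_s * (1.0 - p.mutation) - kill_s * s
    dr = p.growth_rate * r * room + growth_s * p.mutation - kill_r * r
    dconc = -(p.gfr_l_per_h / p.volume_l) * conc   # renal elimination proportional to GFR
    return (max(conc + dconc * dt, 0.0), max(s + ds * dt, 0.0), max(r + dr * dt, 0.0))


def simulate(state, p, dose_mg, hours, dt=0.1):
    """Give one bolus at t = 0, then integrate for `hours`; return the final state."""
    n_steps = int(round(hours / dt))
    state = step(state, p, dt, dose_mg)
    for _ in range(n_steps - 1):
        state = step(state, p, dt)
    return state


def plan_dose(state, p, candidates, interval_h=24.0, horizon=2,
              trough_limit=2.0, penalty=100.0):
    """Receding-horizon choice: repeat each candidate dose over `horizon` intervals,
    score log10 bacterial burden plus a penalty on troughs above `trough_limit`,
    and return the best dose (only this first dose is applied before re-planning)."""
    best_cost, best_dose = math.inf, candidates[0]
    for dose in candidates:
        sim, cost = state, 0.0
        for _ in range(horizon):
            sim = simulate(sim, p, dose, interval_h)
            cost += penalty * max(sim[0] - trough_limit, 0.0)  # level just before next dose
        cost += math.log10(sim[1] + sim[2] + 1.0)
        if cost < best_cost:
            best_cost, best_dose = cost, dose
    return best_dose


def run_course(p, days=5, candidates=(0.0, 3.0, 6.0, 9.0), start=(0.0, 1e6, 0.0)):
    """Closed loop: plan a dose, apply one day of therapy, re-plan.
    Here the planning model and the simulated patient coincide; in practice the twin
    would be re-calibrated (e.g. with new GFR estimates) between steps."""
    state, doses = start, []
    for _ in range(days):
        dose = plan_dose(state, p, candidates)
        doses.append(dose)
        state = simulate(state, p, dose, 24.0)
    return doses, state


if __name__ == "__main__":
    doses, final = run_course(Params())
    print("daily doses (mg):", doses, "final bacterial load:", final[1] + final[2])
```

Repeating each candidate dose across the horizon ("input blocking") keeps the search to a handful of simulations per day. A real optimal-control formulation would instead optimise a full dose sequence under explicit efficacy and toxicity constraints.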
Pub Date: 2026-03-21 | DOI: 10.1038/s41746-026-02535-3
Oscar Freyer, Rebecca Mathias, Hannah Sophie Muti, Henry Orlovsky, Stephan Buch, Max Ostermann, Anett Schönfelder, Akira-Sebastian Poncette, Adel Bassily-Marcus, Stephen Gilbert
Artificial intelligence (AI) is increasingly explored for use in intensive care units. While most approved AI devices use narrow models, research is shifting towards generalist systems based on large language models and agentic AI. In this perspective, we propose a five-paradigm framework that shows how regulatory complexity rises with AI functionality and scale. As current regulatory frameworks are device-centric, new approaches like agentic oversight are needed for orchestrating AI systems.
Title: The regulation of artificial intelligence in intensive care units: from narrow tools to generalist systems.
Journal: NPJ Digital Medicine
The histological heterogeneity of primary tumors across the pan-cancer spectrum poses a formidable barrier to accurate lymph node metastasis assessment, often causing AI systems to make "overconfident errors" on rare variants that lead to missed diagnoses. To address this, we present UPATHLN, a unified diagnostic platform that combines a pathology foundation model-based encoder with a decoupled uncertainty estimation mechanism. We developed and validated the system using a large-scale multicentre dataset of 26,229 lymph nodes from 14 distinct primary origins. In internal validation, UPATHLN achieved an area under the curve (AUC) of 0.986. Crucially, the uncertainty module functioned as a fail-safe: by flagging potential false-negative predictions for mandatory pathologist review, it intercepted all missed diagnoses, securing 100% conditional sensitivity in both the development and independent test cohorts, even for tumors from seven unseen primary origins. Concurrently, this mechanism reduced the review burden on negative lymph nodes by 73.2%. Ultimately, UPATHLN sets a new benchmark for safety-critical AI, demonstrating that explicitly modeling uncertainty is key to unlocking reliable, workload-efficient diagnostics at the pan-cancer scale.
Title: High-sensitivity pan-cancer AI assessment of lymph node metastasis via uncertainty quantification.
Authors: Xiaodong Wang, Ying Chen, Xiaohong Liu, Cen Qiu, Hong Tang, Tinggui Huang, Siqi Guo, Sainan Ma, Mengjiao Cai, Qingyun Sun, Zichen Chang, Jinge Liu, Xiongjun Wang, Jinda Li, Wulei Qian, Biyu Wang, Boan Zhang, Chenguang Bai, Min Shi, Xinlei Zhang, Meng Li, Jiahai Wang, Bin Wang, Jinlu Ma, Lirong Ai, Shaoqing Yu, Liming Wang, Ninghan Feng, Xiyang Liu, Guanzhen Yu
Journal: NPJ Digital Medicine | Pub Date: 2026-03-21 | DOI: 10.1038/s41746-026-02564-y
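The UPATHLN abstract describes uncertainty-gated review: model negatives with high uncertainty are routed to a pathologist. The two reported quantities, conditional sensitivity and review-burden reduction, can be computed as below. This is a minimal sketch under stated assumptions, not the authors' pipeline:

- the `Node` fields and both thresholds are hypothetical;
- a flagged metastatic node is assumed to be caught on review;
- "review reduction" is taken to mean the share of predicted-negative nodes that need no review, which is one plausible reading of the 73.2% figure.

```python
from dataclasses import dataclass


@dataclass
class Node:
    is_metastatic: bool  # ground-truth label
    prob: float          # model probability of metastasis
    uncertainty: float   # separate uncertainty score; higher means less certain


def triage_metrics(nodes, prob_threshold=0.5, unc_threshold=0.3):
    """Return (raw sensitivity, conditional sensitivity, review reduction).

    Assumed workflow with hypothetical thresholds: negative predictions whose
    uncertainty exceeds `unc_threshold` are flagged for mandatory review, and a
    reviewed metastatic node is assumed to be detected by the pathologist.
    """
    positives = [n for n in nodes if n.is_metastatic]
    pred_neg = [n for n in nodes if n.prob < prob_threshold]
    if not positives or not pred_neg:
        raise ValueError("need at least one metastatic node and one predicted negative")
    flagged = [n for n in pred_neg if n.uncertainty > unc_threshold]
    tp = sum(1 for n in positives if n.prob >= prob_threshold)
    rescued = sum(1 for n in flagged if n.is_metastatic)   # false negatives caught by review
    raw_sens = tp / len(positives)
    cond_sens = (tp + rescued) / len(positives)
    review_reduction = 1 - len(flagged) / len(pred_neg)    # negatives needing no review
    return raw_sens, cond_sens, review_reduction
```

Conditional sensitivity reaches 100% only if every false negative is also uncertain. Raising `unc_threshold` lowers the review burden at the risk of letting a confident false negative through.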
Inappropriate cervical cancer screening practices, including over- and under-screening, pose significant healthcare burdens in low-resource settings. This study analyzed screening behaviors and their determinants among 33,362 women aged 35-64 in Wuxiang County, China, using longitudinal cohort data. Screening events were classified as guideline-adherent, over-screened, under-screened, or unscreened according to the prior screening method (HPV testing, cytology, or co-testing) and its result, and determinants were analyzed with cause-specific frailty models. Overall, only 19.9% of events were guideline-adherent, while 29.5% were over-screened and 50.6% were under- or unscreened. Notably, the implementation of a county-wide Electronic Medical Record (EMR) platform in 2022 coincided with a sharp decline in over-screening from 36.7% to 15.7%. Compared with primary HPV testing, prior co-testing increased the hazard of both over- and under-screening, whereas prior cytology was strongly associated with under-screening. Women with low-grade abnormalities (≤CIN1) had a substantially higher risk of under-screening than those with negative results. Additionally, community residents were more prone to over-screening, while village residents faced higher under-screening risks. These findings suggest that transitioning to HPV-based screening and integrating EMR systems can effectively reduce unnecessary testing, although enhanced reminder systems are needed to address persistent under-screening in resource-constrained regions.
Title: Trends in over- and under-screening for cervical cancer after EMR implementation in rural China.
Authors: Yitong Zhu, Huike Wang, Bo Zhang, Mingyang Chen, Jinxiu Han, Xiaopin Shi, Hanyue Ding, Youlin Qiao
Journal: NPJ Digital Medicine | Pub Date: 2026-03-20 | DOI: 10.1038/s41746-026-02504-w
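In the cervical screening study, each event is labelled from the prior method and result together with the time to the next screen. A rule of that shape can be sketched as follows. The recommended intervals, the one-year follow-up after a ≤CIN1 result, and the half-year grace window are illustrative assumptions, not the criteria used in the Wuxiang County analysis. The function names are hypothetical.

```python
# Hypothetical recommended re-screening intervals (years) keyed by prior method and result.
# Illustrative placeholders only, NOT the criteria used in the Wuxiang County study.
RECOMMENDED_YEARS = {
    ("hpv", "negative"): 5.0,
    ("cotest", "negative"): 5.0,
    ("cytology", "negative"): 3.0,
}
LOW_GRADE_FOLLOWUP_YEARS = 1.0   # assumed follow-up after a <=CIN1 result, any method
GRACE_YEARS = 0.5                # assumed tolerance around the due date


def recommended_interval(method, result):
    """Years until the next screen is due after a prior (method, result)."""
    if result == "le_cin1":
        return LOW_GRADE_FOLLOWUP_YEARS
    return RECOMMENDED_YEARS[(method, result)]


def classify_event(method, result, gap_years, followup_years):
    """Label one screening event.

    gap_years: time to the next screen, or None if no further screen was observed.
    followup_years: observed follow-up after the prior screen (used when gap_years is None).
    """
    due = recommended_interval(method, result)
    if gap_years is None:
        # Never re-screened: 'unscreened' only once the due date plus grace has passed.
        return "unscreened" if followup_years > due + GRACE_YEARS else "censored"
    if gap_years < due - GRACE_YEARS:
        return "over-screened"
    if gap_years > due + GRACE_YEARS:
        return "under-screened"
    return "guideline-adherent"
```

The "censored" label covers women whose next screen is not yet due at the end of follow-up. A time-to-event model such as the study's cause-specific frailty models handles this kind of incomplete observation, and it should not be counted as under-screening.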
Lengthy acquisition time remains a key bottleneck for the widespread use of MRI in clinics. While accelerated MRI can reduce scan duration, it often introduces increased noise, compromising image quality and diagnostic reliability. In this study, we present a unified deep learning-based denoising model for multi-organ accelerated MRI, designed to operate directly on reconstructed images from commercial MRI systems. Our model was trained on a prospectively collected, large-scale real-world dataset comprising 148,930 noisy-clean image pairs from six clinical centers and four major MRI vendors, spanning six organs and 96 MRI protocols. On a test set of 20,143 real-world image pairs, our model consistently outperformed state-of-the-art denoising methods. Importantly, downstream evaluation using tissue segmentation showed a 7.05% improvement in Dice score across multiple organs compared with noisy images. The model further generalized to 46,870 external clinical images from four independent cohorts, highlighting its robustness across scanners and acquisition protocols. To assess clinical utility, two experienced radiologists conducted blinded evaluations across multiple organs, focusing on overall image quality, diagnostic confidence, and disease diagnosis. The denoised images retained high visual fidelity and yielded diagnostic performance equivalent to that of clean images even at an acceleration factor of 3× relative to the clinical scanning setup, allowing many acquisitions to be completed within one minute. This unified MRI denoising model holds great potential for a range of clinical applications.
Title: Real-world unified denoising for multi-organ fast MRI: a large-scale prospective validation.
Authors: Yuchen Shao, Hongyan Huang, Lingyan Zhang, Dongsheng Li, Zhiguang Ding, Fan Wang, Shengli Chen, Shiwei Lin, Yuning Gu, Mu Du, Hongbing Li, Jiuping Liang, Xiaoqian Huang, Aowen Liu, Jiafu Zhong, Yiqiang Zhan, Xiang Sean Zhou, Feng Shi, Shu Liao, Kaicong Sun, Dinggang Shen, Yingwei Qiu
Journal: NPJ Digital Medicine | Pub Date: 2026-03-19 | DOI: 10.1038/s41746-026-02548-y
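The MRI denoising abstract reports its downstream gain as "a 7.05% improvement in Dice score" without saying whether this is an absolute change (percentage points) or a relative one. The sketch below uses the standard Dice overlap, 2|A ∩ B| / (|A| + |B|), on flattened binary masks, and reports a change both ways. The helper names and the example values are hypothetical. Treating two empty masks as a perfect match is a convention choice.

```python
def dice(pred, truth):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    inter = sum(1 for a, b in zip(pred, truth) if a and b)
    size = sum(1 for a in pred if a) + sum(1 for b in truth if b)
    return 1.0 if size == 0 else 2.0 * inter / size   # two empty masks count as agreement


def improvement(before, after):
    """Return (absolute change in percentage points, relative change in percent)."""
    return (after - before) * 100.0, (after - before) / before * 100.0


if __name__ == "__main__":
    print(dice([1, 1, 0, 0], [1, 0, 1, 0]))
    print(improvement(0.80, 0.85))   # hypothetical Dice values, noisy vs denoised
```

For example, a Dice score rising from 0.80 to 0.85 is a 5-point absolute gain but a 6.25% relative gain. This is why the reporting convention matters when comparing against other denoising studies.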
Pub Date: 2026-03-19 | DOI: 10.1038/s41746-026-02556-y
Imane Ben M'Barek, Badr Ben M'Barek, Grégoire Jauvion, Virginia Whelehan, Aris Papageorghiou, Erwan Le Pennec, Julien Stirnemann
Cardiotocography (CTG) interpretation during labour is subject to high interobserver variability, limiting its performance for predicting perinatal acidaemia. This study aimed to evaluate whether computerised CTG (cCTG) assistance improves clinicians' predictive performance. In a prospective randomised multi-reader design, 211 clinicians from 23 countries were invited to assess 100 CTG recordings (50 with pH < 7.15), with or without cCTG assistance. Participants predicted the occurrence of perinatal acidaemia. cCTG assistance significantly improved overall prediction, increasing the success rate from 54.0% to 61.4% (p < 0.01) and sensitivity from 49.3% to 61.7% (p < 0.01). There was no significant difference in specificity between groups (58.7% vs 61.2%, p = 0.14). In discordant cases, the cCTG model was correct 67.5% of the time. Agreement and reliability between clinicians also improved across professions, countries and levels of experience. These findings suggest that cCTG enhances the detection of perinatal acidaemia.
Title: Randomised study of human machine collaboration for cardiotocography interpretation during labour.
Journal: NPJ Digital Medicine
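The CTG reader-study figures can be related through a simple identity. Because half of the 100 recordings were acidaemic (pH < 7.15), the overall success rate equals the mean of sensitivity and specificity whenever readings in an arm are split evenly between the two classes. The sketch below assumes that balance and the unassisted-then-assisted ordering of the reported pairs. It computes the metrics from a 2×2 table and checks that the reported numbers are consistent within rounding. The function names are hypothetical.

```python
def reader_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and overall success rate from a 2x2 table of reads."""
    sensitivity = tp / (tp + fn)   # acidaemic cases correctly called acidaemic
    specificity = tn / (tn + fp)   # non-acidaemic cases correctly called normal
    success = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, success


def balanced_success(sensitivity, specificity):
    """Success rate implied when positives and negatives are equally represented."""
    return (sensitivity + specificity) / 2.0


if __name__ == "__main__":
    # Reported figures, assumed paired as (unassisted, assisted).
    print(balanced_success(0.493, 0.587))   # close to the reported 54.0%
    print(balanced_success(0.617, 0.612))   # close to the reported 61.4%
```

The check also confirms the pairing of the reported figures. Combining 49.3% sensitivity with 58.7% specificity reproduces the unassisted 54.0%, whereas pairing it with 61.2% would not.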
Early identification of abnormal bone mineral density (BMD) through opportunistic screening is critical for preventing osteoporotic fractures. We validated an AI model in 2384 asymptomatic adults (57.7% female; mean age 43.6 years) undergoing health examinations in Taiwan. Using dual-energy X-ray absorptiometry (DXA) as the reference, the model flagged 255 suspected abnormal BMD cases, of which 94 (3.9% of the cohort) were confirmed by DXA. Population-level performance was robust, with an AUC of 0.95 (95% CI 0.93-0.99) and a sensitivity of 79.7% (95% CI 71.3-86.5%). Although BMI distributions paralleled East Asian regional trends, intersectional subgroup analyses remain exploratory because of small event counts. Decision curve analysis indicated a higher net benefit for AI-based referral than for "refer all" or "refer none" strategies, particularly for women with normal BMI (18.5-23 kg/m²). This AI tool offers precise triage for Asian health examination populations, although further validation in multi-center cohorts is required to confirm broad generalizability.
Title: Advancing diagnostic equity through artificial intelligence chest radiograph screening for osteoporosis in Asian populations.
Authors: Shu-Han Chen, Ray-E Chang, Chia-En Lien, Dun-Jhu Yang, Pei Yao, Meng-Lu Wu, Kun-Hui Chen
Journal: NPJ Digital Medicine | Pub Date: 2026-03-19 | DOI: 10.1038/s41746-026-02484-x
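The decision-curve comparison in the BMD abstract can be made concrete with the standard net-benefit formula, NB = TP/N − FP/N · pt/(1 − pt), where pt is the threshold probability. The inputs fall into three groups:

- Reported in the abstract: 2384 adults, 255 flagged and 94 DXA-confirmed.
- Inferred here: the total number of DXA-positive adults (about 118), derived from the reported 79.7% sensitivity; the abstract does not state it.
- Arbitrary: the 5% threshold is for illustration only.

The sketch works at whole-population level and does not reproduce the study's curves or its subgroup results. The function names are hypothetical.

```python
def net_benefit(tp, fp, n, threshold):
    """Decision-curve net benefit at threshold probability pt: TP/N - FP/N * pt/(1 - pt)."""
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


N = 2384                       # adults screened (reported)
FLAGGED = 255                  # AI-suspected abnormal BMD (reported)
TP = 94                        # flagged and DXA-confirmed (reported)
FP = FLAGGED - TP              # flagged but DXA-normal
POSITIVES = round(TP / 0.797)  # INFERRED from the reported 79.7% sensitivity (about 118)


def compare_strategies(threshold):
    """Net benefit of AI-based referral versus 'refer all' and 'refer none'."""
    return {
        "ai_referral": net_benefit(TP, FP, N, threshold),
        "refer_all": net_benefit(POSITIVES, N - POSITIVES, N, threshold),
        "refer_none": 0.0,
    }


if __name__ == "__main__":
    print("PPV of a flag:", TP / FLAGGED)
    print(compare_strategies(0.05))   # threshold chosen only for illustration
```

The same counts also show the positive predictive value of a flag: about 37% (94/255). This is worth keeping in mind when reading "precise triage". Referring everyone is penalised because the inferred prevalence (about 5%) is close to the chosen threshold.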
Pub Date: 2026-03-19 | DOI: 10.1038/s41746-026-02552-2
Jasmine Chiat Ling Ong, Yilin Ning, Mingxuan Liu, Yian Ma, Liang Zhao, Kuldev Singh, Robert T Chang, Silke Vogel, John C W Lim, Iris Siu Kwan Tan, Oscar Freyer, Stephen Gilbert, Danielle S Bitterman, Xiaoxuan Liu, Alastair K Denniston, Nan Liu
The integration of generative AI (GenAI) and large language models (LLMs) in healthcare presents both unprecedented opportunities and challenges, necessitating innovative regulatory approaches. In this perspective, we discuss the risks of GenAI and LLM-based medical devices and the limitations of current medical device regulatory frameworks when applied to GenAI or LLMs. We also advocate for global collaboration in regulatory science research that engages multidisciplinary expertise and focuses on the needs of diverse populations.
Title: Innovating global regulatory frameworks for generative AI in medical devices is an urgent priority.
Journal: NPJ Digital Medicine
Pub Date: 2026-03-18 | DOI: 10.1038/s41746-026-02545-1
Selin S Everett, Bryan J Bunning, Priyank Jain, Ivan Lopez, Anup Agarwal, Manisha Desai, Robert Gallo, Ethan Goh, Vinay B Kadiyala, Zahir Kanjee, Jacob M Koshy, Andrew Olson, Adam Rodman, Kevin Schulman, Eric Strong, Jonathan H Chen, Eric Horvitz
Early studies of large language models (LLMs) in clinical settings have largely treated artificial intelligence (AI) as a tool rather than an active collaborator. As LLMs demonstrate expert-level diagnostic performance, the focus shifts from whether AI can offer valuable suggestions to how it integrates into physicians' diagnostic workflows. We conducted a randomized controlled trial (n = 70 clinicians) to assess a custom system designed for collaborative diagnostic reasoning. The design involved independent diagnostic assessments by the clinician and the AI, followed by an AI-generated synthesis that integrated both perspectives, highlighted agreements and disagreements, and offered commentary. We evaluated two collaborative workflows: AI as first opinion (preceding the clinician) and AI as second opinion (following the clinician). Both improved clinician diagnostic accuracy over conventional resources (85% and 82% vs. 75%). Performance was comparable across workflows and not statistically different from AI-alone accuracy (90%), highlighting the potential of collaborative AI to complement clinician expertise. Qualitative analyses illustrate how workflow design shapes human-AI interaction. ClinicalTrials.gov: NCT06911645.
Title: From tool to teammate in a randomized controlled trial of clinician-AI collaborative workflows for diagnosis.
Journal: NPJ Digital Medicine
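The synthesis step in the clinician-AI trial (two independent assessments, then a summary of agreements and disagreements) has a simple structural core, sketched below. This is a hypothetical illustration: the trial's system used an LLM to write the synthesis and its commentary. The function names, the output keys, and the exact string matching of diagnosis labels are assumptions; real differentials would need synonym and ontology handling.

```python
def normalise(labels):
    """Case- and whitespace-insensitive form of each diagnosis label."""
    return [label.strip().lower() for label in labels]


def synthesize(clinician_dx, ai_dx):
    """Compare two independently produced, ranked differentials.

    Returns the shared diagnoses (in the clinician's order), the diagnoses
    proposed by only one side, and whether the top-ranked choices agree.
    """
    c, a = normalise(clinician_dx), normalise(ai_dx)
    return {
        "agreements": [x for x in c if x in a],
        "clinician_only": [x for x in c if x not in a],
        "ai_only": [x for x in a if x not in c],
        "top_choice_agrees": bool(c and a and c[0] == a[0]),
    }


if __name__ == "__main__":
    print(synthesize(["Pneumonia", "Pulmonary embolism"],
                     ["pulmonary embolism", "Pericarditis"]))
```

Keeping the two assessments independent until this comparison is what lets the synthesis surface disagreements. Anchoring one side on the other would shrink the `clinician_only` and `ai_only` lists.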
Early detection of congenital ptosis is critical to prevent visual and psychosocial impairment in children, yet clinical assessment is challenged by limited patient cooperation and specialist availability. In this prospective, multicenter study, we developed and validated a smartphone-based system comprising three modules (morphological assessment, functional analysis, and a domain-adapted dialogue model) using 3,164 blink clips and 1,229 facial images. The morphological module showed high measurement accuracy, with intraclass correlation coefficients above 0.90 versus manual assessments. The functional module identified levator dysfunction with an area under the curve of 0.993 and achieved robust functional stratification accuracy in both internal (0.91) and real-world (0.89) cohorts. The dialogue model showed improved correctness and applicability over its baseline in addressing ptosis-related queries, achieving overall performance comparable to GPT-4o in expert evaluation and a patient satisfaction score of 4.93/5 in real-world deployment. This smartphone platform enables precise ptosis evaluation with patient-centered interaction, facilitating informed decision-making and personalized care in oculoplastic practice. ClinicalTrials.gov: NCT07078552.
Title: From blink to care: smartphone video-based functional analysis and personalized management in pediatric blepharoptosis.
Authors: Huimin Li, Jing Cao, Shuangshuang Duan, Saiyu Hu, Lixia Lou, Ming Lin, Tianming Jian, Ji Shao, Xuan Zhang, Pengjie Chen, Yingcheng He, Jiawei Wang, Shoujun Huang, Juan Ye
Journal: NPJ Digital Medicine | Pub Date: 2026-03-18 | DOI: 10.1038/s41746-026-02510-y
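The ptosis abstract reports intraclass correlation coefficients above 0.90 between the smartphone morphological module and manual assessments, but it does not say which ICC form was used. As an assumption, the sketch below computes ICC(2,1) after Shrout and Fleiss: two-way random effects, absolute agreement, single measurement. It treats the app and the manual grader as two raters measuring the same eyes. The function name and the example data are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings: list of n subjects, each a list of k measurements (one per rater/method).
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 subjects, >= 2 raters and a complete table")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters (systematic bias)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols                     # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


if __name__ == "__main__":
    app_vs_manual = [[1.0, 1.1], [2.0, 2.1], [3.0, 2.9], [4.0, 4.2]]  # hypothetical values
    print(icc_2_1(app_vs_manual))
```

The absolute-agreement form penalises a constant offset between app and manual measurements, which a consistency-type ICC would ignore. For an app meant to replace manual measurement, absolute agreement is the stricter and usually more relevant choice.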