Evaluation of large language models for diagnostic impression generation from brain MRI report findings: a multicenter benchmark and reader study
Pub Date : 2026-01-22 DOI: 10.1038/s41746-026-02380-4
Ming-Liang Wang, Rui-Peng Zhang, Wen-Juan Wu, Yu Lu, Xiao-Er Wei, Zheng Sun, Bao-Hui Guan, Jun-Jie Zhang, Xue Wu, Lei Zhang, Tian-Le Wang, Yue-Hua Li
Automatically deriving radiological diagnoses from brain MRI report findings is challenging because of the high complexity of the findings and the domain expertise required. This study evaluated 10 large language models (LLMs) in generating diagnoses from brain MRI report findings, using 4293 reports (9973 diagnostic labels) covering 15 brain disease categories from three medical centers. DeepSeek-R1 achieved the highest performance among the evaluated models on the full dataset and across different clinical scenarios and subgroups, particularly when provided with structured report findings and clinical information. A top-three differential-diagnosis prompting strategy achieved superior performance, with 97.6% patient-level accuracy versus 87.1% for single-diagnosis prompting. The diagnostic performance of six radiologists was assessed with and without DeepSeek-R1 assistance on 500 reports. Integration of DeepSeek-R1 significantly improved diagnostic accuracy (AUPRC: 0.774-0.893) and reduced reading time (from 61 to 53 s), with more pronounced benefits for junior radiologists. Our findings indicate that effective automated diagnostic impression generation in brain MRI reporting requires advanced large-scale LLMs such as DeepSeek-R1. With optimized prompting and input strategies, this framework may serve as a supportive tool for drafting brain MRI reports and contribute to improved workflow efficiency in radiology practice.
Large language models improve transferability of electronic health record-based predictions across countries and coding systems
Pub Date : 2026-01-22 DOI: 10.1038/s41746-026-02363-5
Matthias Kirchler, Matteo Ferro, Veronica Lorenzini, Robin P van de Water, Christoph Lippert, Andrea Ganna
Variation in medical practices and reporting standards across healthcare systems limits the transferability of prediction models based on structured electronic health record data. Prior studies have demonstrated that embedding medical codes into a shared semantic space can help address these discrepancies, but real-world applications remain limited. Here, we show that leveraging embeddings from a large language model alongside a transformer-based prediction model provides an effective and scalable solution to enhance generalizability. We call this approach GRASP and apply it to predict the onset of 21 diseases and all-cause mortality in over one million individuals. Trained on the UK Biobank (UK) and evaluated in FinnGen (Finland) and Mount Sinai (USA), GRASP achieved an average ΔC-index that was 88% and 47% higher than language-unaware models, respectively. GRASP also showed significantly higher correlations with polygenic risk scores for 62% of diseases, and maintained robust performance even when datasets were not harmonized to the same data model.
{"title":"Large language models improve transferability of electronic health record-based predictions across countries and coding systems.","authors":"Matthias Kirchler, Matteo Ferro, Veronica Lorenzini, Robin P van de Water, Christoph Lippert, Andrea Ganna","doi":"10.1038/s41746-026-02363-5","DOIUrl":"10.1038/s41746-026-02363-5","url":null,"abstract":"<p><p>Variation in medical practices and reporting standards across healthcare systems limits the transferability of prediction models based on structured electronic health record data. Prior studies have demonstrated that embedding medical codes into a shared semantic space can help address these discrepancies, but real-world applications remain limited. Here, we show that leveraging embeddings from a large language model alongside a transformer-based prediction model provides an effective and scalable solution to enhance generalizability. We call this approach GRASP and apply it to predict the onset of 21 diseases and all-cause mortality in over one million individuals. Trained on the UK Biobank (UK) and evaluated in FinnGen (Finland) and Mount Sinai (USA), GRASP achieved an average ΔC-index that was 88% and 47% higher than language-unaware models, respectively. GRASP also showed significantly higher correlations with polygenic risk scores for 62% of diseases, and maintained robust performance even when datasets were not harmonized to the same data model.</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":" ","pages":""},"PeriodicalIF":15.1,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146030247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A systematic review of AI for predicting glaucoma progression: challenges and recommendations towards clinical implementation
Pub Date : 2026-01-22 DOI: 10.1038/s41746-025-02321-7
Yichuan G Liang, Leo Fan, Armando Teixeira-Pinto, Gerald Liew, Andrew J R White
Glaucoma is the leading cause of irreversible blindness worldwide, with heterogeneous progression rates. Artificial intelligence (AI) may enable accurate progression predictions in clinical practice. We conducted a systematic review to survey quantitative AI performance and to examine strengths and shortfalls in current AI approaches with future clinical implementation in mind. Two reviewers independently screened studies in English from MEDLINE, Embase, Web of Science, Cochrane CENTRAL and arXiv published since 2014 and performed risk-of-bias assessment on eligible studies using QUADAS-2. Forty-six reports of 43 unique studies demonstrated moderate to good performance in predicting glaucoma conversion, biological deterioration and progression to surgery. Several challenges for clinical translation remain, including inconsistent reporting, limitations and heterogeneity in study design, and poor AI generalisability and transparency. We encourage future studies to adopt robust study design and transparent reporting, and we propose the first glaucoma-specific list of recommended practices and reporting items for future clinical implementation.
{"title":"A systematic review of AI for predicting glaucoma progression: challenges and recommendations towards clinical implementation.","authors":"Yichuan G Liang,Leo Fan,Armando Teixeira-Pinto,Gerald Liew,Andrew J R White","doi":"10.1038/s41746-025-02321-7","DOIUrl":"https://doi.org/10.1038/s41746-025-02321-7","url":null,"abstract":"Glaucoma is the leading cause of irreversible blindness worldwide with heterogeneous progression rates. Artificial Intelligence (AI) may enable accurate progression predictions in clinical practice. We conducted a systematic review to survey quantitative AI performance and examine strengths and shortfalls in current AI approaches with future clinical implementation in mind. Two reviewers independently screened studies in English from MEDLINE, Embase, Web of Science, Cochrane CENTRAL and arXiv since 2014 and performed risk of bias assessment on eligible studies using QUADAS-2. 46 reports of 43 unique studies demonstrated moderate to good performance in predicting glaucoma conversion, biological deterioration and progression to surgery. Several challenges for clinical translation remain, including inconsistent reporting, limitations and heterogeneity in study design and poor AI generalisability and transparency. We encourage future studies to adopt robust study design and transparent reporting and propose the first glaucoma-specific list of recommended practices and reporting items for future clinical implementation.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"35 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146021298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Early diagnosis of axial spondyloarthritis in primary care using multi-agent systems
Pub Date : 2026-01-22 DOI: 10.1038/s41746-026-02372-4
Xiaojian Ji, Zhuofeng Li, Lulu Zeng, Lidong Hu, Yanyan Wang, Kui Zhang, Lianjie Shi, Meng Wei, Lifeng Chen, Lin Guo, Jing Dong, An'an Wang, Lei Sun, Yimin Song, Huatao Wang, Jingming Wang, Ying Lei, Wenqian Yue, Zheng Zhao, Jian Zhu, Feng Huang, Jing Zhang, Tao Li, Kunpeng Li
Axial spondyloarthritis (axSpA) is an inflammatory disease marked by chronic low back pain, with a global average diagnostic delay of 6.7 years. Early diagnosis is crucial for improving prognosis and reducing disability rates, yet primary care physicians (PCPs) may find it challenging to ensure timely recognition and referrals. This study developed and validated Spondyloarthritis Agents (SpAgents), an early diagnostic system based on a multi-agent framework integrating large language models (LLMs) and imaging models. The SpAgents framework includes PlannerAgent, DataAgent, ToolAgent, and DoctorAgent, supported by long-term memory for dynamic knowledge updates. We enrolled 596 patients, dividing 545 from one hospital into a training dataset (n = 359) and a validation dataset (n = 186), along with an independent cohort of 51 patients from five additional hospitals for testing. SpAgents demonstrated strong diagnostic performance, achieving sensitivity of 0.8615 and specificity of 0.8000 during validation, and 0.9375 and 0.7368 during testing. SpAgents exhibited significantly higher sensitivity (0.9400) and accuracy (0.8600) than both PCPs and junior rheumatologists, with overall performance equivalent to that of senior rheumatologists. Under SpAgents-assisted diagnosis, both PCPs and junior rheumatologists showed marked improvements in sensitivity and accuracy. SpAgents effectively enhance early axSpA identification among PCPs, offering an innovative solution to reduce diagnostic delays.
Personalized supervised and unsupervised intracranial sleep decoding during deep brain stimulation
Pub Date : 2026-01-22 DOI: 10.1038/s41746-026-02368-0
Clay Smyth, Md Fahim Anjum, Jin-Xiao Zhang, Jiaang Yao, Reza Abbasi-Asl, Philip Starr, Simon Little
Impaired sleep in Parkinson's disease (PD) is a significant unmet need. Targeting sleep stage-specific neurophysiologies with adaptive deep brain stimulation (aDBS) may ameliorate sleep disruption. This study analyzes the efficacy of personalized machine learning approaches in classifying sleep stages in participants receiving deep brain stimulation. We acquired 283 hours of multi-night intracranial cortico-basal recordings, with synchronized sleep stage labels derived from scalp EEG, across five participants during chronic stimulation. Five-stage classification accuracy across PD subjects averaged 80.2% (±0.9% SEM). When constraining sleep classification to algorithms implementable in currently available DBS devices, e.g., binary NREM classification using linear models, an average accuracy of 85.9% (±0.4% SEM) was achieved for PD subjects. Additionally, linear models trained on unsupervised cluster labels achieved an average accuracy of 83.5% (±5.6% SEM) when discriminating NREM sleep. Overall, this demonstrates the feasibility of personalized supervised and unsupervised ML models for sleep classification using intracranial data during stimulation. The Institutional Review Board approved the parent study protocol, and the study was registered on clinicaltrials.gov (NCT0358289; IDE G180097).
{"title":"Personalized supervised and unsupervised intracranial sleep decoding during deep brain stimulation.","authors":"Clay Smyth,Md Fahim Anjum,Jin-Xiao Zhang,Jiaang Yao,Reza Abbasi-Asl,Philip Starr,Simon Little","doi":"10.1038/s41746-026-02368-0","DOIUrl":"https://doi.org/10.1038/s41746-026-02368-0","url":null,"abstract":"Impaired sleep in Parkinson's Disease (PD) is a significant unmet need. Targeting sleep stage-specific neurophysiologies with adaptive Deep Brain Stimulation (aDBS) may ameliorate sleep disruption. This study analyzes the efficacy of personalized machine learning approaches on classifying sleep stages from participants receiving deep brain stimulation. We acquired 283 hours of multi-night intracranial cortico-basal recordings with synchronized sleep stage labels derived from scalp EEG across 5 participants during chronic stimulation. Five-stage classification accuracy across PD subjects averaged 80.2% (±0.9% SEM). When constraining sleep classification to algorithms implementable in currently available DBS devices, e.g., binary NREM classification using linear models, an average accuracy of 85.9% (±0.4% SEM) was achieved for PD subjects. Additionally, linear models trained on unsupervised cluster labels achieved an average accuracy of 83.5% (±5.6% SEM) when discriminating NREM sleep. Overall, this demonstrates the feasibility of personalized supervised and unsupervised ML models for sleep classification using intracranial data during stimulation. The Institutional Review Board approved the parent study protocol, and the study was registered on clinicaltrials.gov (NCT0358289; IDE G180097).","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"86 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146021559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhanced language models for predicting and understanding HIV care disengagement: a case study in Tanzania
Pub Date : 2026-01-21 DOI: 10.1038/s41746-026-02349-3
Waverly Wei, Junzhe Shao, Rita Qiuran Lyu, Rebecca Hemono, Xinwei Ma, Joseph Giorgio, Zeyu Zheng, Feng Ji, Xiaoya Zhang, Emmanuel Katabaro, Matilda Mlowe, Amon Sabasaba, Caroline Lister, Siraji Shabani, Prosper Njau, Sandra I. McCoy, Jingshen Wang
Sustained engagement in HIV care and adherence to ART are crucial for meeting the UNAIDS “95-95-95” targets. Disengagement from care remains a significant issue, especially in sub-Saharan Africa. Traditional machine learning (ML) models have had moderate success in predicting disengagement, enabling early intervention. We developed an enhanced large language model (LLM) fine-tuned with electronic medical records (EMRs) to predict individuals at risk of disengaging from HIV care in Tanzania. Using 4.8 million EMR records from the National HIV Care and Treatment Program (2018–2023), we identified risks of ART non-adherence, non-suppressed viral load, and loss to follow-up. Our enhanced LLM may outperform traditional machine learning models and zero-shot LLMs. HIV physicians in Tanzania evaluated the model’s predictions and justifications, finding 65% alignment with expert assessments; 92.3% of the aligned cases were considered clinically relevant. This model can support data-driven decisions and may improve patient outcomes and reduce HIV transmission.
{"title":"Enhanced language models for predicting and understanding HIV care disengagement: a case study in Tanzania","authors":"Waverly Wei, Junzhe Shao, Rita Qiuran Lyu, Rebecca Hemono, Xinwei Ma, Joseph Giorgio, Zeyu Zheng, Feng Ji, Xiaoya Zhang, Emmanuel Katabaro, Matilda Mlowe, Amon Sabasaba, Caroline Lister, Siraji Shabani, Prosper Njau, Sandra I. McCoy, Jingshen Wang","doi":"10.1038/s41746-026-02349-3","DOIUrl":"https://doi.org/10.1038/s41746-026-02349-3","url":null,"abstract":"Sustained engagement in HIV care and adherence to ART are crucial for meeting the UNAIDS “95-95-95” targets. Disengagement from care remains a significant issue, especially in sub-Saharan Africa. Traditional machine learning (ML) models have had moderate success in predicting disengagement, enabling early intervention. We developed an enhanced large language model (LLM) fine-tuned with electronic medical records (EMRs) to predict individuals at risk of disengaging from HIV care in Tanzania. Using 4.8 million EMR records from the National HIV Care and Treatment Program (2018–2023), we identified risks of ART non-adherence, non-suppressed viral load, and loss to follow-up. Our enhanced LLM may outperform traditional machine learning models and zero-shot LLMs. HIV physicians in Tanzania evaluated the model’s predictions and justifications, finding 65% alignment with expert assessments, and 92.3% of the aligned cases were considered clinically relevant. This model can support data-driven decisions and may improve patient outcomes and reduce HIV transmission.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"1 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HealthContradict: Evaluating biomedical knowledge conflicts in language models
Pub Date : 2026-01-21 DOI: 10.1038/s41746-025-02336-0
Boya Zhang, Alban Bornet, Rui Yang, Nan Liu, Douglas Teodoro
How do language models use contextual information to answer health questions? How are their responses impacted by conflicting contexts? We assess the ability of language models to reason over long, conflicting biomedical contexts using HealthContradict, an expert-verified dataset comprising 920 unique instances, each consisting of a health-related question, a factual answer supported by scientific evidence, and two documents presenting contradictory stances. We consider several prompt settings, including correct, incorrect or contradictory context, and measure their impact on model outputs. Compared to existing medical question-answering evaluation benchmarks, HealthContradict provides greater distinctions of language models’ contextual reasoning capabilities. Our experiments show that the strength of fine-tuned biomedical language models lies not only in their parametric knowledge from pretraining, but also in their ability to exploit correct context while resisting incorrect context.
{"title":"HealthContradict: Evaluating biomedical knowledge conflicts in language models","authors":"Boya Zhang, Alban Bornet, Rui Yang, Nan Liu, Douglas Teodoro","doi":"10.1038/s41746-025-02336-0","DOIUrl":"https://doi.org/10.1038/s41746-025-02336-0","url":null,"abstract":"How do language models use contextual information to answer health questions? How are their responses impacted by conflicting contexts? We assess the ability of language models to reason over long, conflicting biomedical contexts using HealthContradict, an expert-verified dataset comprising 920 unique instances, each consisting of a health-related question, a factual answer supported by scientific evidence, and two documents presenting contradictory stances. We consider several prompt settings, including correct, incorrect or contradictory context, and measure their impact on model outputs. Compared to existing medical question-answering evaluation benchmarks, HealthContradict provides greater distinctions of language models’ contextual reasoning capabilities. Our experiments show that the strength of fine-tuned biomedical language models lies not only in their parametric knowledge from pretraining, but also in their ability to exploit correct context while resisting incorrect context.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"45 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Naturalistic facial dynamics enable quantitative clinical assessment of atypical expression phenotypes in children with autism spectrum disorder
Pub Date : 2026-01-21 DOI: 10.1038/s41746-026-02375-1
Minghao Du, Ping Shi, Zehao Liu, Yunuo Xu, Xiaoya Liu, Wei Liu, Shuang Liu, Dong Ming
Existing facial-expression studies in children with autism spectrum disorder (ASD) rely mainly on discrete, task-driven measures that overlook the sustained emotional fluctuations and ambiguous expressions in naturalistic interactions. This study quantified atypical facial expression patterns in ASD during spontaneous, unscripted interactions. We analyzed 184 naturalistic video sessions from 99 children with ASD and 85 typically developing (TD) peers and extracted three features capturing spontaneous dynamics: emotion variation (temporal stability of emotional states), expression intensity (magnitude of facial muscle activation), and facial coordination (synchrony across facial muscles). These features integrated holistic and processual representations across coarse- and fine-grained levels, enabling detailed quantification of facial patterns. Compared with TD peers, the ASD group exhibited increased prominence of anger, altered emotion transition probabilities, heightened activation in non-core facial muscles, and atypical facial coordination (p < 0.05). These findings reveal subtle facial dynamics inaccessible to traditional approaches and provide a quantitative explanation for the hard-to-describe atypical expressions. Using the fused feature set, ASD classification reached 92.4% accuracy and 0.977 AUC. Regression analyses further predicted symptom severity with mean absolute errors of 13.94 on the ABC scale and 3.84 on the CABS scale. These quantitative and interpretable markers show promise for large-scale ASD screening in naturalistic settings.
{"title":"Naturalistic facial dynamics enable quantitative clinical assessment of atypical expression phenotypes in children with autism spectrum disorder.","authors":"Minghao Du,Ping Shi,Zehao Liu,Yunuo Xu,Xiaoya Liu,Wei Liu,Shuang Liu,Dong Ming","doi":"10.1038/s41746-026-02375-1","DOIUrl":"https://doi.org/10.1038/s41746-026-02375-1","url":null,"abstract":"Existing facial-expression studies in children with autism spectrum disorder (ASD) rely mainly on discrete, task-driven measures that overlook the sustained emotional fluctuations and ambiguous expressions in naturalistic interactions. This study quantified atypical facial expression patterns in ASD during spontaneous, unscripted interactions. We analyzed 184 naturalistic video sessions from 99 children with ASD and 85 typically developing (TD) peers and extracted three features capturing spontaneous dynamics: emotion variation (temporal stability of emotional states), expression intensity (magnitude of facial muscle activation), and facial coordination (synchrony across facial muscles). These features integrated holistic and processual representations across coarse- and fine-grained levels, enabling detailed quantification of facial patterns. Compared with TD peers, the ASD group exhibited increased prominence of anger, altered emotion transition probabilities, heightened activation in non-core facial muscles, and atypical facial coordination (p < 0.05). These findings reveal subtle facial dynamics inaccessible to traditional approaches and provide a quantitative explanation for the hard-to-describe atypical expressions. Using the fused feature set, ASD classification reached 92.4% accuracy and 0.977 AUC. Regression analyses further predicted symptom severity with mean absolute errors of 13.94 on the ABC scale and 3.84 on the CABS scale. These quantitative and interpretable markers show promise for large-scale ASD screening in naturalistic settings.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"266 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146005436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A causal and interpretable machine learning framework for postcranioplasty risk prediction and surgical decision support
Pub Date : 2026-01-21 DOI: 10.1038/s41746-026-02370-6
Wenbo Li, Bao Wang, Tianzun Li, Yiwen Ma, Haoyong Jin, Jiangli Zhao, Zhiwei Xue, Nan Su, Yanya He, Jiaqi Shi, Xuchen Liu, Xiaoyang Liu, Tianzi Wang, Jiwei Wang, Chao Li, Can Yan, Yang Ma, Qichao Qi, Xinyu Wang, Weiguo Li, Bin Huang, Donghai Wang, Xuelian Wang, Yan Qu, Xingang Li, Chen Qiu, Ning Yang
Cranioplasty is associated with a substantial burden of postoperative complications. In this multicenter study, we developed a machine learning–based clinical decision-support tool to predict the risk of postoperative complications following cranioplasty. A set of nine features was selected for model development. Among the 15 algorithms evaluated, the random forest model demonstrated the best overall performance and was validated on data from both spatial and temporal external cohorts (AUROC = 0.949, internal cross-validation; 0.930, geographical validation; and 0.932, temporal validation). Subgroup analyses by age and sex demonstrated consistently high discriminative performance (lowest AUROC = 0.927) and good calibration (O/E ratio = 1.16, 95% CI: 0.97–1.40). Analysis of causal effects of modifiable intraoperative variables on postoperative complications, with diverse counterfactual explanations and causal inference methods, including double machine learning and the T-learner framework, revealed a protective effect of subcutaneous negative-pressure drainage (ATE = −0.241) and titanium mesh (ATE = −0.191). Finally, we present the model as an accessible web-based tool for individualized, real-time clinical decision-making (http://www.cranioplastycomplicationprediction.top). These findings provide a practical framework for postoperative risk stratification and support the optimization of intraoperative decision-making in cranioplasty.
{"title":"A Causal and interpretable machine learning framework for postcranioplasty risk prediction and surgical decision support","authors":"Wenbo Li, Bao Wang, Tianzun Li, Yiwen Ma, Haoyong Jin, Jiangli Zhao, Zhiwei Xue, Nan Su, Yanya He, Jiaqi Shi, Xuchen Liu, Xiaoyang Liu, Tianzi Wang, Jiwei Wang, Chao Li, Can Yan, Yang Ma, Qichao Qi, Xinyu Wang, Weiguo Li, Bin Huang, Donghai Wang, Xuelian Wang, Yan Qu, Xingang Li, Chen Qiu, Ning Yang","doi":"10.1038/s41746-026-02370-6","DOIUrl":"https://doi.org/10.1038/s41746-026-02370-6","url":null,"abstract":"Cranioplasty is associated with a substantial burden of postoperative complications. In this multicenter study, we developed a machine learning–based clinical decision-support tool to predict the risk of postoperative complications following cranioplasty. A set of nine features was selected for model development. Among the 15 algorithms evaluated, the random forest model demonstrated the best overall performance and was validated on data from both spatial and temporal external cohorts (AUROC = 0.949, internal cross-validation; 0.930, geographical validation; and 0.932, temporal validation). Subgroup analyses by age and sex demonstrated consistently high discriminative performance (lowest AUROC = 0.927) and good calibration (O/E ratio = 1.16, 95% CI: 0.97–1.40). Analysis of causal effects of modifiable intraoperative variables on postoperative complications, with diverse counterfactual explanations and causal inference methods, including double machine learning and the T-learner framework, revealed a protective effect of subcutaneous negative-pressure drainage (ATE = −0.241) and titanium mesh (ATE = −0.191). Finally, we present the model as an accessible web-based tool for individualized, real-time clinical decision-making (http://www.cranioplastycomplicationprediction.top). These findings provide a practical framework for postoperative risk stratification and support the optimization of intraoperative decision-making in cranioplasty.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"39 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Annotation-free 3D reconstruction and quantification of retinal microvasculature by RADAR
Pub Date : 2026-01-21 DOI: 10.1038/s41746-026-02366-2
Hao Zhang, Xindi Liu, Jiayi Wu, Ke Wu, Dong Li, Bin Chen, Hong Wang, Jinyang Liang, Zhanle Lin, Yuping Zheng, Liang Yao
Three-dimensional mapping of the retinal microvasculature is essential for monitoring systemic vascular health. Existing methods rely heavily on manual annotation or on training-intensive deep learning models that lack generalizability. Here we report RADAR, an annotation-free computational framework for 3D segmentation and quantification of optical coherence tomography angiography data. The pipeline integrates adaptive physics-aware denoising with topology-preserving centerline extraction to reconstruct complex vascular networks without manual labeling. We validated the framework in healthy individuals and patients with early-stage diabetic retinopathy. The method outperformed standard segmentation tools and resolved layer-specific morphological alterations obscured in conventional two-dimensional projections. Quantitative analysis revealed distinct patterns of compensatory remodeling and increased tortuosity in diabetic eyes. RADAR enables precise extraction of volumetric biomarkers, including vessel length and branching complexity, and provides a scalable tool for early detection and longitudinal assessment of ocular and systemic vascular diseases.
{"title":"Annotation-free 3D reconstruction and quantification of retinal microvasculature by RADAR","authors":"Hao Zhang, Xindi Liu, Jiayi Wu, Ke Wu, Dong Li, Bin Chen, Hong Wang, Jinyang Liang, Zhanle Lin, Yuping Zheng, Liang Yao","doi":"10.1038/s41746-026-02366-2","DOIUrl":"https://doi.org/10.1038/s41746-026-02366-2","url":null,"abstract":"Three-dimensional mapping of retinal microvasculature is essential for monitoring systemic vascular health. Existing methods rely heavily on manual annotation or training-intensive deep learning models that lack generalizability. Here we report RADAR. This is an annotation-free computational framework for the 3D segmentation and quantification of optical coherence tomography angiography data. The pipeline integrates adaptive physics-aware denoising with topology-preserving centerline extraction to reconstruct complex networks without manual labeling. We validated the framework in healthy individuals and patients with early-stage diabetic retinopathy. The method outperformed standard segmentation tools and resolved layer-specific morphological alterations obscured in conventional two-dimensional projections. Quantitative analysis revealed distinct patterns of compensatory remodeling and increased tortuosity in diabetic eyes. RADAR enables precise extraction of volumetric biomarkers including vessel length and branching complexity. It provides a scalable tool for early detection and longitudinal assessment of ocular and systemic vascular diseases.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"64 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146006114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}