NPJ Digital Medicine: latest articles, page 2

Independent and collaborative performance of large language models and healthcare professionals in diagnosis and triage.

IF 15.1 Zone 1 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES

NPJ Digital Medicine

Pub Date : 2026-02-06 DOI: 10.1038/s41746-026-02409-8

Mingyang Chen, Yijin Wu, Jiayi Ma, Xinhua Jia, Chen Gao, Fanghui Zhao, Youlin Qiao

Large language models (LLMs) show promising diagnostic and triage performance, yet direct comparisons with healthcare professionals (HCPs) and collaborative effects remain limited. We conducted a systematic review and meta-analysis of studies (January 2020 to September 2025) comparing the diagnostic or triage accuracy of LLMs, HCPs, or their collaboration across seven databases. Studies using multiple-choice formats rather than open diagnostic generation were excluded. We extracted top-1, top-3, top-5, and top-10 diagnostic and triage accuracies and pooled results using multilevel random-effects models to account for nested observations. Of 10,398 studies screened, 50 met criteria, evaluating 25 different LLMs across diverse medical specialties. The relative diagnostic accuracy of LLMs versus HCPs progressively improved from 0.89 (95% CI, 0.79-1.00) for top-1 to 0.91 (0.83-1.00) for top-3, 1.04 (0.89-1.22) for top-5, and 1.17 (0.87-1.57) for top-10 diagnoses, with significant model variability. LLM-assisted HCPs outperformed HCPs alone, with relative diagnostic accuracy of 1.13 (1.00-1.27) for top-1, 1.11 (1.01-1.23) for top-3, 1.42 (1.16-1.73) for top-5, and 1.33 (0.94-1.87) for top-10 diagnoses. Triage accuracy was similar between LLMs and HCPs (1.01 [0.94-1.09]). These findings show potential for LLM integration but methodological flaws in studies necessitate rigorous real-world evaluation before clinical implementation.
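The two quantities reported above, top-k accuracy and the relative accuracy of one group versus another, can be illustrated with a minimal single-study sketch. This is not the pooled multilevel random-effects analysis the authors describe; it only shows how a ratio of two accuracies and a log-scale Wald interval might be computed for one comparison. The function names and the example counts are hypothetical.

```python
import math


def top_k_accuracy(ranked_lists, truths, k):
    """Fraction of cases whose true diagnosis is among the first k candidates."""
    hits = sum(truth in ranked[:k] for ranked, truth in zip(ranked_lists, truths))
    return hits / len(truths)


def relative_accuracy(correct_a, total_a, correct_b, total_b, z=1.96):
    """Ratio of two accuracies (A over B) with a Wald interval on the log scale.

    Assumes independent groups and non-zero correct counts; a real
    meta-analysis would pool many such ratios rather than use one.
    """
    ratio = (correct_a / total_a) / (correct_b / total_b)
    se_log = math.sqrt(1 / correct_a - 1 / total_a + 1 / correct_b - 1 / total_b)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: model correct on 45/100 cases, clinicians on 50/100.
    print(relative_accuracy(45, 100, 50, 100))
```

An interval that spans 1.0, as in several of the pooled estimates above, means the data are compatible with no difference between the two groups.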

Citations: 0

Physics constrained graph neural network for real time prediction of intracranial aneurysm hemodynamics.

IF 15.1 Zone 1 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES

NPJ Digital Medicine

Pub Date : 2026-02-06 DOI: 10.1038/s41746-026-02404-z

Vincent Lannelongue, Paul Garnier, Pablo Jeken-Rico, Aurèle Goetz, Philippe Meliga, Yves Chau, Elie Hachem

Intracranial aneurysms (IAs) are life-threatening vascular conditions requiring accurate risk assessment to guide treatment. Hemodynamic biomarkers such as wall shear stress and oscillatory shear index are promising predictors of rupture risk but remain underused clinically due to the high computational cost of traditional CFD methods. We propose a physics-constrained graph neural network (GNN) framework trained on high-fidelity CFD data to predict full 3D, time-resolved hemodynamic fields throughout the cardiac cycle. Our model incorporates enhanced node features and physics-based constraints to capture complex spatio-temporal flow behavior in near real time. It generalizes to varying inflow conditions and unseen patient-specific geometries with no fine-tuning. Additionally, we release a benchmark dataset of 105 patient-derived aneurysm geometries with CFD fields to support the machine learning (ML) community. This is the first GNN model applied to transient 3D aneurysmal flow prediction, paving the way for rapid, AI-driven hemodynamic analysis toward risk stratification and treatment planning.
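For readers unfamiliar with the two biomarkers named above, the sketch below computes time-averaged wall shear stress (TAWSS) and the oscillatory shear index (OSI) at a single wall point, using the standard definition OSI = 0.5 * (1 - |integral of the WSS vector| / integral of the WSS magnitude). It assumes uniformly sampled WSS vectors over one cardiac cycle and a simple rectangle-rule integration; the function name and inputs are hypothetical and are not taken from the paper.

```python
import math


def tawss_and_osi(wss_vectors, dt):
    """Return (TAWSS, OSI) for one wall point.

    wss_vectors: sequence of (x, y, z) wall shear stress vectors sampled
    every dt seconds over one cardiac cycle (assumed uniform sampling).
    """
    integral_vec = [0.0, 0.0, 0.0]  # integral of the WSS vector over time
    integral_mag = 0.0              # integral of the WSS magnitude over time
    for vec in wss_vectors:
        for axis in range(3):
            integral_vec[axis] += vec[axis] * dt
        integral_mag += math.sqrt(sum(c * c for c in vec)) * dt

    period = dt * len(wss_vectors)
    tawss = integral_mag / period
    if integral_mag == 0.0:
        return tawss, 0.0  # no shear at all, so no oscillation to report
    net = math.sqrt(sum(c * c for c in integral_vec))
    osi = 0.5 * (1.0 - net / integral_mag)
    return tawss, osi
```

OSI ranges from 0 for purely unidirectional shear to 0.5 for shear that fully reverses over the cycle.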

Citations: 0

Wearable EEG devices in the detection of mild cognitive impairment: a systematic review.

IF 15.1 Zone 1 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES

NPJ Digital Medicine

Pub Date : 2026-02-06 DOI: 10.1038/s41746-026-02342-w

Chanchan He, Xiru Yu, Yuhe Zhang, Yuanning Li, Nan Jiang

Wearable electroencephalography (EEG) devices are miniaturized, portable, and wireless systems for long-term brain monitoring, demonstrating significant potential as accessible mild cognitive impairment (MCI) screening tools based on objective neurophysiological biomarkers. However, their performance in MCI detection remains unclear, and their translation to real-world applications faces several challenges. This study aimed to comprehensively evaluate wearable EEG for MCI detection, identify key characteristics that optimize classification performance and usability, and address gaps in effective design implementation. We conducted a systematic search across seven databases, screening 1562 records and analyzing 21 studies that examined 16 distinct wearable EEG devices for MCI detection. The results revealed considerable variation in classification accuracy (range: 46-95%). A system-level analysis of the entire wearable EEG system and data flow identified seven critical factors that optimize the trade-off between diagnostic performance, portability, and affordability: (1) moderate channel density; (2) frontal and parietal electrode placement; (3) elderly-friendly multi-domain cognitive tasks; (4) adaptive signal preprocessing; (5) multi-domain feature extraction; (6) ensemble classifiers; and (7) multimodal integration. Additionally, methodological considerations for future wearable EEG-based MCI detection research include: (1) standardizing MCI diagnostic frameworks; (2) increasing sample diversity; (3) optimizing device usability and technical specifications; (4) standardizing recording protocols; (5) harmonizing data processing pipelines; (6) validating in real-world settings; (7) assessing cost-effectiveness; and (8) implementing comprehensive reporting guidelines. These insights enable further translational applications of wearable EEG-based MCI detection and provide a foundation for developing user-friendly systems that could transform early cognitive impairment screening in community and primary care settings.

Citations: 0

Development of deep learning model to screen for primary open-angle glaucoma in African ancestry individuals.

IF 15.1 Zone 1 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES

NPJ Digital Medicine

Pub Date : 2026-02-06 DOI: 10.1038/s41746-025-02318-2

Shuo Li, Rebecca Salowe, Roy Lee, Gui-Shuang Ying, Insup Lee, Joan O'Brien, Osbert Bastani

Primary open-angle glaucoma (POAG) screening using artificial intelligence (AI) has emerged as a transformative method to identify undiagnosed disease. African ancestry individuals are under-represented in current datasets for AI models, despite being disproportionately affected by this blinding disease. We developed a deep learning model that screens for POAG using fundus photography from Primary Open-Angle African American Glaucoma Genetics (POAAGG) subjects (n = 64,129 images, including 42,914 images from 1782 cases and 21,215 images from 682 controls). Our final diagnosis pipeline is as follows: (1) select the six most informative images from a single timepoint using a binary classifier; (2) predict POAG probability from each image using a Vision Transformer; (3) make final POAG predictions by averaging predicted probabilities across the selected images (AUC = 0.925). The model was evaluated on the REFUGE-1 dataset of Chinese ancestry individuals (AUC = 0.920). Our model has applications to POAG screening in public settings such as primary care offices, as well as low-resource settings.
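The three-step pipeline above can be sketched in a few lines. The two models are passed in as plain callables because the abstract does not specify their interfaces: `informativeness` stands in for the binary image-selection classifier and `poag_probability` for the Vision Transformer. Both names, and the 0.5 decision threshold, are assumptions made for illustration only.

```python
def screen_subject(images, informativeness, poag_probability,
                   n_select=6, threshold=0.5):
    """Select the most informative images, score each, and average the scores.

    images: images taken at a single timepoint (any objects).
    informativeness: callable returning a score, higher = more informative.
    poag_probability: callable returning a POAG probability in [0, 1].
    threshold: hypothetical cut-off; the abstract does not report one.
    """
    if not images:
        raise ValueError("at least one image is required")
    # Step 1: keep the n_select most informative images.
    selected = sorted(images, key=informativeness, reverse=True)[:n_select]
    # Step 2: predict a probability for each selected image.
    probabilities = [poag_probability(image) for image in selected]
    # Step 3: average the per-image probabilities into one subject-level score.
    mean_probability = sum(probabilities) / len(probabilities)
    return mean_probability, mean_probability >= threshold
```

Averaging over several images is a simple form of ensembling: a single poor-quality photograph has less influence on the subject-level prediction.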

Citations: 0

Multidisciplinary prediction of running-related injuries using machine learning.

IF 15.1 Zone 1 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES

NPJ Digital Medicine

Pub Date : 2026-02-06 DOI: 10.1038/s41746-026-02413-y

Han Wu, Katherine Brooke-Wavell, Michael R Barnes, Zainab Awan, Sarabjit Mastana, Sam Allen, Richard C Blagrove

The causes of endurance running-related injury (RRI) are multifactorial, yet little research has been conducted which utilizes multidisciplinary risk factors for individualized RRI prediction. This paper presents a machine learning (ML)-ready RRI weekly prediction dataset using evidence-based multidisciplinary risk factors. Risk factors in genetic single-nucleotide polymorphisms, history, muscular strength, biomechanics, body composition, nutrition, and training were collected from competitive endurance runners (n = 142), who were prospectively monitored for 12 months for RRIs, accumulating 6181 weekly samples. ML models were fitted using (i) risk factors with high-level supporting evidence, and (ii) a broader range of risk factors to establish a performance baseline. Model performance (AUC = 0.784 ± 0.014) showed moderate improvement compared to previous RRI prediction modeling. Random forest achieved the best performance (AUC = 0.781 ± 0.016, 0.784 ± 0.014), which was significantly higher (q < 0.05) than most other algorithms. Only logistic regression achieved significantly improved (q < 0.05) performance when trained using a broader range of risk factors compared to a selection of high-quality risk factors. This study introduces a reproducible methodological framework for future ML sports injury prediction research and a valuable dataset for pooling in larger-scale analytics. Comparisons among different ML methods revealed nuanced insights into the interaction between data structure and model suitability.
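Since the results above are reported as AUC, a short sketch of what that metric measures may help. The function below computes the area under the ROC curve directly from its rank interpretation: the probability that a randomly chosen injured week receives a higher predicted risk than a randomly chosen uninjured week, with ties counted as one half. It is a plain-Python illustration with hypothetical labels and scores, not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for an injury week, 0 otherwise.
    scores: predicted risk for each week, higher = more likely injured.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so the reported value of about 0.78 sits between the two.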

Citations: 0

A weakly supervised transformer for rare disease diagnosis and subphenotyping from EHRs with pulmonary case studies.

IF 15.1 Zone 1 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES

NPJ Digital Medicine

Pub Date : 2026-02-06 DOI: 10.1038/s41746-026-02406-x

Kimberly F Greco, Zongxin Yang, Mengyan Li, Han Tong, Sara Morini Sweet, Alon Geva, Kenneth D Mandl, Benjamin A Raby, Tianxi Cai

Rare diseases affect an estimated 300-400 million people worldwide, yet individual conditions remain underdiagnosed and poorly characterized due to low prevalence and limited clinician familiarity. Computational phenotyping offers a scalable approach to improving rare disease detection, but algorithm development is constrained by scarce high-quality labeled data. Expert-labeled datasets from chart reviews and registries are highly accurate but limited in scope, whereas labels derived from electronic health records (EHRs) provide broader coverage but are often noisy or incomplete. To efficiently leverage both sources, we propose WEST (WEakly Supervised Transformer) for rare disease diagnosis and subphenotyping from EHRs. At its core, WEST employs a weakly supervised transformer trained on a limited set of expert-validated labels and extensive probabilistic silver-standard labels (derived from structured and unstructured EHR features) that are iteratively refined across training rounds to improve model calibration. We evaluate WEST on two rare pulmonary conditions using EHR data from Boston Children's Hospital and show that it outperforms existing methods in phenotype classification, identification of clinically relevant subphenotypes, and prediction of disease progression. By reducing reliance on manual annotation, WEST enables label-efficient representation learning that supports accurate rare disease diagnosis and reveals deeper clinical insights from routine EHR data.

Citations: 0

Detecting isolated REM sleep behavior disorder at home using a lower-back wearable sensor.

IF 15.1 Zone 1 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES

NPJ Digital Medicine

Pub Date : 2026-02-05 DOI: 10.1038/s41746-026-02412-z

Tal Tzfoni, Riva Tauman, Jeffrey M Hausdorff, Yael Hanein, Anat Mirelman

Isolated REM Sleep Behavior Disorder (iRBD) is a strong predictor of neurodegenerative diseases, particularly synucleinopathies. Current diagnosis requires overnight video-polysomnography (vPSG) in sleep laboratories. Limited access to vPSG and differences in sleep habits result in diagnostic challenges. Here we aimed to evaluate the feasibility of identifying iRBD from a lumbar-mounted wearable sensor in the home setting and explored night-to-night variability. Seventy-three participants (15 iRBD, 58 controls) underwent vPSG, followed by six nights of wearing a lower-back inertial measurement unit at home. iRBD participants showed distinct mobility patterns compared to controls. Machine learning models were trained on mobility features and classified iRBD with high sensitivity and moderate specificity. Performance improved with increased nights, plateauing at five nights recorded at home. Principal component analysis identified substantial differences between lab and home data. Our findings suggest that lumbar-mounted wearables can support sensitive, multi-night home-based detection of nocturnal motor patterns associated with iRBD, with potential utility as part of a staged screening approach and for enriching cohorts for further evaluation.

Citations: 0

Quantification of PET activation in adipose tissue from non-contrast CT scans.

IF 15.1 Zone 1 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES

NPJ Digital Medicine

Pub Date : 2026-02-05 DOI: 10.1038/s41746-026-02392-0

Carlos Cano-Espinosa, Michael W Subrize, Elisa Franquet, Aaron M Cypess, Gerald Kolodny, George R Washko, Raúl San José Estépar

Brown adipose tissue (BAT) plays a key role in energy metabolism and cardiometabolic health. Its detection typically relies on 18F-FDG PET, which is costly, radiation-intensive, and impractical for large-scale screening. We propose a deep learning model to estimate regional metabolic activity in adipose tissue from standard non-contrast CT, enabling PET-like insights without radiotracers. Using paired PET/CT data from two independent cohorts, we trained a conditional Generative Adversarial Network (cGAN) to predict standardized uptake values (SUV) within adipose regions identified on CT. The network included a fat-focused loss function to enhance metabolic signal estimation. Predicted activations showed strong agreement with PET-derived values and were reproducible across anatomical regions and datasets. This method provides a radiation-sparing alternative for assessing adipose metabolic activity in clinical and research settings and it could support population-based studies of BAT, metabolic health, and disease progression using routine chest CT scans without additional imaging burden.
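
The abstract does not say how the fat-focused loss is defined. As a rough illustration only, one plausible form is an L1 error between predicted and PET-derived SUV in which voxels inside an adipose CT window receive extra weight. The function name `fat_weighted_l1`, the `fat_weight` value, and the -190 to -30 HU window are assumptions made for this sketch, not details taken from the paper.

```python
from typing import Sequence

# Assumed adipose-tissue window in Hounsfield units (hypothetical choice for this sketch).
FAT_HU_MIN, FAT_HU_MAX = -190.0, -30.0


def fat_weighted_l1(
    pred_suv: Sequence[float],
    true_suv: Sequence[float],
    ct_hu: Sequence[float],
    fat_weight: float = 5.0,
) -> float:
    """Weighted mean absolute SUV error; voxels in the fat window count `fat_weight` times."""
    if not (len(pred_suv) == len(true_suv) == len(ct_hu)):
        raise ValueError("all inputs must have the same number of voxels")
    weighted_error = 0.0
    weight_sum = 0.0
    for p, t, hu in zip(pred_suv, true_suv, ct_hu):
        # Up-weight voxels whose CT intensity falls inside the assumed fat window.
        w = fat_weight if FAT_HU_MIN <= hu <= FAT_HU_MAX else 1.0
        weighted_error += w * abs(p - t)
        weight_sum += w
    return weighted_error / weight_sum if weight_sum else 0.0


# Toy example: three voxels, two of them inside the fat window.
toy_pred = [1.0, 2.0, 0.5]
toy_true = [1.5, 2.0, 1.5]
toy_hu = [-100.0, 40.0, -60.0]
toy_loss = fat_weighted_l1(toy_pred, toy_true, toy_hu)
```

In a cGAN setting a term like this would normally be added to the adversarial loss; the relative weighting is a design choice the abstract leaves open.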

Citations: 0

Application and prospect of artificial intelligence in diagnostic imaging of prostate cancer.

IF 15.1 Zone 1 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES

NPJ Digital Medicine

Pub Date : 2026-02-05 DOI: 10.1038/s41746-026-02354-6

Xiaoxiao Wang, Shan Zhong, Kun Fang, Yangchun Du, Jianlin Huang

Prostate cancer is a leading cause of male cancer mortality, and early, accurate diagnosis is critical. Artificial intelligence (AI), including machine learning, deep learning, and radiomics, enhances detection, characterization, and treatment assessment across TRUS, mp-MRI, and PSMA PET/CT. AI models achieve high accuracy, often matching experts, improving small-lesion detection, and supporting risk stratification. Challenges remain in data quality, generalization, clinical integration, and ethics, with future prospects in multi-omics, explainable AI, and workflow-embedded decision support.

Citations: 0

Neck-to-knee dixon MRI thigh volume as a superior mass biomarker for Sarcopenia: evidence from the UK biobank.

IF 15.1 Zone 1 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES

NPJ Digital Medicine

Pub Date : 2026-02-05 DOI: 10.1038/s41746-026-02379-x

Hyeon Su Kim, Hyunwoo Park, Junseok Kang, Hyunbin Kim, Bonsang Gu, Bhola Shivam, Jun-Il Yoo

Sarcopenia assessment requires biomarkers capturing muscle-specific strength beyond single-slice measurements. We developed an automated MRI framework segmenting 27 pelvic-thigh musculoskeletal structures to investigate muscle distribution as a functional biomarker. Among 37,004 UK Biobank participants (64.5 ± 7.9 years), transformer-based segmentation achieved a Dice similarity coefficient of 0.896. Dixon MRI-derived thigh muscle volume showed exceptional DEXA concordance (r = 0.936). Posterior/anterior (P/A) muscle ratio independently predicted adverse outcomes: weak grip strength (OR 1.60, 95% CI 1.45-1.77), sarcopenia (OR 1.42, 95% CI 1.13-1.78), mortality (OR 1.49, 95% CI 1.23-1.81), and falls (OR 1.12, 95% CI 1.05-1.20), all p < 0.005, while left/right asymmetry showed no associations. Automated MRI phenotyping reveals that muscle distribution patterns, particularly reduced anterior compartment volume, predict functional decline independently of total muscle mass, supporting evolution toward composition-aware sarcopenia criteria.
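
For readers unfamiliar with the segmentation metric quoted above, the Dice similarity coefficient between a predicted mask A and a reference mask B is 2|A∩B| / (|A| + |B|). The sketch below computes it on flattened binary masks. The function name, the toy masks, and the convention of returning 1.0 when both masks are empty are illustrative assumptions, not part of the study's pipeline.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice similarity coefficient for two flattened binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return 2.0 * intersection / total


# Toy example: each mask has 3 foreground voxels, 2 of which overlap.
toy_pred = [1, 1, 0, 0, 1, 0]
toy_truth = [1, 0, 0, 0, 1, 1]
toy_dice = dice_coefficient(toy_pred, toy_truth)  # 2 * 2 / (3 + 3)
```

A value of 1.0 means the masks coincide exactly and 0.0 means they share no foreground voxels, so the reported 0.896 indicates substantial but not perfect overlap with the reference segmentation.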

Citations: 0