NPJ Digital Medicine: Latest Articles

Multimodal deep learning with anatomically constrained attention for screening MRI-detectable TMJ abnormalities from panoramic images
IF 15.2, Zone 1 (Medicine), Q1 HEALTH CARE SCIENCES & SERVICES, Pub Date: 2026-01-23, DOI: 10.1038/s41746-026-02378-y
Hyo-Jung Jung, Dayun Ju, Chanyoung Kim, Seong Jae Hwang, Chena Lee, Younjung Park
Citations: 0
Sex disparities in deep learning estimation of ejection fraction from cardiac magnetic resonance imaging.
IF 15.2, Zone 1 (Medicine), Q1 HEALTH CARE SCIENCES & SERVICES, Pub Date: 2026-01-23, DOI: 10.1038/s41746-025-02330-6
Dhamanpreet Kaur, Rohan Shad, Abhinav Kumar, Mrudang Mathur, Joseph Cho, Robyn Fong, Cyril Zakka, Curran Phillips, William Hiesinger
The advent of artificial intelligence in cardiovascular imaging holds immense potential for earlier diagnoses, precision medicine, and improved disease management. However, the presence of sex-based disparities and strategies to mitigate biases in deep learning models for cardiac imaging remain understudied. In this study, we analyzed algorithmic bias in a foundation model that was pretrained on cardiac magnetic resonance imaging and radiology reports from multiple institutes and finetuned to estimate ejection fraction (EF) on the UK Biobank dataset. The model performed significantly worse in EF estimation for females than males in the diagnosis of reduced EF. Algorithmic fairness did not improve despite masking of protected attributes in radiology reports and data resampling, although explicit input of sex in model finetuning may improve EF estimation in some cases. The underdiagnosis of reduced EF among females holds critical implications for the exacerbation of existing sex-based disparities in cardiovascular health. We advise caution in the development of models for cardiovascular imaging to avoid such pitfalls.
Citations: 0
Uncertainty modeling in multimodal speech analysis across the psychosis spectrum.
IF 15.1, Zone 1 (Medicine), Q1 HEALTH CARE SCIENCES & SERVICES, Pub Date: 2026-01-23, DOI: 10.1038/s41746-025-02309-3
Morteza Rohanian, Roya Hüppi, Farhad Nooralahzadeh, Noemi Dannecker, Yves Pauli, Werner Surbeck, Iris Sommer, Wolfram Hinzen, Nicolas Langer, Michael Krauthammer, Philipp Homan

Speech provides a rich behavioral signal of psychosis, yet its diagnostic use remains limited because speech patterns vary widely across individuals and contexts. We model this variability as uncertainty, capturing how consistently speech features indicate symptom expression. We introduce a multimodal model that integrates acoustic and linguistic information to predict symptom severity and psychosis-related traits across the spectrum, from high schizotypy to clinical psychosis. By estimating uncertainty for each modality, the model learns when to rely on specific signals, adapting to speech quality and task context to improve accuracy and interpretability. Using speech from 114 participants (32 with early psychosis and 82 with low or high schizotypy), recorded in German across structured and narrative tasks, the model achieved an F1-score of 83% (ECE = 0.045), demonstrating robust and well-calibrated performance. Uncertainty estimation further revealed which speech markers most reliably indicated symptoms, including pitch variability, fluency disruptions, and spectral instability.
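The ECE figure quoted above is an expected calibration error. A minimal sketch of how such a value is commonly computed follows; the equal-width binning, the default of 10 bins, and the function name `expected_calibration_error` are assumptions for illustration, since the abstract does not state how the authors binned their predictions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean of |accuracy - mean confidence| over equal-width bins.

    confidences: predicted probability of the predicted class, in [0, 1].
    correct:     1 if the prediction was right, else 0.
    n_bins:      number of equal-width bins (an assumption, not from the paper).
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


# Hypothetical predictions, not data from the study.
example_ece = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9, 0.6, 0.6],
    [1, 1, 1, 0, 1, 0],
)
```

A lower value means the stated confidences track the observed accuracy more closely; a perfectly calibrated model scores 0.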

Citations: 0
Evaluation of large language models for diagnostic impression generation from brain MRI report findings: a multicenter benchmark and reader study.
IF 15.2, Zone 1 (Medicine), Q1 HEALTH CARE SCIENCES & SERVICES, Pub Date: 2026-01-22, DOI: 10.1038/s41746-026-02380-4
Ming-Liang Wang, Rui-Peng Zhang, Wen-Juan Wu, Yu Lu, Xiao-Er Wei, Zheng Sun, Bao-Hui Guan, Jun-Jie Zhang, Xue Wu, Lei Zhang, Tian-Le Wang, Yue-Hua Li
Automatically deriving radiological diagnoses from brain MRI report findings is challenging because the task is highly complex and requires domain expertise. This study evaluated 10 large language models (LLMs) in generating diagnoses from brain MRI report findings, using 4293 reports (9973 diagnostic labels) covering 15 brain disease categories from three medical centers. DeepSeek-R1 achieved the highest performance among the evaluated models on the full dataset and across different clinical scenarios and subgroups, particularly when provided with structured report findings and clinical information. A top-three differential-diagnosis prompting strategy achieved superior performance, with 97.6% patient-level accuracy versus 87.1% for single-diagnosis prompting. The diagnostic performance of six radiologists was assessed with and without DeepSeek-R1 assistance on 500 reports. Integration of DeepSeek-R1 significantly improved diagnostic accuracy (AUPRC: from 0.774 to 0.893) and reduced reading time (from 61 to 53 s), with more pronounced benefits for junior radiologists. Our findings indicate that effective automated diagnostic impression generation in brain MRI reporting requires advanced large-scale LLMs like DeepSeek-R1. With optimized prompting and input strategies, this framework may serve as a supportive tool in drafting brain MRI reports and contribute to enhanced workflow efficiency in radiology practice.
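The gap between top-three and single-diagnosis prompting can be illustrated with a top-k accuracy sketch. The scoring rule used here (a case counts as a hit when any of the first k ranked diagnoses matches a reference label) is an assumption, as is the name `top_k_accuracy`; the abstract does not spell out how patient-level accuracy was scored.

```python
def top_k_accuracy(ranked_predictions, reference_labels, k):
    """Fraction of cases where any of the first k predictions is a reference label.

    ranked_predictions: list of lists, most likely diagnosis first.
    reference_labels:   list of sets of accepted diagnoses per case.
    """
    hits = 0
    for ranked, reference in zip(ranked_predictions, reference_labels):
        if reference & set(ranked[:k]):
            hits += 1
    return hits / len(ranked_predictions)


# Hypothetical cases, not data from the study.
predictions = [
    ["infarct", "tumor", "ms"],
    ["tumor", "infarct", "abscess"],
    ["ms", "tumor", "infarct"],
    ["abscess", "ms", "tumor"],
]
references = [{"infarct"}, {"infarct"}, {"infarct"}, {"hemorrhage"}]

top1 = top_k_accuracy(predictions, references, k=1)
top3 = top_k_accuracy(predictions, references, k=3)
```

With these made-up cases the top-three score is higher than the top-one score, since a correct diagnosis ranked second or third still counts as a hit.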
Citations: 0
Large language models improve transferability of electronic health record-based predictions across countries and coding systems.
IF 15.1, Zone 1 (Medicine), Q1 HEALTH CARE SCIENCES & SERVICES, Pub Date: 2026-01-22, DOI: 10.1038/s41746-026-02363-5
Matthias Kirchler, Matteo Ferro, Veronica Lorenzini, Robin P van de Water, Christoph Lippert, Andrea Ganna

Variation in medical practices and reporting standards across healthcare systems limits the transferability of prediction models based on structured electronic health record data. Prior studies have demonstrated that embedding medical codes into a shared semantic space can help address these discrepancies, but real-world applications remain limited. Here, we show that leveraging embeddings from a large language model alongside a transformer-based prediction model provides an effective and scalable solution to enhance generalizability. We call this approach GRASP and apply it to predict the onset of 21 diseases and all-cause mortality in over one million individuals. Trained on the UK Biobank (UK) and evaluated in FinnGen (Finland) and Mount Sinai (USA), GRASP achieved an average ΔC-index that was 88% and 47% higher than language-unaware models, respectively. GRASP also showed significantly higher correlations with polygenic risk scores for 62% of diseases, and maintained robust performance even when datasets were not harmonized to the same data model.
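The ΔC-index reported above is built on the concordance index. The abstract does not define the baseline used for the Δ, so the sketch below covers only a plain Harrell-style C-index for right-censored data; the function name and the toy inputs are assumptions for illustration.

```python
def concordance_index(times, events, risks):
    """Harrell-style C-index for right-censored survival data.

    times:  follow-up time per individual.
    events: 1 if the event was observed, 0 if censored.
    risks:  predicted risk score (higher means earlier expected event).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i had an observed event
            # strictly before j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in risk count as half
    return concordant / comparable


# Hypothetical follow-up data, not data from the study.
c_index = concordance_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risks=[0.9, 0.7, 0.8, 0.1],
)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of who experiences the event first.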

Citations: 0
A systematic review of AI for predicting glaucoma progression: challenges and recommendations towards clinical implementation.
IF 15.2, Zone 1 (Medicine), Q1 HEALTH CARE SCIENCES & SERVICES, Pub Date: 2026-01-22, DOI: 10.1038/s41746-025-02321-7
Yichuan G Liang, Leo Fan, Armando Teixeira-Pinto, Gerald Liew, Andrew J R White
Glaucoma is the leading cause of irreversible blindness worldwide with heterogeneous progression rates. Artificial Intelligence (AI) may enable accurate progression predictions in clinical practice. We conducted a systematic review to survey quantitative AI performance and examine strengths and shortfalls in current AI approaches with future clinical implementation in mind. Two reviewers independently screened studies in English from MEDLINE, Embase, Web of Science, Cochrane CENTRAL and arXiv since 2014 and performed risk of bias assessment on eligible studies using QUADAS-2. Forty-six reports of 43 unique studies demonstrated moderate to good performance in predicting glaucoma conversion, biological deterioration and progression to surgery. Several challenges for clinical translation remain, including inconsistent reporting, limitations and heterogeneity in study design and poor AI generalisability and transparency. We encourage future studies to adopt robust study design and transparent reporting and propose the first glaucoma-specific list of recommended practices and reporting items for future clinical implementation.
Citations: 0
Early diagnosis of axial spondyloarthritis in primary care using multi-agent systems.
IF 15.2, Zone 1 (Medicine), Q1 HEALTH CARE SCIENCES & SERVICES, Pub Date: 2026-01-22, DOI: 10.1038/s41746-026-02372-4
Xiaojian Ji, Zhuofeng Li, Lulu Zeng, Lidong Hu, Yanyan Wang, Kui Zhang, Lianjie Shi, Meng Wei, Lifeng Chen, Lin Guo, Jing Dong, An'an Wang, Lei Sun, Yimin Song, Huatao Wang, Jingming Wang, Ying Lei, Wenqian Yue, Zheng Zhao, Jian Zhu, Feng Huang, Jing Zhang, Tao Li, Kunpeng Li
Axial spondyloarthritis (axSpA) is an inflammatory disease marked by chronic low back pain, with a global average diagnostic delay of 6.7 years. Early diagnosis is crucial for improving prognosis and reducing disability rates, yet primary care physicians (PCPs) may find it challenging to ensure timely recognition and referrals. This study developed and validated Spondyloarthritis Agents (SpAgents), an early diagnostic system based on a multi-agent framework integrating large language models (LLMs) and imaging models. The SpAgents framework includes PlannerAgent, DataAgent, ToolAgent, and DoctorAgent, supported by long-term memory for dynamic knowledge updates. We enrolled 596 patients, dividing 545 from one hospital into a training dataset (n = 359) and a validation dataset (n = 186), along with an independent cohort of 51 patients from five additional hospitals for testing. SpAgents demonstrated strong diagnostic performance, achieving sensitivity of 0.8615 and specificity of 0.8000 during validation, and 0.9375 and 0.7368 during testing. SpAgents exhibited significantly higher sensitivity (0.9400) and accuracy (0.8600) than both PCPs and junior rheumatologists, with overall performance equivalent to that of senior rheumatologists. Under SpAgents-assisted diagnosis, both PCPs and junior rheumatologists showed marked improvements in sensitivity and accuracy. SpAgents effectively enhance early axSpA identification among PCPs, offering an innovative solution to reduce diagnostic delays.
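As a worked check of the testing-cohort figures, the sketch below computes sensitivity and specificity from confusion-matrix counts. The counts (30 true positives, 2 false negatives, 14 true negatives, 5 false positives) are an assumption: they are one split of the 51 test patients that reproduces the reported 0.9375 and 0.7368, not numbers taken from the paper.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true cases that were detected
    specificity = tn / (tn + fp)  # share of non-cases correctly ruled out
    return sensitivity, specificity


# Assumed counts consistent with the reported test-set figures (n = 51).
sens, spec = sensitivity_specificity(tp=30, fn=2, tn=14, fp=5)
print(round(sens, 4), round(spec, 4))  # prints 0.9375 0.7368
```

With a test cohort this small, each additional misclassified patient shifts either metric by several percentage points.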
Citations: 0
Personalized supervised and unsupervised intracranial sleep decoding during deep brain stimulation.
IF 15.2, Zone 1 (Medicine), Q1 HEALTH CARE SCIENCES & SERVICES, Pub Date: 2026-01-22, DOI: 10.1038/s41746-026-02368-0
Clay Smyth, Md Fahim Anjum, Jin-Xiao Zhang, Jiaang Yao, Reza Abbasi-Asl, Philip Starr, Simon Little
Impaired sleep in Parkinson's Disease (PD) is a significant unmet need. Targeting sleep stage-specific neurophysiologies with adaptive Deep Brain Stimulation (aDBS) may ameliorate sleep disruption. This study analyzes the efficacy of personalized machine learning approaches on classifying sleep stages from participants receiving deep brain stimulation. We acquired 283 hours of multi-night intracranial cortico-basal recordings with synchronized sleep stage labels derived from scalp EEG across 5 participants during chronic stimulation. Five-stage classification accuracy across PD subjects averaged 80.2% (±0.9% SEM). When constraining sleep classification to algorithms implementable in currently available DBS devices, e.g., binary NREM classification using linear models, an average accuracy of 85.9% (±0.4% SEM) was achieved for PD subjects. Additionally, linear models trained on unsupervised cluster labels achieved an average accuracy of 83.5% (±5.6% SEM) when discriminating NREM sleep. Overall, this demonstrates the feasibility of personalized supervised and unsupervised ML models for sleep classification using intracranial data during stimulation. The Institutional Review Board approved the parent study protocol, and the study was registered on clinicaltrials.gov (NCT0358289; IDE G180097).
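The accuracies above are reported as a mean with a standard error of the mean (SEM) across participants. A minimal sketch of that summary follows, assuming the SEM is the sample standard deviation divided by the square root of the number of participants; the per-participant accuracies are hypothetical placeholders, not values from the study.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for per-participant scores."""
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1 denominator) over sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical per-participant accuracies for 5 participants.
accuracies = [0.79, 0.80, 0.81, 0.78, 0.82]
mean_acc, sem_acc = mean_and_sem(accuracies)
```

Because the SEM shrinks with the square root of the sample size, a tight SEM across only five participants indicates that the individual accuracies were close to each other.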
Citations: 0
Enhanced language models for predicting and understanding HIV care disengagement: a case study in Tanzania
IF 15.2, Zone 1 (Medicine), Q1 HEALTH CARE SCIENCES & SERVICES, Pub Date: 2026-01-21, DOI: 10.1038/s41746-026-02349-3
Waverly Wei, Junzhe Shao, Rita Qiuran Lyu, Rebecca Hemono, Xinwei Ma, Joseph Giorgio, Zeyu Zheng, Feng Ji, Xiaoya Zhang, Emmanuel Katabaro, Matilda Mlowe, Amon Sabasaba, Caroline Lister, Siraji Shabani, Prosper Njau, Sandra I. McCoy, Jingshen Wang
Sustained engagement in HIV care and adherence to ART are crucial for meeting the UNAIDS “95-95-95” targets. Disengagement from care remains a significant issue, especially in sub-Saharan Africa. Traditional machine learning (ML) models have had moderate success in predicting disengagement, enabling early intervention. We developed an enhanced large language model (LLM) fine-tuned with electronic medical records (EMRs) to predict individuals at risk of disengaging from HIV care in Tanzania. Using 4.8 million EMR records from the National HIV Care and Treatment Program (2018–2023), we identified risks of ART non-adherence, non-suppressed viral load, and loss to follow-up. Our enhanced LLM may outperform traditional machine learning models and zero-shot LLMs. HIV physicians in Tanzania evaluated the model’s predictions and justifications, finding 65% alignment with expert assessments, and 92.3% of the aligned cases were considered clinically relevant. This model can support data-driven decisions and may improve patient outcomes and reduce HIV transmission.
Citations: 0
HealthContradict: Evaluating biomedical knowledge conflicts in language models
IF 15.2 Zone 1 Medicine Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2026-01-21 DOI: 10.1038/s41746-025-02336-0
Boya Zhang, Alban Bornet, Rui Yang, Nan Liu, Douglas Teodoro
How do language models use contextual information to answer health questions? How are their responses impacted by conflicting contexts? We assess the ability of language models to reason over long, conflicting biomedical contexts using HealthContradict, an expert-verified dataset comprising 920 unique instances, each consisting of a health-related question, a factual answer supported by scientific evidence, and two documents presenting contradictory stances. We consider several prompt settings, including correct, incorrect or contradictory context, and measure their impact on model outputs. Compared to existing medical question-answering evaluation benchmarks, HealthContradict provides greater distinctions of language models’ contextual reasoning capabilities. Our experiments show that the strength of fine-tuned biomedical language models lies not only in their parametric knowledge from pretraining, but also in their ability to exploit correct context while resisting incorrect context.
Citations: 0