
PLOS Digital Health: latest articles

Implementation of large language models in electronic health records.
IF 7.7 Pub Date : 2025-12-19 eCollection Date: 2025-12-01 DOI: 10.1371/journal.pdig.0001141
Maxime Griot, Jean Vanderdonckt, Demet Yuksel

Electronic Health Records (EHRs) have improved access to patient information but substantially increased clinicians' documentation workload. Large Language Models (LLMs) offer a potential means to reduce this burden, yet real-world deployments in live hospital systems remain limited. We implemented a secure, GDPR-compliant, on-premises LLM assistant integrated into the Epic EHR at a European university hospital. The system uses Qwen3-235B with Retrieval Augmented Generation to deliver context-aware answers drawing on structured patient data, internal and regional clinical documents, and medical literature. A one-month pilot with 28 physicians across nine specialties demonstrated high engagement, with 64% of participants using the assistant daily and generating 482 multi-turn conversations. The most common tasks were summarization, information retrieval, and note drafting, which together accounted for over 70% of interactions. Following the pilot, the system was deployed hospital-wide and adopted by 1,028 users who generated 14,910 conversations over five months, with more than half of clinicians using it at least weekly. Usage remained concentrated on information access and documentation support, indicating stable incorporation into everyday clinical workflows. Feedback volume decreased compared with the pilot, suggesting that routine use diminishes voluntary reporting and underscoring the need for complementary automated monitoring strategies. These findings demonstrate that large-scale integration of LLMs into clinical environments is technically feasible and can achieve sustained use when embedded directly within EHR workflows and governed by strong privacy safeguards. The observed patterns of engagement show that such systems can deliver consistent value in information retrieval and documentation, providing a replicable model for responsible clinical AI deployment.

PLOS Digital Health 4(12): e0001141 (Journal Article). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12716761/pdf/
Citations: 0
Towards precision psychiatry: Metabolomics identifies three biological subtypes of depression.
IF 7.7 Pub Date : 2025-12-19 eCollection Date: 2025-12-01 DOI: 10.1371/journal.pdig.0001125
Simeng Ma, Zhaowen Nie, Mengyuan Zhang, Junhua Mei, Enqi Zhou, Zhiyi Hu, Honggang Lv, Qian Gong, Gaohua Wang, Huiling Wang, Bo Du, Jun Yang, Zhongchun Liu

Depression is clinically and biologically heterogeneous, mandating classification strategies for personalized medicine. This study explored depression subtypes using metabolomics data from the UK Biobank and validated the subtypes in the Whitehall II cohort. The five-step analysis included: (1) identification of distinct subtypes using non-negative matrix factorization (NMF) and four machine learning algorithms; (2) genome-wide association studies (GWAS) to examine associations across subtypes and controls; (3) comparison of clinical characteristics across subtypes; (4) development of 24 subtype-specific diagnostic models and validation in an independent cohort; and (5) construction and comparison of metabolic networks across subtypes. Cluster analysis of 249 metabolomic indicators in individuals with current depressive episodes (n = 7,945) identified three metabolic subtypes of depression. Subtype 1 was characterized by fatty acid dysregulation, subtype 3 had a hyperlipidemia phenotype, while subtype 2 displayed an intermediate phenotype. Metabolic subtypes were not associated with SNPs. Diagnostic models built using the 249 metabolic indicators yielded the area under the curve (AUC) of 0.644 for the total depression sample and 0.785, 0.817, and 0.942 for subtypes 1, 2, and 3, respectively. Twenty-three additional diagnostic models based on combinations of metabolic indicators improved performance by 12.8-39.6% over a binary classification model. Metabolic networks significantly differed between each subtype and healthy controls but not between the total depressed group and controls. This study defines distinct metabolic subtypes of depression. Future research should combine high-throughput metabolomics with prospectively established depression cohorts and tailored interventions to explore subtype-specific diagnostic and therapeutic biomarkers.
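
The AUC values quoted above can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen control. The sketch below computes AUC from that pairwise definition. The function name `auc` and the toy labels and scores are illustrative assumptions, not the study's data or code.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Counts, over every (case, control) pair, how often the case has the
    higher score; ties count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up numbers): two cases, two controls.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(auc(toy_labels, toy_scores))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported 0.644 (all depression) and 0.942 (subtype 3) should be compared.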

PLOS Digital Health 4(12): e0001125 (Journal Article). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12716697/pdf/
Citations: 0
Evaluating ChatGPT-4 in the development of family medicine residency examinations.
IF 7.7 Pub Date : 2025-12-19 eCollection Date: 2025-12-01 DOI: 10.1371/journal.pdig.0001156
Hanu Chaudhari, Christopher Meaney, Kulamakan Kulasegaram, Fok-Han Leung

Creating high-quality medical examinations is challenging due to time, cost, and training requirements. This study evaluates the use of ChatGPT 4.0 (ChatGPT-4) in generating medical exam questions for postgraduate family medicine (FM) trainees. Develop a standardized method for postgraduate multiple-choice medical exam question creation using ChatGPT-4 and compare the effectiveness of large language model (LLM) generated questions to those created by human experts. Eight academic FM physicians rated multiple-choice questions (MCQs) generated by humans and ChatGPT-4 across four categories: 1) human-generated, 2) ChatGPT-4 cloned, 3) ChatGPT-4 novel, and 4) ChatGPT-4 generated questions edited by a human expert. Raters scored each question on 17 quality domains. Quality scores were compared using linear mixed effect models. ChatGPT-4 and human-generated questions were rated as high quality, addressing higher-order thinking. Human-generated questions were less likely to be perceived as artificial intelligence (AI) generated, compared to ChatGPT-4 generated questions. For several quality domains ChatGPT-4 was non-inferior (at a 10% margin), but not superior, to human-generated questions. ChatGPT-4 can create medical exam questions that are high quality, and with respect to certain quality domains, non-inferior to those developed by human experts. LLMs can assist in generating and appraising educational content, leading to potential cost and time savings.
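
A statement such as "non-inferior (at a 10% margin), but not superior" is commonly read off the confidence interval of the score difference. The sketch below shows that reading. It assumes the difference is ChatGPT-4 minus human, that higher scores are better, and that the margin is expressed on the same scale as the difference; the abstract does not state these details, and `classify_difference` is a hypothetical helper rather than the authors' analysis.

```python
def classify_difference(ci_lower: float, ci_upper: float, margin: float) -> str:
    """Classify a confidence interval for (new - reference) against a margin.

    superior:      the whole interval lies above 0
    non-inferior:  the lower bound lies above -margin (but not above 0)
    inferior:      the whole interval lies below -margin
    inconclusive:  the interval straddles -margin
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    if ci_lower > 0:
        return "superior"
    if ci_lower > -margin:
        return "non-inferior"
    if ci_upper < -margin:
        return "inferior"
    return "inconclusive"


# Made-up interval: lower bound above -0.10 but not above 0.
print(classify_difference(-0.04, 0.06, margin=0.10))  # non-inferior
```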

PLOS Digital Health 4(12): e0001156 (Journal Article). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12716725/pdf/
Citations: 0
A novel expert-annotated single-cell dataset for thyroid cancer diagnosis with deep learning benchmarks.
IF 7.7 Pub Date : 2025-12-16 eCollection Date: 2025-12-01 DOI: 10.1371/journal.pdig.0001120
Nguyen Quang Huy, Thanh-Ha Do, Nguyen Van De, Hoang Kim Giap, Vu Huyen Tram

This paper introduces a novel, expert-annotated single-cell image dataset for thyroid cancer diagnosis, comprising 3,419 individual cell images extracted from high-resolution histopathological slides and annotated with nine clinically significant nuclear features. The dataset, collected and annotated in collaboration with pathologists at the 108 Military Central Hospital (Vietnam), presents a significant resource for advancing research in automated cytological analysis. We establish a series of robust deep-learning baseline pipelines for multi-label classification on this dataset. These baselines incorporate ConvNeXt, Vision Transformers (ViT), and ResNet backbones, along with techniques to address class imbalance, including conditional CutMix, weighted sampling, and SPA loss with Label Pairwise Regularization (LPR). Experiments evaluate the good performance of the proposed pipelines, demonstrating the challenges over the dataset's characteristics and providing a benchmark for future studies in interpretable and reliable AI-based cytological diagnosis. The results highlight the importance of effective model architectures and data-centric strategies for accurate multi-label classification of single-cell images.

PLOS Digital Health 4(12): e0001120 (Journal Article). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12707618/pdf/
Citations: 0
Telemedicine in adult intensive care: A systematic review of patient-relevant outcomes and methodological considerations.
IF 7.7 Pub Date : 2025-12-15 eCollection Date: 2025-12-01 DOI: 10.1371/journal.pdig.0001126
Tamara Pscheidl, Carina Benstoem, Kelly Ansems, Lena Saal-Bauernschubert, Anne Ritter, Ana-Mihaela Zorger, Karolina Dahms, Sandra Dohmen, Eva Steinfeld, Julia Dormann, Claire Iannizzi, Nicole Skoetz, Heidrun Janka, Maria-Inti Metzendorf, Carla Nau, Miriam Stegemann, Patrick Meybohm, Falk von Dincklage, Sven Laudi, Falk Fichtner, Stephanie Weibel

Given the growing challenges of healthcare, including an aging population and increasing shortages of specialized intensive care staff, this systematic review investigates the efficacy of telemedicine in intensive care compared to standard of care (SoC) or any other type or mode of telemedicine on patient-relevant outcomes for adult intensive care unit (ICU) patients. This systematic review follows Cochrane's methodological standards. Comprehensive searches for any controlled clinical studies were conducted in MEDLINE, Scopus, CINAHL, and CENTRAL (up to 18 April 2024, and an updated search for randomized controlled trials (RCTs) up to 29 September 2025). Twenty-six studies comparing telemedicine in intensive care to SoC with approximately 2,164,508 analysed patients were identified, including data from one cluster RCT (cRCT), two stepped-wedge cluster RCTs (sw-cRCTs), and 23 non-randomized studies of interventions (NRSIs). No other comparisons were identified. Due to high clinical and methodological heterogeneity among studies, no meta-analysis was conducted. For ICU mortality, one cRCT (15,230 patients) and two sw-cRCTs (5,915 patients) showed heterogeneous results: two found no evidence for a difference, while one favoured SoC (very low-certainty). One sw-cRCT (1,462 patients) reporting overall mortality at 180 days suggested no evidence for a difference between groups (very low-certainty). Data from one cRCT (15,230 patients) and one sw-cRCT (1,462 patients) on ICU length of stay (LOS) showed no evidence for a difference between groups (moderate- and very low-certainty). Quality of life from one sw-cRCT (786 patients) indicated no evidence for a difference (very low-certainty). Six NRSIs reported adjusted data on ICU mortality, two on overall mortality, and three on ICU LOS, with heterogeneous results. 
High risk of bias and substantial heterogeneity limited the certainty, emphasizing the need for robust, patient-centered research in clinical studies to define telemedicine's role in intensive care and optimize its implementation. Future studies should particularly ensure transparent and comprehensive reporting.

PLOS Digital Health 4(12): e0001126 (Journal Article). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12704867/pdf/
Citations: 0
The Healthy Environments and Active Living for Translational Health (HEALTH) Platform: A smartphone-based system for geographic ecological momentary assessment research.
IF 7.7 Pub Date : 2025-12-11 eCollection Date: 2025-12-01 DOI: 10.1371/journal.pdig.0001133
Alexander J Wray, Katelyn R O'Bright, Shiran Zhong, Sean Doherty, Michael Luubert, Jed Long, Catherine E Reining, Christopher J Lemieux, Jon Salter, Jason Gilliland

Smartphones have become a widely used tool for delivering digital health interventions and conducting observational research. Many digital health studies adopt an ecological momentary assessment (EMA) methodology, which can be enhanced by collecting participant location data using built-in smartphone technologies. However, there is currently a lack of customizable software capable of supporting geographically explicit research in EMA. To address this gap, we developed the Healthy Environments and Active Living for Translational Health (HEALTH) Platform. The HEALTH Platform is a customizable smartphone application that enables researchers to deliver geographic ecological momentary assessment (GEMA) prompts on a smartphone in real-time based on spatially complex geofence boundaries, to collect audiovisual data, and to flexibly adjust system logic without requiring time-consuming updates to participants' devices. We illustrate the HEALTH Platform's capabilities through a study of park exposure and well-being. This study illustrates how the HEALTH Platform improves upon existing GEMA software platforms by offering greater customization and real-time flexibility in data collection and prompting participants. We observed survey prompt adherence is associated with participant motivation and the complexity of the survey instrument itself, following past EMA research findings. Overall, the HEALTH Platform offers a flexible solution for implementing GEMA in digital health research and practice.
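
Geofence-triggered prompting, as described above, comes down to testing whether a reported location falls inside a boundary polygon. The sketch below uses the standard ray-casting test on a non-convex boundary. The names `point_in_polygon` and `should_prompt`, the planar coordinate treatment, and the L-shaped example boundary are all illustrative assumptions; the abstract does not describe how the HEALTH Platform implements its geofences.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude) over a small area


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a horizontal ray
    starting at (x, y) and heading in the +x direction crosses.
    An odd number of crossings means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y-coordinate can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def should_prompt(x: float, y: float, geofence: List[Point]) -> bool:
    """Hypothetical trigger rule: prompt whenever the fix is inside the fence."""
    return point_in_polygon(x, y, geofence)


# Hypothetical L-shaped park boundary (non-convex).
park = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
print(should_prompt(1, 3, park))  # True: inside the upper arm of the L
print(should_prompt(3, 3, park))  # False: inside the bounding box, outside the park
```

The second point shows why a polygon test matters for irregular boundaries: a bounding-box or radius check would wrongly place it inside the park.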

PLOS Digital Health 4(12): e0001133 (Journal Article). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12697974/pdf/
Citations: 0
COVID-19 vaccination data management and visualization systems for improved decision-making: Lessons learnt from Africa CDC Saving Lives and Livelihoods program.
IF 7.7 Pub Date : 2025-12-11 eCollection Date: 2025-12-01 DOI: 10.1371/journal.pdig.0000782
Raji Tajudeen, Mosoka Papa Fallah, John Ojo, Tamrat Shaweno, Michael Sileshi Mekbib, Frehiwot Mulugeta, Wondwossen Amanuel, Moses Bamatura, Dennis Kibiye, Patrick Chanda Kabwe, Senga Sembuche, Ngashi Ngongo, Nebiyu Dereje, Jean Kaseya

The DHIS2 system enabled real-time tracking of vaccine distribution and administration to facilitate data-driven decisions. Experts from the Africa Centres for Disease Control and Prevention (Africa CDC) Monitoring and Evaluation (M&E) and Management Information System (MIS) teams, with support from the Health Information Systems Program South Africa (HISP-SA), developed the continental COVID-19 vaccination tracking system. Several variables related to COVID-19 vaccination were considered in developing the system. Three hundred fifty users can access the system at different levels with specific roles and privileges. Four dashboards with high-level summary visualizations were developed for top leadership for decision-making, while pages with detailed programmatic results are available to other users depending on their level of access. Africa CDC staff at different levels with a role-based account can view and interact with the dashboards and make necessary decisions based on the COVID-19 vaccination data from program implementation areas on the continent. The Africa CDC vaccination program dashboard provided essential information for public health officials to monitor the continental COVID-19 vaccination efforts and guide timely decisions. As the impact of COVID-19 is not yet over, the continental tracking of COVID-19 vaccine uptake and dashboard visualizations are used to provide the context of continental COVID-19 vaccination coverage and multiple other metrics that may impact the continental COVID-19 vaccine uptake. The lessons learned during the development and implementation of a continental COVID-19 vaccination tracking and visualization dashboard may be applied across various other public health events of continental and global concern.

Citations: 0
Developing and validating an explainable digital mortality prediction tool for extremely preterm infants.
IF 7.7 Pub Date : 2025-12-10 eCollection Date: 2025-12-01 DOI: 10.1371/journal.pdig.0000955
T'ng Chang Kwok, Chao Chen, Jayaprakash Veeravalli, Carol A C Coupland, Edmund Juszczak, Jonathan Garibaldi, Kirsten Mitchell, Kate L Francis, Christopher J D McKinlay, Brett J Manley, Don Sharkey

Decision-making in perinatal management of extremely preterm infants is challenging. Mortality prediction tools may support decision-making. We used population-based routinely entered electronic patient record data from 25,902 infants born between 23+0 and 27+6 weeks' gestation and admitted to 185 English and Welsh neonatal units from 2010 to 2020 to develop and internally validate an online tool to predict mortality before neonatal discharge. Comparing nine machine learning approaches, we developed an explainable tool based on stepwise backward logistic regression (https://premoutcome.shinyapps.io/Death/). The tool demonstrated good discrimination (area under the receiver operating characteristics curve (95% confidence interval) of 0.746 (0.729-0.762)) and calibration with superior net benefit across probability thresholds of 10%-70%. Our tool also demonstrated better calibration and utility performance than previously published models. Acceptable performance was demonstrated in a multinational, external validation cohort of preterm infants. This tool may be useful to support high-risk perinatal decision-making following further evaluation.

Citations: 0
A multi-agent approach to neurological clinical reasoning.
IF 7.7 Pub Date : 2025-12-04 eCollection Date: 2025-12-01 DOI: 10.1371/journal.pdig.0001106
Moran Sorka, Alon Gorenshtein, Dvir Aran, Shahar Shelly

Large language models (LLMs) have demonstrated impressive capabilities in medical domains, yet their ability to handle the specialized reasoning patterns required in clinical neurology warrants systematic evaluation. Neurological assessment presents distinctive challenges that combine anatomical localization, temporal pattern recognition, and nuanced symptom interpretation, cognitive processes that are specifically tested in board certification examinations. We developed a comprehensive benchmark comprising 305 questions from Israeli Board Certification Exams in Neurology and classified each along three dimensions of complexity: factual knowledge depth, clinical concept integration, and reasoning complexity. We evaluated ten LLMs of varying architectures and specializations using this benchmark, testing base models, retrieval-augmented generation (RAG) enhancement, and a novel multi-agent system. Our analysis revealed significant performance variation across models and methodologies. The OpenAI-o1 model achieved the highest base performance (90.9% accuracy), while specialized medical models performed surprisingly poorly (52.9% for Meditron-70B). RAG enhancement provided variable benefits across models: substantial improvements for mid-tier models like GPT-4o (80.5% to 87.3%) and smaller models, but limited effectiveness on the highest complexity questions regardless of model size. In contrast, our multi-agent framework, which decomposes neurological reasoning into specialized cognitive functions including question analysis, knowledge retrieval, answer synthesis, and validation, achieved dramatic improvements, especially for mid-range models. The LLaMA 3.3-70B-based agentic system reached 89.2% accuracy compared to 69.5% for its base model, with particularly substantial gains on level 3 complexity questions across all dimensions. External validation on MedQA revealed dataset-specific RAG effects: while RAG improved board certification performance, it showed minimal benefit on MedQA questions (LLaMA 3.3-70B: +1.4% vs +3.9% on board exams), reflecting alignment between our specialized neurology textbook and board examination content rather than the broader medical knowledge required for MedQA. Most notably, the multi-agent approach transformed inconsistent subspecialty performance into remarkably uniform excellence, effectively addressing the neurological reasoning challenges that persisted even with RAG enhancement. We further validated our approach using an independent dataset comprising 155 neurological cases extracted from MedQA. The results confirm that structured multi-agent approaches designed to emulate specialized cognitive processes significantly enhance complex medical reasoning, offering promising directions for AI assistance in challenging clinical contexts.

Citations: 0
From artificial to organic: Rethinking the roots of intelligence for digital health.
IF 7.7 Pub Date : 2025-12-01 DOI: 10.1371/journal.pdig.0001109
Prajwal Ghimire, Keyoumars Ashkan

The term "artificial" implies an inherent dichotomy from the natural or organic. However, AI, as we know it, is a product of organic ingenuity: designed, implemented, and iteratively improved by human cognition. The very principles that underpin AI systems, from neural networks to decision-making algorithms, are inspired by the organic intelligence embedded in human neurobiology and evolutionary processes. The path from "organic" to "artificial" intelligence in digital health is neither mystical nor merely a matter of parameter count; it is fundamentally about organization and adaptation. Thus, the boundaries between "artificial" and "organic" are far less distinct than the nomenclature suggests.

Citations: 0