NPJ Digital Medicine: Latest Articles

Digital pathways connecting social and biological factors to health outcomes and equity
IF 15.2 Zone 1 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-03-20 DOI: 10.1038/s41746-025-01564-8
Yan Cui
Digital pathways extend conventional connections between social and biological factors and health outcomes, significantly influencing health equity. Data representation bias and distribution shifts are key mechanisms through which determinants of health impact generalizability of artificial intelligence (AI) models and subsequently affect health outcomes and equity. These mechanisms provide critical targets for algorithmic interventions, which can lead to Pareto improvements in AI model performance across diverse populations, thereby mitigating health disparities.
Citations: 0
Interpretable personalized surgical recommendation with joint consideration of multiple decisional dimensions
IF 15.2 Zone 1 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-03-19 DOI: 10.1038/s41746-025-01509-1
Zhe Du, Zhaoyang Liu, Linru Fu, Che Wang, Zhijing Sun, Lan Zhu, Ke Deng

Surgical planning can be highly complicated and personalized: a surgeon needs to balance multiple decisional dimensions, including surgical effectiveness, risk, cost, and the patient’s condition and preferences. Turning to artificial intelligence is highly appealing. This study filled this gap with Multi-Dimensional Recommendation (MUDI), an interpretable, data-driven intelligent system that supported personalized surgical recommendations on both the patient’s and the surgeon’s side with joint consideration of multiple decisional dimensions. Applied to Pelvic Organ Prolapse, a common female disease with significant impacts on quality of life, MUDI stood out from a crowd of competing methods and achieved excellent performance comparable to that of top urogynecologists, with a transparent process that made communication between surgeons and patients easier. Users showed a willingness to accept the recommendations and achieved higher accuracy with the aid of MUDI. This success indicated that MUDI had the potential to solve similar challenges in other situations.

Citations: 0
Credibility assessment of a mechanistic model of atherosclerosis to predict cardiovascular outcomes under lipid-lowering therapy
IF 15.2 Zone 1 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-03-19 DOI: 10.1038/s41746-025-01557-7
Yishu Wang, Eulalie Courcelles, Emmanuel Peyronnet, Solène Porte, Alizée Diatchenko, Evgueni Jacob, Denis Angoulvant, Pierre Amarenco, Franck Boccara, Bertrand Cariou, Guillaume Mahé, Philippe Gabriel Steg, Alexandre Bastien, Lolita Portal, Jean-Pierre Boissel, Solène Granjeon-Noriot, Emmanuelle Bechet

Demonstrating cardiovascular (CV) benefits with lipid-lowering therapy (LLT) requires long-term randomized clinical trials (RCTs) with thousands of patients. Innovative approaches such as in silico trials applying a disease computational model to virtual patients receiving multiple treatments offer a complementary approach to rapidly generate comparative effectiveness data. A mechanistic computational model of atherosclerotic cardiovascular disease (ASCVD) was built from knowledge, describing lipoprotein homeostasis, LLT effects, and the progression of atherosclerotic plaques leading to myocardial infarction, ischemic stroke, major acute limb event and CV death. The ASCVD model was successfully calibrated and validated, and reproduced LLT effects observed in selected RCTs (ORION-10 and FOURIER for calibration; ORION-11, ODYSSEY-OUTCOMES and FOURIER-OLE for validation) on lipoproteins and ASCVD event incidence at both population and subgroup levels. This enables the future use of the model to conduct the SIRIUS programme, which intends to predict CV event reduction with inclisiran, an siRNA targeting hepatic PCSK9 mRNA.

Citations: 0
Cross sectional pilot study on clinical review generation using large language models
IF 15.2 Zone 1 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-03-19 DOI: 10.1038/s41746-025-01535-z
Zining Luo, Yang Qiao, Xinyu Xu, Xiangyu Li, Mengyan Xiao, Aijia Kang, Dunrui Wang, Yueshan Pang, Xing Xie, Sijun Xie, Dachen Luo, Xuefeng Ding, Zhenglong Liu, Ying Liu, Aimin Hu, Yixing Ren, Jiebin Xie

As the volume of medical literature grows at an accelerating pace and demands efficient tools to synthesize evidence for clinical practice and research, interest in leveraging large language models (LLMs) to generate clinical reviews has surged. However, there are significant concerns about the reliability of integrating LLMs into the clinical review process. This study presents a systematic comparison between LLM-generated and human-authored clinical reviews, revealing that while AI can quickly produce reviews, its reviews often have fewer references, less comprehensive insights, and lower logical consistency, and its citations show lower authenticity and accuracy. Additionally, a higher proportion of its references are from lower-tier journals. Moreover, the study uncovers a concerning inefficiency in current detection systems for identifying AI-generated content, suggesting a need for more advanced checking systems and a stronger ethical framework to ensure academic transparency. Addressing these challenges is vital for the responsible integration of LLMs into clinical research.

Citations: 0
Wearable data reveals distinct characteristics of individuals with persistent symptoms after a SARS-CoV-2 infection
IF 15.2 Zone 1 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-03-19 DOI: 10.1038/s41746-025-01456-x
Katharina Ledebur, Marc Wiedermann, Christian Puta, Stefan Thurner, Peter Klimek, Dirk Brockmann

Understanding the factors associated with persistent symptoms after SARS-CoV-2 infection is critical to improving long-term health outcomes. Using a wearable-derived behavioral and physiological dataset (n = 20,815), we identified individuals characterized by self-reported persistent fatigue and shortness of breath after SARS-CoV-2 infection. Compared with symptom-free COVID-19 positive (n = 150) and negative controls (n = 150), these individuals (n = 50) had higher resting heart rates (mean difference 2.37/1.49 bpm) and lower daily step counts (mean 3030/2909 steps fewer), even at least three weeks prior to SARS-CoV-2 infection. In addition, persistent fatigue and shortness of breath were associated with a significant reduction in mean quality of life (WHO-5, EQ-5D), even before infection. Here we show that persistent symptoms after SARS-CoV-2 infection may be associated with pre-existing lower fitness levels or health conditions. These findings additionally highlight the potential of wearable devices to track health dynamics and provide valuable insights into long-term outcomes of infectious diseases.

Citations: 0
Examining human-AI interaction in real-world healthcare beyond the laboratory
IF 15.2 Zone 1 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-03-19 DOI: 10.1038/s41746-025-01559-5
Magdalena Katharina Wekenborg, Stephen Gilbert, Jakob Nikolas Kather

Artificial Intelligence (AI) is revolutionizing healthcare, but its true impact depends on seamless human interaction. While most research focuses on technical metrics, we lack frameworks to measure the compatibility or synergy of real-world human-AI interactions in healthcare settings. We propose a multimodal toolkit combining ecological momentary assessment, quantitative observations, and baseline measurements to optimize AI implementation.

Citations: 0
Preliminary analysis of the impact of lab results on large language model generated differential diagnoses
IF 15.2 Zone 1 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-03-18 DOI: 10.1038/s41746-025-01556-8
Balu Bhasuran, Qiao Jin, Yuzhang Xie, Carl Yang, Karim Hanna, Jennifer Costa, Cindy Shavor, Wenshan Han, Zhiyong Lu, Zhe He

Differential diagnosis (DDx) is crucial for medicine as it helps healthcare providers systematically distinguish between conditions that share similar symptoms. This study evaluates the influence of lab test results on DDx accuracy generated by large language models (LLMs). Clinical vignettes from 50 randomly selected case reports from PMC-Patients were created, incorporating demographics, symptoms, and lab data. Five LLMs—GPT-4, GPT-3.5, Llama-2-70b, Claude-2, and Mixtral-8x7B—were tested to generate Top 10, Top 5, and Top 1 DDx with and without lab data. Results show that incorporating lab data enhances accuracy by up to 30% across models. GPT-4 achieved the highest performance, with Top 1 accuracy of 55% (0.41–0.69) and lenient accuracy reaching 79% (0.68–0.90). Statistically significant improvements (Holm-adjusted p values < 0.05) were observed, with GPT-4 and Mixtral excelling. Lab tests, including liver function, metabolic/toxicology panels, and serology, were generally interpreted correctly by LLMs for DDx.
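
The intervals quoted above for GPT-4 are numerically consistent with a normal-approximation (Wald) 95% interval at n = 50 vignettes. The abstract does not say which interval method the authors used, so the sketch below is an assumption for illustration only, and the function name `wald_ci` is hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def wald_ci(p_hat, n, z=Z_95):
    """Normal-approximation (Wald) confidence interval for a proportion.

    Assumption: the abstract does not name its interval method; Wald is
    used here only because it is the simplest textbook choice.
    """
    if n <= 0 or not 0.0 <= p_hat <= 1.0:
        raise ValueError("need n > 0 and 0 <= p_hat <= 1")
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    # Clip to [0, 1] because the Wald interval can overshoot near the edges.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Accuracies quoted in the abstract, with n = 50 case vignettes.
for label, p in (("Top 1", 0.55), ("lenient", 0.79)):
    lo, hi = wald_ci(p, 50)
    print(f"{label}: {p:.2f} ({lo:.2f}-{hi:.2f})")
```

With only 50 vignettes the half-width is roughly 0.11 to 0.14, which is why differences of a few percentage points between models should be read cautiously.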

Citations: 0
Consternation as Congress proposal for autonomous prescribing AI coincides with the haphazard cuts at the FDA
IF 15.2 Zone 1 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-03-18 DOI: 10.1038/s41746-025-01540-2
Stephen Gilbert, Tinglong Dai, Rebecca Mathias
We live in interesting regulatory times. In January, a bill was introduced to the US Congress proposing that AI “can qualify as a practitioner eligible to prescribe drugs” if overseen by the States and FDA. This is a bold and contentious move. Even proponents of AI’s swift integration into medicine must recognize the deep paradox: this proposal emerges even as the FDA’s world-leading infrastructure for AI oversight faces dismantling.
Citations: 0
Large language model agents can use tools to perform clinical calculations
IF 15.2 Zone 1 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-03-17 DOI: 10.1038/s41746-025-01475-8
Alex J. Goodell, Simon N. Chu, Dara Rouholiman, Larry F. Chu

Large language models (LLMs) can answer expert-level questions in medicine but are prone to hallucinations and arithmetic errors. Early evidence suggests LLMs cannot reliably perform clinical calculations, limiting their potential integration into clinical workflows. We evaluated ChatGPT’s performance across 48 medical calculation tasks, finding incorrect responses in one-third of trials (n = 212). We then assessed three forms of agentic augmentation: retrieval-augmented generation, a code interpreter tool, and a set of task-specific calculation tools (OpenMedCalc) across 10,000 trials. Models with access to task-specific tools showed the greatest improvement, with LLaMa and GPT-based models demonstrating a 5.5-fold (88% vs 16%) and 13-fold (64% vs 4.8%) reduction in incorrect responses, respectively, compared to the unimproved models. Our findings suggest that integration of machine-readable, task-specific tools may help overcome LLMs’ limitations in medical calculations.
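
The fold reductions quoted above can be checked directly from the paired percentages. The sketch below assumes that the first number in each pair is the error rate of the unimproved model and the second is the error rate with task-specific tools; the helper name `fold_reduction` is hypothetical.

```python
def fold_reduction(baseline_error_pct, augmented_error_pct):
    """Ratio of the baseline error rate to the tool-augmented error rate.

    Assumption: both arguments are percentages of incorrect responses,
    read from the abstract as (unimproved model, model with tools).
    """
    if augmented_error_pct <= 0:
        raise ValueError("augmented error rate must be positive")
    return baseline_error_pct / augmented_error_pct


# Pairs as quoted in the abstract.
llama_fold = fold_reduction(88, 16)   # LLaMa-based models
gpt_fold = fold_reduction(64, 4.8)    # GPT-based models

print(f"LLaMa-based: {llama_fold:.1f}-fold")
print(f"GPT-based:   {gpt_fold:.1f}-fold")
```

The first ratio is exactly 5.5; the second is about 13.3, which the abstract reports rounded to 13-fold.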

Citations: 0
Retinal fundus imaging as biomarker for ADHD using machine learning for screening and visual attention stratification
IF 15.2 Zone 1 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date: 2025-03-17 DOI: 10.1038/s41746-025-01547-9
Hangnyoung Choi, JaeSeong Hong, Hyun Goo Kang, Min-Hyeon Park, Sungji Ha, Junghan Lee, Sangchul Yoon, Daeseong Kim, Yu Rang Park, Keun-Ah Cheon

Attention-deficit/hyperactivity disorder (ADHD), characterized by diagnostic complexity and symptom heterogeneity, is a prevalent neurodevelopmental disorder. Here, we explored the machine learning (ML) analysis of retinal fundus photographs as a noninvasive biomarker for ADHD screening and stratification of executive function (EF) deficits. From April to October 2022, 323 children and adolescents with ADHD were recruited from two tertiary South Korean hospitals, and age- and sex-matched individuals with typical development were retrospectively collected. We used the AutoMorph pipeline to extract retinal features and used four types of ML models for ADHD screening and EF subdomain prediction, and we adopted the Shapley additive explanations (SHAP) method. ADHD screening models achieved 95.5%-96.9% AUROC. For EF stratification, the visual and auditory subdomains showed strong (AUROC > 85%) and poor performance, respectively. Our analysis of retinal fundus photographs demonstrated potential as a noninvasive biomarker for ADHD screening and EF deficit stratification in the visual attention domain.
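
AUROC, the metric quoted above, can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison; the function name `auroc` is hypothetical and the labels and scores are made-up toy values, not data from the study.

```python
def auroc(labels, scores):
    """AUROC by pairwise comparison (equivalent to the Mann-Whitney U statistic).

    labels: iterable of 0/1 ground-truth classes (1 = positive).
    scores: iterable of model scores, higher meaning "more likely positive".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example only: three positives and three negatives.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(f"AUROC = {auroc(toy_labels, toy_scores):.3f}")
```

The pairwise form is quadratic in the number of cases, which is fine for illustration; rank-based implementations in standard libraries are preferable for real datasets.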

Citations: 0