Frontiers in digital health: latest articles

Comparative performance of ChatGPT-5 and DeepSeek on the Chinese ultrasound medicine senior professional title examination.

IF 3.2 Q1 HEALTH CARE SCIENCES & SERVICES

Frontiers in digital health

Pub Date : 2026-03-09 eCollection Date: 2026-01-01 DOI: 10.3389/fdgth.2026.1783347

Dao-Rong Hong, Chun-Yan Huang, Jiu Gao

Background: Large language models (LLMs) have shown growing potential for medical education and assessment, but evidence on their performance in specialty certification exams in China, particularly in ultrasound medicine, remains limited.

Objective: To compare the performance of ChatGPT-5 and DeepSeek on the Chinese Ultrasound Medicine Senior Professional Title Examination, overall and by item type.

Methods: Between August and September 2025, we randomly selected 100 multiple-choice questions from the official Chinese Ultrasound Medicine Senior Professional Title Examination bank (60 image-based interpretation items and 40 text-based items). We evaluated ChatGPT-5 and DeepSeek using identical prompts through their public web interfaces. The primary outcome was overall accuracy; secondary outcomes were accuracy by item type and subspecialty. Between-model differences were assessed using two-proportion z-tests (α = 0.05) in Python 3.12.

Results: Overall accuracy was higher for ChatGPT-5 than for DeepSeek [74.0% (74/100) vs. 60.0% (60/100); p = 0.035]. Accuracy on image-based items was also higher for ChatGPT-5 (61.7% vs. 40.0%; p = 0.018). Performance on text-based items was similar for both models (92.5% vs. 90.0%). Subspecialty patterns varied across domains; however, no between-model difference within any subspecialty reached statistical significance.
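The between-model comparisons reported in the Results can be approximated with a pooled two-proportion z-test. The sketch below is a minimal standard-library version, not the authors' code. The function name is illustrative, and the image-item counts (37/60 and 24/60) and text-item counts (37/40 and 36/40) are assumptions back-calculated from the reported percentages; only the overall counts (74/100 and 60/100) are stated in the abstract.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


comparisons = {
    "overall": (74, 100, 60, 100),     # counts stated in the abstract
    "image-based": (37, 60, 24, 60),   # assumed: back-calculated from 61.7% and 40.0%
    "text-based": (37, 40, 36, 40),    # assumed: back-calculated from 92.5% and 90.0%
}

results = {name: two_proportion_z_test(*counts) for name, counts in comparisons.items()}
for name, (z, p) in results.items():
    print(f"{name}: z = {z:.2f}, p = {p:.3f}")
```

Under these assumed counts, the overall and image-based p-values land close to the reported 0.035 and 0.018, while the text-based difference is far from significant at α = 0.05.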

Conclusions: ChatGPT-5 outperformed DeepSeek on image-based items (61.7% vs. 40.0%), while both models performed similarly on text-based knowledge items (92.5% vs. 90.0%). Overall, both LLMs showed strong performance on Chinese ultrasound senior-title examination questions, with complementary strengths across content areas. They may be useful as supplementary educational tools, but further advances in multimodal reasoning are needed to support more reliable image interpretation.


Volume 8, article 1783347. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12968994/pdf/

Citations: 0

A professional training simulator for skill acquisition in ultrasound-guided lumbar facet syndrome intervention: design and educational evaluation.

IF 3.2 Q1 HEALTH CARE SCIENCES & SERVICES

Frontiers in digital health

Pub Date : 2026-03-06 eCollection Date: 2026-01-01 DOI: 10.3389/fdgth.2026.1761690

Belén Curto, Vidal Moreno, Juan-Alberto García-Esteban, David Sanchez-Poveda, Pablo Alonso, Felipe Zaballos

Introduction: Ultrasonography (US) plays a central role in modern diagnostic and interventional medicine, particularly in the management of facet-origin chronic low back pain, a highly prevalent condition in industrialized societies. However, its clinical effectiveness depends largely on the level of specialist training, requiring advanced skills in probe manipulation, sonoanatomy interpretation, brain-hand-eye coordination, and safe planning of interventional procedures. This work presents the development of a training simulator for ultrasound-guided treatment of lumbar facet syndrome; the simulator is implemented within a modular learning framework designed to support the flexible and efficient creation of procedure-specific simulators.

Methods: The developed simulator integrates a physical replica of an ultrasound probe, enabling trainees to practice realistic handling. Probe movements performed by the trainee along the scan path are continuously tracked and mapped to corresponding ultrasound images and videos, previously acquired by clinical experts from a real subject and displayed in real time on a computer screen. For interventional planning, a virtual syringe-and-needle component allows trainees to simulate needle orientation and insertion depth, with relevant anatomical structures highlighted as visual learning aids.
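The mapping from tracked probe movement to pre-recorded images can be pictured as a nearest-neighbour lookup along the scan path. The sketch below is only an illustration under strong assumptions: the abstract does not describe the tracking hardware or the mapping method, so the one-dimensional position in millimetres, the function name `nearest_frame_index`, and the sample positions are all hypothetical.

```python
from bisect import bisect_left

# Hypothetical: position (mm along the scan path) at which each expert frame was recorded.
# The list must be sorted in ascending order for the binary search to work.
recorded_positions_mm = [0.0, 5.0, 10.0, 15.0]


def nearest_frame_index(positions_mm, probe_position_mm):
    """Return the index of the recorded frame closest to the tracked probe position."""
    i = bisect_left(positions_mm, probe_position_mm)
    if i == 0:
        return 0  # probe is at or before the first recorded position
    if i == len(positions_mm):
        return len(positions_mm) - 1  # probe is past the last recorded position
    before, after = positions_mm[i - 1], positions_mm[i]
    # Pick whichever neighbour is closer; ties go to the earlier frame.
    return i if (after - probe_position_mm) < (probe_position_mm - before) else i - 1


for position in (-3.0, 6.0, 13.0, 99.0):
    print(position, "->", nearest_frame_index(recorded_positions_mm, position))
```

A real simulator would presumably track orientation as well as position and interpolate between frames or video segments; this sketch only shows the lookup idea.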

Results: A validation study was conducted involving 18 final-year medical students using an ad hoc questionnaire addressing usability, realism, learning support, and overall training experience. The results demonstrate a high level of student acceptance and a positive perceived impact on the acquisition of skills related to ultrasound-guided exploration and interventional planning. Most students reported accelerated skill acquisition in US examination (89% very satisfied, 11% satisfied) and high motivation (83% very satisfied, 17% satisfied). Overall performance and the likelihood of recommending the simulator received the highest rating from all participants (100%).

Discussion: From the perspective of students, the simulator provides a realistic and supportive learning experience, particularly due to the realism of the physical probe replica, the quality of the graphical user interface, and the guided learning process. From the perspective of instructors, the effectiveness of the simulator depends on the quality of the learning resources and the scope of the training cases. Although the preparation and curation of high-quality ultrasound datasets and annotations remains time-consuming, the framework significantly facilitates and adds flexibility to the development of new case studies. This positions the approach as a valuable complementary training resource, helping to bridge the gap between theoretical instruction and supervised clinical practice in ultrasound-guided procedures.


Volume 8, article 1761690. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13002623/pdf/

Citations: 0

ChatGPT-4o with faculty guidance outperforms AI-only and traditional learning in ultrasonography training: a randomized trial.

IF 3.2 Q1 HEALTH CARE SCIENCES & SERVICES

Frontiers in digital health

Pub Date : 2026-03-06 eCollection Date: 2026-01-01 DOI: 10.3389/fdgth.2026.1772965

Dao-Rong Hong, Chun-Yan Huang, Jiu Gao

Background: Ultrasonography training for residents is challenging owing to its operator-dependent nature and difficulties in mastering subtle image interpretation. Multimodal large language models like ChatGPT-4o enable efficient knowledge retrieval but show marked limitations in static ultrasonography image analysis.

Methods: In this prospective, single-centre randomized controlled trial, 45 first-year ultrasonography residents were randomly allocated to control (traditional resources), AI-only (ChatGPT-4o exclusively), or blended (ChatGPT-4o plus weekly faculty tutorials) groups. After a 3-week intervention, performance was assessed using a 150-item examination (pure-text and image-based multiple-choice questions). The study was approved by the institutional ethics committee, and written informed consent was obtained.

Results: The blended group achieved the highest scores (mean 128.40 ± 18.25) vs. AI-only (119.87 ± 19.11) and control (110.60 ± 20.45; P = 0.02), with superior pure-text performance (P = 0.03) and significant advantages in obstetrics/gynaecology (P = 0.04) and superficial organ ultrasonography (P = 0.047). Examination time was shortest in the blended group (P = 0.03). ChatGPT-4o alone was 85% accurate on text but only 47% on image-based questions.

Conclusions: A faculty-guided AI-integrated strategy was associated with improved short-term post-intervention performance compared with AI-only or traditional learning; however, the effects reflect the combined intervention, and AI support for static ultrasound image interpretation remains limited.


Volume 8, article 1772965. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12969794/pdf/

Citations: 0

Integrating remote blood pressure monitoring into NHS primary care: a human factors perspective.

IF 3.2 Q1 HEALTH CARE SCIENCES & SERVICES

Frontiers in digital health

Pub Date : 2026-03-05 eCollection Date: 2026-01-01 DOI: 10.3389/fdgth.2026.1697787

Massimo Micocci, Omar Butt, ShanShan Zhou, Austen El-Osta, Peter Buckle, George B Hanna

Background: Hypertension remains a major health burden in the UK, contributing significantly to cardiovascular disease and health inequalities. Although digital health technologies offer opportunities to enhance hypertension management, current NHS pathways face challenges, including inefficiencies in patient monitoring, limited patient engagement, and resource constraints. This study aimed to evaluate integration challenges of remote digital monitoring tools for blood pressure into NHS hypertension care pathways.

Methods: This exploratory study combined semi-structured interviews with 14 primary care NHS stakeholders recruited from across England and a field study at two GP practices. Participants were selected to include both those with and those without experience of digital platforms for remote monitoring of chronic conditions in primary care. Clinical pathway mapping and gap analysis were used to identify inefficiencies in hypertension management and to explore the potential integration of digital platforms.

Results: Eight major gaps were identified, including inconsistent patient engagement, lack of automated identification of at-risk and non-compliant patients, limited access to home monitors, and health inequalities related to digital literacy. Integration of a digital platform addressed several of these gaps by promoting self-monitoring behaviours, improving resource allocation through risk stratification, and enhancing decision-making with continuous patient data. However, barriers such as interoperability issues, workload concerns, literacy disparities, and unclear role responsibilities were noted.

Conclusion: Successful implementation requires addressing systemic challenges through targeted training, robust interoperability standards, clearer task allocation, and equity-focused interventions to bridge the digital divide. A human-centred, system-wide strategy is essential to ensure sustainable adoption and maximise the impact of digital innovations in primary care.


Volume 8, article 1697787. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12999572/pdf/

Citations: 0

A cross-sectional analysis of AI readiness and attitudes among nurses in resource-limited Chinese county hospitals.

IF 3.2 Q1 HEALTH CARE SCIENCES & SERVICES

Frontiers in digital health

Pub Date : 2026-03-05 eCollection Date: 2026-01-01 DOI: 10.3389/fdgth.2026.1778627

Ming Yu, Rong Yu, Mengjia Zhou, Xiaoli Fan, Ronghui Geng, Jing Ji, Suping Cai, Lili Jiang, Lingling Jiang

Aim: To assess clinical nurses' attitudes towards artificial intelligence (AI) in county hospitals and to analyze the factors that influence them, in order to inform the application of AI technology in primary medical care.

Design: A descriptive, cross-sectional study.

Methods: From August to September 2025, a total of 449 clinical nurses from a county-level, B-level hospital in Nantong City, China, were recruited by convenience sampling. A general information questionnaire, the Attitude Scale for the Application of Artificial Intelligence Technology in Nursing, the Artificial Intelligence Literacy Scale, and the Change Fatigue Scale were used to investigate the influencing factors.

Results: The total score of clinical nurses' attitudes toward AI was 45.17 ± 2.38, indicating a moderate level. Multiple linear regression analysis identified age, participation in AI-related training, education level, number of monthly night shifts, change fatigue, and total AI literacy score as significant determinants of AI attitudes (all P < 0.05). Collectively, these factors accounted for 60.6% of the total variance in AI attitude scores.

Conclusion: The attitude of Chinese county-level clinical nurses towards AI is at a moderate level and is influenced by multiple modifiable factors. To enhance AI acceptance and facilitate its integration into primary care, we recommend implementing targeted AI training programs, improving AI literacy, optimizing scheduling to reduce night shift burdens, and proactively managing change fatigue.


Volume 8, article 1778627. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12999563/pdf/

Citations: 0

A digital twin framework for predicting and simulating type 2 diabetes onset using retrospective lifestyle data.

IF 3.2 Q1 HEALTH CARE SCIENCES & SERVICES

Frontiers in digital health

Pub Date : 2026-03-05 eCollection Date: 2026-01-01 DOI: 10.3389/fdgth.2026.1710829

Mahreen Kiran, Ying Xie, Graham Ball, Rudolph Schutte, Nasreen Anjum, Barbara Pierscionek

Introduction: Type 2 Diabetes Mellitus (T2DM) is a rising global health concern, heavily influenced by modifiable lifestyle and psychosocial factors. However, most predictive tools focus on biomedical markers and rely on real-time data from wearables or electronic health records, limiting their scalability in resource-constrained settings. This study presents a novel digital twin (DT) framework that uses retrospective lifestyle, behavioral, and psychosocial data to forecast T2DM onset and simulate the estimated effects of preventive interventions.

Methods: Data were drawn from 19,774 participants in the UK Biobank cohort, followed for up to 17 years. A penalized Cox proportional hazards model was employed to estimate individual time-to-event risk trajectories based on 90 candidate predictors. Predictors were selected through univariate screening, multicollinearity assessment, and variance filtering, yielding a final model with 14 significant variables. Causal inference techniques, including directed acyclic graphs (DAGs) and counterfactual simulations, were used to explore intervention effects on disease progression.

Results: The model demonstrated strong predictive performance (C-index = 0.90, SD = 0.004). Psychosocial stressors such as loneliness, insomnia, and poor mental health emerged as strong independent predictors and were associated with estimated increases in absolute T2DM risk of approximately 35 percentage points individually and nearly 78 percentage points when combined, under the modeled assumptions. These effects were partly reinforced through diet, with high intake of processed meat, salt, and sugary cereals acting as risk amplifiers within the modeled causal pathways. Cheese intake was protective overall, but its estimated benefit was attenuated under psychosocial stress, where reduced consumption produced a small, directionally harmful mediation effect. Counterfactual simulations suggested that improvements in psychosocial conditions could reduce estimated T2DM risk by approximately 11.6 percentage points within the modeled cohort, with protective dietary patterns such as cheese consumption re-emerging as psychosocial stress was alleviated. The model also revealed pronounced ethnic disparities, with South Asian, African, and Caribbean participants exhibiting significantly higher estimated risk than White counterparts within this cohort. These findings highlight the potential of integrated, stress-informed prevention strategies that address both psychosocial and dietary pathways.

Conclusion: This study introduces a transparent, simulation-enabled DT framework for estimating T2DM risk and exploring behavioral intervention scenarios without reliance on real-time data streams. It enables interpretable, personalized prevention planning and supports exploration of scalable deployment in public health, particularly in underserved or low-infrastructure settings. Integrating psychosocial and lifestyle data is an important step toward more equitable and behaviorally informed digital health solutions.
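
The C-index reported in the Results can be read as the share of comparable participant pairs in which the participant with the higher predicted risk developed the outcome first. The sketch below computes Harrell's concordance index, one common definition of the C-index, in plain Python. It is not the authors' code, and the study may have used a different variant. The function name `concordance_index`, the toy data, and the simplified treatment of tied event times (such pairs are skipped) are assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk score.

    times       -- observed follow-up time per subject
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- model output, higher meaning higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification (assumption): skip tied times. A pair is also not
        # comparable when the earlier subject was censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier event had the higher risk score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four subjects, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.8, 0.1]
print(concordance_index(toy_times, toy_events, toy_risk))  # 0.8
```

In the toy data, five pairs are comparable and four of them are ordered correctly, which gives 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.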

Citations: 0

The repaired man or the man with extras: medical human-cyborgs.

IF 3.2 Q1 HEALTH CARE SCIENCES & SERVICES

Frontiers in digital health

Pub Date : 2026-03-05 eCollection Date: 2026-01-01 DOI: 10.3389/fdgth.2026.1754061

Gábor Speer

Citations: 0

"I need to feel safe before I can engage": embedding trauma-informed principles in sexual and reproductive health digital technologies.

IF 3.2 Q1 HEALTH CARE SCIENCES & SERVICES

Frontiers in digital health

Pub Date : 2026-03-05 eCollection Date: 2026-01-01 DOI: 10.3389/fdgth.2026.1733713

Agnes Kyamulabi, Abdul-Fatawu Abdulai

Background: The use of digital health technologies to access information and services related to sexual and reproductive health has been increasing. Despite the usefulness of these technologies, there are emerging concerns that they could inadvertently trigger, perpetuate and exacerbate trauma among patients. The purpose of this study was to explore trauma-informed care principles that could be applied in designing and/or utilizing sexual and reproductive health services.

Method: We conducted 5 focus group discussions with participants who have used digital health technologies to access sexual and reproductive health services in Western Canada. The discussion centred on ways sexual health-related digital technologies could prevent triggering or perpetuating trauma among patients. The discussion took place over Zoom, and the data were analyzed using a thematic analysis approach.

Results: The study revealed five main considerations that could be adopted in the design and use of sexual and reproductive health technologies to prevent the unintended consequences of trauma. These include (1) integrating accessibility and inclusivity features; (2) integrating confidentiality, safety, and privacy features like quick exit buttons; (3) using empathetic language and terminologies; (4) integrating emotional and psychological support services; and (5) implementing aesthetic design features.

Conclusion: The findings of this study would help produce equitable, safe, and empowering digital health technologies for all users, particularly trauma survivors. By integrating these principles, developers and healthcare providers can create tools that reduce barriers, mitigate re-traumatization risks, and promote positive health outcomes. Future research should focus on evaluating the implementation and impact of trauma-informed digital tools in diverse settings.

Citations: 0

Evaluating large language models for automated TNM staging from PET-CT reports: a multi-cancer comparative study.

IF 3.2 Q1 HEALTH CARE SCIENCES & SERVICES

Frontiers in digital health

Pub Date : 2026-03-04 eCollection Date: 2026-01-01 DOI: 10.3389/fdgth.2026.1741973

Wen Xu, Lixiu Cao, Qijun Shen, Yanna Shan, Shushu Pan, Mei Ruan

Purpose: To evaluate three large language models (LLMs), including ChatGPT 5, ChatGPT 4o, and ChatGPT 3.5, in automating TNM staging from PET-CT reports across six cancer types, and to assess their clinical utility compared with junior radiologists.

Materials and methods: PET-CT reports from 552 treatment-naive patients at two institutions with confirmed primary malignancies (lung, breast, liver, pancreatic, renal, and prostate cancer) were analyzed. Three ChatGPT-series LLMs and five junior radiologists independently performed TNM staging. Reference standards were established by two senior radiologists according to the 8th edition of the American Joint Committee on Cancer (AJCC) staging system. Performance was evaluated using accuracy rates. Intra-model agreement was assessed by running each model three times per report with identical prompts, and inter-model agreement was evaluated using Cohen's κ coefficients.

Results: ChatGPT 5 achieved the highest overall accuracy (82.1%, 453/552), followed by ChatGPT 4o (74.3%, 410/552), both significantly outperforming ChatGPT 3.5 (59.6%, 329/552) and junior radiologists (77.0%, 425/552; p = 0.041 for ChatGPT 5 vs. junior radiologists). Accuracy varied by cancer type, with the highest performance in lung cancer staging (88.5%) and the lowest in pancreatic cancer (69.2%). Across TNM categories, all models achieved the best performance in T staging, followed by N staging, with M staging remaining the most challenging. ChatGPT 5 showed near-perfect intra-model agreement (κ = 0.96), while inter-model agreement ranged from moderate between ChatGPT 3.5 and 4o (κ = 0.58) to substantial between ChatGPT 5 and 4o (κ = 0.78). ChatGPT 5 processed cases markedly faster than junior radiologists (8.3 ± 3.2 vs. 92.5 ± 21.7 s per case; p < 0.001).

Conclusion: Among the three LLMs, ChatGPT 5 demonstrated the highest accuracy, stability, and efficiency in automated TNM staging from PET-CT reports, achieving performance comparable to or slightly exceeding junior radiologists. Its advantages in T staging and lung cancer evaluation highlight its clinical utility as a potential decision-support tool.
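
The agreement figures above rely on Cohen's κ, which corrects raw agreement between two raters for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The sketch below shows that calculation in plain Python. It is an illustration only, not the study's analysis code. The function name `cohens_kappa` and the stage labels are hypothetical, and the abstract does not say how the three repeated runs per model were combined.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of cases with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical overall stages assigned by two models to eight reports.
model_x = ["I", "I", "II", "II", "III", "III", "IV", "IV"]
model_y = ["I", "I", "II", "III", "III", "III", "IV", "II"]
print(round(cohens_kappa(model_x, model_y), 3))  # 0.667
```

Here the two raters agree on 6 of 8 cases (p_o = 0.75) while chance agreement is 0.25, giving κ ≈ 0.667. The descriptors used in the abstract ("moderate", "substantial", "near-perfect") match the widely used Landis and Koch bands of 0.41 to 0.60, 0.61 to 0.80, and 0.81 to 1.00.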

Citations: 0

Editorial: The digitalization of neurology-volume II.

IF 3.2 Q1 HEALTH CARE SCIENCES & SERVICES

Frontiers in digital health

Pub Date : 2026-03-04 eCollection Date: 2026-01-01 DOI: 10.3389/fdgth.2026.1806851

Daniel B Hier, Michael D Carrithers, Jorge M Rodríguez-Fernández, Benjamin Kummer

Citations: 0