
Frontiers in digital health: latest articles

Artificial intelligence in emergency department triage: perspective of human professionals.
IF 3.2 Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-01-06 eCollection Date: 2025-01-01 DOI: 10.3389/fdgth.2025.1693060
Alina Petrica, Adina Maria Marza, Claudiu Barsac, Andreea Cebzan, Ioan Dragan, Daniela Zaharie, Raluca Horhat, Diana Lungeanu

Background: The triage process in emergency departments (EDs) is complex, and AI-based solutions have begun to target it. At this pivotal stage, the challenge lies less in designing smarter algorithms than in fostering trust and alignment among medical and technical stakeholders. We explored professional attitudes towards AI in ED triage, focusing on alignments and misalignments across backgrounds.

Methods: An anonymous online cross-sectional survey was distributed through professional networks of healthcare providers and IT professionals between May 2024 and February 2025. The questionnaire covered four areas: (a) the General Attitudes towards Artificial Intelligence Scale (GAAIS); (b) professional background and career level; (c) challenges and priorities for AI applications in triage; and (d) the AI Attitude Scale (AIAS-4). Constructs from the extended Unified Theory of Acceptance and Use of Technology (UTAUT2) were also applied. Cluster analysis (k-means) was conducted on the GAAIS-positive, GAAIS-negative, and AIAS-4 scores.
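A minimal k-means (Lloyd's algorithm) sketch in Python can illustrate the clustering step. It makes several assumptions:

- Each respondent is one row of three numbers (GAAIS-positive, GAAIS-negative, AIAS-4 scores).
- Each column is z-scored.
- Seeding uses a deterministic farthest-point rule, chosen here for reproducibility.

The abstract does not state the authors' preprocessing, seeding or software. `standardize`, `init_farthest` and `kmeans` are hypothetical helpers, not the study's code.

```python
import numpy as np

def standardize(X):
    # z-score each column so no single questionnaire scale dominates the distance
    return (X - X.mean(axis=0)) / X.std(axis=0)

def init_farthest(X, k):
    # Hypothetical deterministic seeding: start from the first row, then keep
    # adding the row farthest from every centroid chosen so far.
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    return np.array(centroids, dtype=float)

def kmeans(X, k, n_iter=100):
    """Lloyd's k-means on the rows of X (one row per respondent)."""
    centroids = init_farthest(X, k)
    for _ in range(n_iter):
        # Assignment step: each respondent goes to the nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its members
        # (an empty cluster keeps its previous centroid).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

For example, `labels, centroids = kmeans(standardize(scores), k=3)` returns one cluster index per respondent. Cluster numbering is arbitrary, so names such as K0, K1 and K2 would have to be attached by inspecting the centroids.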

Results: From a total of 151 professionals, k-means clustering identified three clusters: K0 (cautious/critical, n = 39), K1 (enthusiastic/optimistic, n = 35), and K2 (balanced/pragmatic, n = 77). A majority of K2 (47/77; 61%) were healthcare providers. Six of the 20 (30%) medical professionals in K0 reported that AI could play no role in ED triage, compared with only 1/15 (7%) and 1/47 (2%) of healthcare providers in K1 and K2, respectively. Lack of knowledge of AI tools was also most frequent in K0 (14/39; 36%). Recognition of the need for constraints differed markedly across clusters (mean ± SD): (a) for data availability/quality, 2.95 ± 1.98 (K0), 4.27 ± 1.1 (K1), and 4.20 ± 0.94 (K2); (b) for the integration of AI-based applications into existing workflows, 2.95 ± 1.05, 4.20 ± 0.94, and 3.47 ± 1.02 in K0, K1, and K2, respectively. Among the UTAUT2 constructs, hedonic motivation differed most significantly, with mean ± SD values of 3.41 ± 1.0 (K0), 6.86 ± 0.97 (K1), and 5.07 ± 1.08 (K2).

Conclusions: Stakeholders' perspectives on AI in ED triage are heterogeneous and not solely determined by professional background or role. Hedonic motivation emerged as a key driver of enthusiasm. Educational strategies should follow two directions: (a) structured AI programs for enthusiastic developers from diverse fields, and (b) AI literacy for all healthcare professionals to support competent use as consumers.

Citations: 0
Large language models for neurology: a mini review.
IF 3.2 Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-01-06 eCollection Date: 2025-01-01 DOI: 10.3389/fdgth.2025.1732759
Donald C Wunsch III, Daniel B Hier

Large language models have the potential to transform neurology by augmenting diagnostic reasoning, streamlining documentation, and improving workflow efficiency. This Mini Review surveys emerging applications of large language models in Alzheimer's disease, Parkinson's disease, multiple sclerosis, and epilepsy, with emphasis on ambient documentation, multimodal data integration, and clinical decision support. Key barriers to adoption include bias, privacy, reliability, and regulatory alignment. Looking ahead, neurology-focused language models may develop greater fluency in biomedical ontologies and FHIR standards, improving data interoperability and supporting more seamless collaboration between clinicians and AI systems. Two future developments have the potential to be particularly impactful: (1) the integration of multi-omic and neuroimaging data with digital-twin simulations to advance precision neurology, and (2) broader adoption of ambient documentation and other language-model-based efficiencies that could reduce administrative and cognitive burden. Ultimately, the clinical success of large language models will depend on continued progress in model robustness, ethical governance, and careful implementation.

Citations: 0
Development and validation of screening tool for excessive and problematic use of internet and digital devices (STEPS-IDD) based on the WHO framework (ICD-11) for addictive behaviours.
IF 3.2 Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-01-06 eCollection Date: 2025-01-01 DOI: 10.3389/fdgth.2025.1671623
Yatan Pal Singh Balhara, Swarndeep Singh, Ilika Guha Majumdar, Ayesha Ayoob, Aastha Singh

Background: The widespread use of the internet and digital devices has been accompanied by growing concern about the harms of their excessive or problematic use. The World Health Organization has also formally included some of these behaviours in its latest classification system (ICD-11) under the category of "disorders due to addictive behaviours". However, no validated, comprehensive ICD-11-aligned tool exists to screen for these potentially addictive behaviours. This study aimed to develop and validate the Screening Tool for Excessive and Problematic use of Internet and Digital Devices (STEPS-IDD), designed to assess multiple addictive behaviours based on ICD-11 criteria.

Methods: STEPS-IDD was developed based on the ICD-11 framework for disorders due to addictive behaviours. It was applied to two kinds of behaviour:

- Well-established behavioural addictions, such as gaming and gambling disorder.
- Less established but widely researched ones, such as problematic use of social media, online shopping/buying, watching over-the-top (OTT) streaming content, and watching pornography.

Face validity was established through expert review and feedback. Construct validity was evaluated through exploratory factor analysis (EFA), and internal consistency was assessed with Cronbach's alpha coefficients. To examine concurrent validity, scores on the newly developed STEPS-IDD sub-sections were correlated with scores on two comparison scales: the previously validated Gaming Disorder and Hazardous Gaming Scale (GDHGS), and a modified GDHGS for other behaviours. Receiver Operating Characteristic (ROC) analyses were conducted to determine optimal STEPS-IDD cut-off scores for the different behaviours.
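For one sub-section with k items, Cronbach's alpha is α = k/(k − 1) · (1 − Σ item variances / variance of the total score). Below is a minimal NumPy sketch of that formula. It assumes a complete respondents × items matrix with no missing answers; the abstract does not describe item counts or missing-data handling. `cronbach_alpha` is a hypothetical helper, not the authors' code.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents x items score matrix.

    Assumes every respondent answered every item of the sub-section.
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # sample variance of the summed score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)
```

Perfectly parallel items give α = 1, and mutually uncorrelated items give α close to 0. Values such as the reported 0.86 to 0.91 therefore indicate items that largely move together.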

Results: Data from 112 college students (64.3% female; mean age 20.5 years) were analyzed. STEPS-IDD demonstrated good construct validity, with EFA revealing a predominantly unidimensional factor structure for most behavioural domains. Internal consistency was excellent (Cronbach's α = 0.86-0.91 across sub-sections). Concurrent validity was supported by moderate to strong positive correlations (r = 0.44-0.76) between STEPS-IDD sub-sections and the corresponding GDHGS and modified GDHGS scores. ROC analyses yielded optimal cut-off scores with high sensitivity and acceptable specificity for the different behaviours, and fair to excellent overall diagnostic accuracy.
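The abstract does not say how "optimal" cut-offs were chosen or which reference standard defined a positive case. One common convention is to maximise Youden's J = sensitivity + specificity − 1 across candidate thresholds. The sketch below assumes three things:

- The cut-off is chosen by that criterion.
- Each respondent has a binary reference label.
- Higher STEPS-IDD scores indicate greater risk.

`youden_cutoff` is a hypothetical helper.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximising Youden's J.

    scores: screening scores; labels: 1 = positive on the (assumed) reference
    standard, 0 = negative. A respondent screens positive when score >= cutoff.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best = None
    for c in sorted(set(scores)):               # every observed score is a candidate
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < c and y == 0)
        sens, spec = tp / pos, tn / neg
        j = sens + spec - 1
        if best is None or j > best[0]:         # strict '>' keeps the lowest tied cutoff
            best = (j, c, sens, spec)
    return best[1], best[2], best[3]
```

Ties in J are resolved in favour of the lowest cut-off, which leans toward sensitivity. That tie-break is a design choice of this sketch, not something the abstract specifies.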

Conclusion: STEPS-IDD is a psychometrically robust, brief yet comprehensive screening tool grounded in the ICD-11 framework for risk stratification of addictive behaviours related to the use of the internet and digital devices.

Citations: 0
Advanced EEG signal classification for neural prosthetic devices using metaheuristic and deep learning techniques.
IF 3.2 Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-01-06 eCollection Date: 2025-01-01 DOI: 10.3389/fdgth.2025.1706660
Thippagudisa Kishore Babu, Damodar Reddy Edla, Suresh Dara, Mohan Allam

Introduction: In neural prosthetic devices, accurate classification of high-dimensional electroencephalography (EEG) signals is significantly impaired by redundant and irrelevant features that degrade classifier generalization and computational efficiency. This work presents a new, unified optimization-driven framework to address these issues and improve decoding of EEG-based motor imagery (MI) signals.

Methods: The proposed method combines a modified coati optimization algorithm (COA) for feature selection with several machine and deep learning classifiers. The novelty of the modified COA lies in its dynamic, parameter-free adaptation mechanism. Combined with opposition-based learning, this mechanism maintains a better balance between exploration and exploitation in the high-dimensional feature space. The resulting optimized feature subsets are then used to train a battery of classifiers for motor imagery task recognition: support vector machines (SVM), random forests (RF), convolutional neural networks (CNN), and recurrent neural networks (RNN). In experiments, the framework is evaluated on commonly used benchmark EEG datasets such as the PhysioNet Motor Movement/Imagery dataset.
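Opposition-based learning (OBL), mentioned above, evaluates each candidate solution together with its "opposite" and keeps the better ones. For a binary feature mask, the opposite of bit x is 1 − x. The sketch below shows only that generic OBL step for binary masks. It is not the authors' modified COA. The `fitness` callable (for example, cross-validated accuracy on the selected EEG features) is a hypothetical placeholder.

```python
import numpy as np

def opposition_step(population, fitness):
    """One generic opposition-based learning step for binary feature masks.

    population: (n, d) array of 0/1 masks (1 = feature selected).
    fitness: hypothetical callable mask -> float, higher is better.
    Returns the n fittest masks drawn from the population and its opposites.
    """
    pop = np.asarray(population, dtype=int)
    opposites = 1 - pop                            # binary opposite of x is 1 - x
    union = np.vstack([pop, opposites])
    scores = np.array([fitness(m) for m in union])
    keep = np.argsort(scores)[::-1][: len(pop)]    # indices of the n best masks
    return union[keep]
```

Because the population size stays fixed, this step can be applied at initialisation or periodically between metaheuristic iterations. The abstract does not say where the authors apply it.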

Results: The COA + CNN model achieved the best classification performance: accuracy 96.8%, precision 96.4%, recall 96.9%, and F1-score 96.6%. This is a 6.5-percentage-point gain in classification accuracy over the best competing feature selection technique. The model significantly outperformed conventional metaheuristic algorithms such as PSO (90.3% accuracy) and GA (89.7% accuracy), as well as filter-type techniques such as mRMR (86.8%) and ReliefF (84.3%).
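The reported figures can be sanity-checked with two small calculations:

- F1 is the harmonic mean of precision and recall, so the F1 value should follow from the reported precision and recall.
- The 6.5-point gain equals 96.8% minus PSO's 90.3%. This suggests, as an inference not stated in the abstract, that PSO was the best competing technique. In relative terms the gain is about 7.2%.

`f1_score` and `gains` are hypothetical helpers for this arithmetic only.

```python
def f1_score(precision, recall):
    # F1 is the harmonic mean of precision and recall (same units in, same out)
    return 2 * precision * recall / (precision + recall)

def gains(new_acc, baseline_acc):
    # Absolute gain in percentage points, and relative gain in percent
    absolute = new_acc - baseline_acc
    relative = 100 * absolute / baseline_acc
    return absolute, relative
```

`f1_score(96.4, 96.9)` rounds to 96.6, matching the reported F1-score. `gains(96.8, 90.3)` gives about 6.5 points absolute and about 7.2% relative.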

Discussion/conclusion: Combining an evolved metaheuristic for feature subset selection with deep learning architectures is a powerful approach for accurate EEG signal classification. The findings confirm that the COA-based approach provides a robust, computationally efficient, and scalable method for high-accuracy classification. Such accuracy is essential for the reliability and real-time operation of future neural prosthetic control systems.

Citations: 0
Remote photoplethysmography for health assessment: a review informed by IntelliProve technology.
IF 3.2 Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-01-05 eCollection Date: 2025-01-01 DOI: 10.3389/fdgth.2025.1667423
Alora Brown, Joeri Tulkens, Maxime Mattelin, Tanguy Sanglet, Brecht Dhuyvetters

Background: Remote photoplethysmography (rPPG) is a non-invasive method that accurately measures clinical biomarkers, including heart rate, respiration rate, heart rate variability, blood pressure, and oxygen saturation. Because the contactless technique relies on standard cameras and ambient light, it is highly accessible and valuable for general health assessment. Despite this potential, comprehensive research on rPPG applications for health assessment is scarce.
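A common textbook pipeline makes the camera-based principle concrete:

1. Average the green channel over a skin region in each video frame.
2. Take the dominant frequency of that trace within a plausible cardiac band as the pulse rate.

The sketch below assumes the per-frame green means and the frame rate are already available; face detection and region tracking are not shown. It uses a plain windowed FFT peak. It is an illustration, not IntelliProve's algorithm. `estimate_heart_rate` and its 0.7 to 4.0 Hz band limits are assumptions.

```python
import numpy as np

def estimate_heart_rate(green_means, fps, lo_hz=0.7, hi_hz=4.0):
    """Estimate heart rate (beats/min) from a per-frame mean green-channel trace.

    green_means: 1-D sequence, spatial mean of the green channel over a skin
                 region in each video frame (ROI extraction not shown).
    fps: camera frame rate in Hz.
    """
    x = np.asarray(green_means, dtype=float)
    x = x - x.mean()                             # remove the constant ambient-light level
    x = x * np.hanning(len(x))                   # taper to reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)   # assumed cardiac band, 42-240 bpm
    peak_hz = freqs[band][spectrum[band].argmax()]
    return 60.0 * peak_hz
```

Frequency resolution is 1 / (window length in seconds). For example, a 20 s window gives 0.05 Hz bins, or 3 bpm. Practical systems therefore add detrending, band-pass filtering, motion handling, or peak interpolation on top of this basic idea.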

Objective: This review summarizes the current state of knowledge on rPPG health assessments, covering both fundamental physiological monitoring and higher-level health insights. It draws on the rPPG-based HealthTech company IntelliProve as a real-world example to identify relevant outputs currently applied in everyday settings.

Methods: A literature review was performed to identify validated physiological biomarkers and emerging health metrics in rPPG research, using Google Scholar, PubMed and Scopus.

Results: The search identified 96 relevant studies, of which 54 directly investigated rPPG-related technologies. The remaining papers provided theoretical context and complementary support relevant to rPPG-based health metrics. Similar to IntelliProve's approach, several studies combined rPPG with additional inputs to improve the accuracy of complex health assessments, such as sleep quality evaluation. The review identified two groups of outputs. Well-established health outputs include heart rate, respiratory rate, heart rate variability, hypertension risk, and mental stress detection. Exploratory health metrics include mental health risk, energy levels, sleep quality, and resonant breathing state. To the authors' knowledge, the existing literature focuses heavily on deriving basic vital signs, with limited research into rPPG's broader health applications.

Conclusions: This review synthesizes rPPG-based health applications, showing strong evidence for fundamental physiological monitoring and growing interest in higher-level health metrics. Overall, it lays the groundwork for continued research into the expanding use of rPPG for health assessment.

Citations: 0
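The rPPG review listed above rests on one core idea: recovering a pulse signal from ordinary camera video. The simplest textbook version of that idea takes a per-frame skin-colour trace and finds its dominant frequency within a plausible cardiac band. What follows is a minimal sketch under stated assumptions. It is not IntelliProve's pipeline, which the abstract does not describe. The 0.7 to 4 Hz band, the function name `estimate_heart_rate_bpm` and the synthetic 72 bpm signal are all illustrative choices.

```python
import numpy as np

def estimate_heart_rate_bpm(trace, fps, low_hz=0.7, high_hz=4.0):
    """Estimate heart rate from a 1-D rPPG trace (e.g. the mean green-channel
    intensity of a face region in each video frame) by locating the dominant
    spectral peak inside an assumed cardiac band (0.7-4.0 Hz = 42-240 bpm)."""
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()                      # remove the constant ambient-light level
    x = x * np.hanning(len(x))            # taper the ends to reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    band = (freqs >= low_hz) & (freqs <= high_hz)
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return float(60.0 * peak_hz)          # convert Hz to beats per minute

# Synthetic example: 20 s of 30 fps video with a 1.2 Hz (72 bpm) pulse plus noise.
fps = 30
t = np.arange(600) / fps
rng = np.random.default_rng(0)
trace = 100 + 0.5 * np.sin(2 * np.pi * 1.2 * t) + 0.2 * rng.standard_normal(t.size)
print(round(estimate_heart_rate_bpm(trace, fps)))  # prints 72
```

Practical rPPG pipelines typically add face tracking, motion and illumination compensation, and more robust signal extraction than a single colour channel. This sketch shows only the final frequency-estimation step.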
Large language model bias auditing for periodontal diagnosis using an ambiguity-probe methodology: a pilot study.
IF 3.2 Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-01-05 eCollection Date: 2025-01-01 DOI: 10.3389/fdgth.2025.1687820
Teerachate Nantakeeratipat

Background: Large Language Models (LLMs) hold immense promise in healthcare yet carry the risk of perpetuating social biases. Although artificial intelligence (AI) fairness is a growing concern, little is known about how these models perform under conditions of clinical ambiguity, a common feature of real-world practice.

Methods: We conducted a study using an ambiguity-probe methodology with 42 sociodemographic personas and 15 clinical vignettes based on the 2018 classification of periodontal diseases. Ten vignettes were clear-cut scenarios with established ground truths, and five were intentionally ambiguous. OpenAI's GPT-4o and Google's Gemini 2.5 Pro were prompted to provide periodontal stage and grade assessments for 630 vignette-persona combinations per model.
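The figure of 630 follows from crossing every persona with every vignette (42 × 15 = 630). Below is a minimal sketch of that full factorial design. The persona and vignette identifiers are hypothetical because the abstract reports only the counts, and treating the first ten vignettes as the clear-cut ones is likewise an assumption.

```python
from itertools import product

# Hypothetical identifiers: the abstract gives only the counts, not the labels.
personas = [f"persona_{i:02d}" for i in range(1, 43)]    # 42 sociodemographic personas
vignettes = [f"vignette_{j:02d}" for j in range(1, 16)]  # 15 periodontal vignettes
clear_cut = set(vignettes[:10])                          # assumed: 10 with a ground truth
                                                         # (the other 5 are ambiguous)

# Full factorial crossing: every persona is paired with every vignette.
prompts = [
    {"persona": p, "vignette": v, "ambiguous": v not in clear_cut}
    for p, v in product(personas, vignettes)
]

print(len(prompts))                              # 630 prompts per model
print(sum(not d["ambiguous"] for d in prompts))  # 420 clear-cut prompts
```

Only the 420 clear-cut prompts (10 vignettes × 42 personas) have a ground truth against which accuracy can be scored. The ambiguous prompts can still be compared across personas to test for bias.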

Results: In clear-cut scenarios, GPT-4o achieved significantly higher combined (stage and grade) accuracy (70.5%) than Gemini 2.5 Pro (33.3%). However, a robust fairness analysis using cumulative link models with false discovery rate correction revealed no statistically significant sociodemographic bias in either model, in both clear-cut and ambiguous clinical scenarios.
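The fairness analysis combines cumulative link (ordinal regression) models with false discovery rate correction. Fitting the ordinal models is beyond a short example, but the correction step can be sketched. The sketch below assumes the widely used Benjamini-Hochberg procedure, since the abstract does not name the FDR method. The p-values are illustrative, not study data.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True if it is rejected under the
    Benjamini-Hochberg step-up procedure controlling the FDR at `alpha`."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose rank is at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected

# Illustrative p-values for four hypothetical sociodemographic terms.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))  # [True, False, False, False]
```

In a study like this one, each sociodemographic coefficient would contribute one p-value. A result of "no significant bias" corresponds to no term being rejected after correction.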

Conclusion: To our knowledge, this is among the first studies to use simulated clinical ambiguity to reveal the distinct ethical fingerprints of LLMs in a dental context. Although performance gaps between the LLMs exist, our analysis decouples accuracy from fairness and shows that both models maintain sociodemographic neutrality. We find that the observed errors reflect not bias but diagnostic boundary instability. This highlights a critical need for future research to distinguish between these two types of model failure in order to build genuinely reliable AI.

Citations: 0
A structured framework to improve usability in EHR implementation: a user-centered case study in Brazilian mental healthcare.
IF 3.2 Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-01-05 eCollection Date: 2025-01-01 DOI: 10.3389/fdgth.2025.1676631
Fernanda Peron Gaspary, Daniel Baia Amaral, Cristian Vinicius Fagundes, Luis Felipe Dias Lopes, João Francisco Pollo Gaspary

Background: Electronic Health Record (EHR) systems are central to digital health transformation, yet usability challenges continue to constrain their effectiveness, particularly in mental healthcare contexts.

Objectives: To develop and describe a structured, user-centered framework for improving EHR usability based on a Brazilian outpatient mental health case study.

Methods: This qualitative design research study was guided by the Double Diamond design methodology, followed four iterative phases (Discover, Define, Develop, Deliver), and included interviews with 21 healthcare professionals. Data were organized using the Certainties, Suppositions, and Doubts (CSD) matrix and triangulated through heuristic evaluation and prototype testing.

Results: Key barriers included non-standardized navigation flows, limited integration with external systems, and inflexible documentation structures. Based on these findings, the study proposes design-driven improvements such as customizable templates, real-time validation features, and workflow-specific interface adjustments.

Conclusions: By integrating service design logic with usability-driven interface adaptations and addressing both systemic usability gaps and contextual demands, this research contributes actionable insights for advancing human-centered EHR innovation, with particular relevance to complex mental healthcare workflows.

Citations: 0
Evaluation of multiple generative large language models on neurology board-style questions.
IF 3.2 Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-01-05 eCollection Date: 2025-01-01 DOI: 10.3389/fdgth.2025.1737882
Mohammad Almomani, Vijaya Valaparla, James Weatherhead, Xiang Fang, Alok Dabi, Chih-Ying Li, Peter McCaffrey, Dan Hier, Jorge Mario Rodríguez-Fernández

Objective: To compare the performance of eight large language models (LLMs) with neurology residents on board-style multiple-choice questions across seven subspecialties and two cognitive levels.

Methods: In a cross-sectional benchmarking study, we evaluated Bard, Claude, Gemini v1, Gemini 2.5, ChatGPT-3.5, ChatGPT-4, ChatGPT-4o, and ChatGPT-5 using 107 text-only items spanning movement disorders, vascular neurology, neuroanatomy, neuroimmunology, epilepsy, neuromuscular disease, and neuro-infectious disease. Two neurologists labeled each item as lower- or higher-order according to Bloom's taxonomy. Models answered each item in a fresh session and reported their confidence and a Bloom classification for it. Residents completed the same set under exam-like conditions. Outcomes included overall and domain accuracies, guessing-adjusted accuracy, confidence-accuracy calibration (Spearman ρ), agreement with expert Bloom labels (Cohen κ), and inter-generation scaling (linear regression of topic-level accuracies). Group differences were tested with Fisher's exact or χ² tests with Bonferroni correction.
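Agreement between each model's self-reported Bloom level and the neurologists' labels was summarized with Cohen's κ. This statistic corrects raw agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). Below is a minimal sketch for two label sequences. The "lower"/"higher" labels and the six example items are illustrative, not study data.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement p_o: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement p_e: both raters independently pick the same category.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n**2
    if expected == 1:
        return 1.0  # both raters used one identical category throughout (kappa undefined)
    return (observed - expected) / (1 - expected)

expert = ["lower", "higher", "higher", "lower", "higher", "lower"]
model = ["lower", "higher", "lower", "lower", "higher", "higher"]
print(round(cohens_kappa(expert, model), 3))  # 0.333
```

Here the raters agree on 4 of 6 items (p_o ≈ 0.667), but half that agreement is expected by chance (p_e = 0.5), so κ is only about 0.33.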

Results: Residents scored 64.9%. ChatGPT-5 achieved 84.1% and ChatGPT-4o 81.3%, followed by Gemini 2.5 at 77.6% and ChatGPT-4 at 68.2%; Claude (56.1%), Bard (54.2%), ChatGPT-3.5 (53.3%), and Gemini v1 (39.3%) underperformed residents. On higher-order items, ChatGPT-5 (86%) and ChatGPT-4o (82.5%) remained ahead, and Gemini 2.5 matched ChatGPT-4o at 82.5%. Guessing-adjusted accuracy preserved the rank order (ChatGPT-5 78.8%, ChatGPT-4o 75.1%, Gemini 2.5 70.1%). Confidence-accuracy calibration was weak across models. Inter-generation scaling was strong within the ChatGPT lineage (ChatGPT-4 to 4o: R² = 0.765, p = 0.010; 4o to 5: R² = 0.908, p < 0.001) but absent from Gemini v1 to 2.5 (R² = 0.002, p = 0.918), suggesting discontinuous improvement.
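The abstract does not state its guessing-adjustment formula. However, the reported adjusted values are reproduced by the classical correction for guessing if each item is assumed to have four answer options: adjusted = (p − 1/4) / (1 − 1/4). The four-option count is an inference from the numbers, not a confirmed detail of the study. The sketch below applies that assumed correction to the raw accuracies reported above.

```python
def guessing_adjusted(accuracy, n_options=4):
    """Classical correction for guessing: rescale accuracy so that chance
    performance (1 / n_options) maps to 0 and a perfect score maps to 1.
    n_options=4 is an assumption; the abstract does not give it."""
    chance = 1.0 / n_options
    return (accuracy - chance) / (1.0 - chance)

# Raw accuracies as reported in the abstract.
for model, raw in [("ChatGPT-5", 0.841), ("ChatGPT-4o", 0.813), ("Gemini 2.5", 0.776)]:
    print(f"{model}: {guessing_adjusted(raw):.1%}")  # 78.8%, 75.1%, 70.1%
```

Because this correction is a monotone rescaling of raw accuracy, it cannot change the models' order. That is consistent with the abstract's statement that guessing-adjusted accuracy preserved the rank order.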

Conclusions: LLMs, particularly ChatGPT-5 and ChatGPT-4o, exceeded resident performance on text-based neurology board-style questions across subspecialties and cognitive levels. Gemini 2.5 showed substantial gains over v1, but its improvement was uneven across domains. Given weak confidence calibration, LLMs should be integrated as supervised educational adjuncts, with ongoing validation, version governance, and transparent metadata to support safe use in neurology education.

Citations: 0
Transforming clinical reasoning-the role of AI in supporting human cognitive limitations.
IF 3.2 Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-01-05 eCollection Date: 2025-01-01 DOI: 10.3389/fdgth.2025.1715440
Colin John Greengrass

Clinical reasoning is foundational to medical practice, requiring clinicians to synthesise complex information, recognise patterns, and apply causal reasoning to reach accurate diagnoses and guide patient management. However, human cognition is inherently constrained by limited working memory capacity, cognitive load, and a general reliance on heuristics. These constraints leave it vulnerable to biases such as anchoring, availability bias, and premature closure. Cognitive fatigue and cognitive overload, particularly apparent in high-pressure environments, further compromise diagnostic accuracy and efficiency. Artificial intelligence (AI) presents a transformative opportunity to overcome these limitations by supplementing and supporting clinical decision-making. With their advanced computational capabilities, AI systems can analyse large datasets, detect subtle or atypical patterns, and provide accurate evidence-based diagnoses. Furthermore, by leveraging machine learning and probabilistic modelling, AI reduces dependence on incomplete heuristics and may mitigate cognitive biases. Its performance is also consistent, unaffected by fatigue or information overload. These attributes are likely to make AI an invaluable tool for enhancing the accuracy and efficiency of diagnostic reasoning. Through a narrative review, this article examines the cognitive limitations inherent in diagnostic reasoning and considers how AI can be positioned as a collaborative partner in addressing them. Drawing on the concept of Mutual Theory of Mind, the author identifies a set of indicators that should inform the design of future frameworks for human-AI interaction in clinical decision-making. These indicators highlight how AI could dynamically adapt to human reasoning states, reduce bias, and promote more transparent and adaptive diagnostic support in high-stakes clinical environments.

Citations: 0
Evaluating AI-driven precision oncology for breast cancer in low- and middle-income countries: a review of machine learning performance, genomic data use, and clinical feasibility.
IF 3.2 Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2026-01-02 eCollection Date: 2025-01-01 DOI: 10.3389/fdgth.2025.1702339
Luis Fabián Salazar-Garcés, Elizabeth Morales-Urrutia, Franklin Cashabamba, Ricardo Xavier Proaño Alulema, Lizette Elena Leiva Suero

Background: Artificial intelligence (AI) systems are increasingly used to support treatment decision-making in breast cancer, yet their performance and feasibility in low- and middle-income countries (LMICs) remain incompletely defined. Many high-performing models, particularly genomic and multimodal systems trained on The Cancer Genome Atlas (TCGA), raise questions about cross-domain generalizability and equity.

Methods: We conducted an AI-assisted scoping review combining Boolean database searches with semantic retrieval tools (Elicit, Semantic Scholar, Connected Papers). Of 497 unique records, 43 studies met the inclusion criteria, and 34 of these reported quantitative metrics. Extracted data covered study design, AI model type (treatment-recommendation, prognostic, or diagnostic/subtyping), input modalities, and validation strategies. Risk of bias was assessed using a hybrid PROBAST-AI/QUADAS-AI framework.

Results: Treatment-recommendation systems (e.g., Watson for Oncology [WFO], Navya) showed concordance of 67%-97% in early-stage settings but markedly lower performance in metastatic disease. Prognostic and multimodal models frequently achieved AUCs of 0.90-0.99. Genomic models trained on high-income-country (HIC) data showed consistent declines during external LMIC validation (e.g., CDK4/6 response model: AUC 0.9956 → 0.9795). LMIC implementations reported reduced time-to-treatment and improved guideline adherence, but these gains were constrained by gaps in electronic health records, limited digital pathology, and insufficient local genomic testing capacity.
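Several of the figures above are AUCs, such as 0.90-0.99 for prognostic models and the drop from 0.9956 to 0.9795 on external LMIC validation. The sketch below computes AUC through its rank (Mann-Whitney) interpretation: the probability that a randomly chosen positive case is scored above a randomly chosen negative one. It is a minimal illustration; the scores and labels are made up and not drawn from any reviewed study.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a randomly chosen positive case
    (label 1) receives a higher score than a randomly chosen negative case
    (label 0), counting ties as half. O(n_pos * n_neg): for illustration only."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(roc_auc(scores, labels))  # 8 of 9 positive-negative pairs correctly ordered
```

Read this way, the CDK4/6 example means the share of mis-ordered positive-negative pairs rose from roughly 0.4% (1 − 0.9956) to roughly 2.1% (1 − 0.9795) on the external population.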

Conclusions: AI-enabled systems show promise for improving breast cancer treatment planning, especially in early-stage disease and resource-limited settings. However, the evidence base remains dominated by HIC-derived datasets and retrospective analyses, with persistent challenges related to domain shift, data representativeness, and genomic governance. Advancing equitable AI-driven oncology will require prospective multicenter validation, expanded LMIC-based data generation, and context-specific implementation strategies.

Citations: 0