Pub Date: 2026-01-06 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdgth.2025.1693060
Alina Petrica, Adina Maria Marza, Claudiu Barsac, Andreea Cebzan, Ioan Dragan, Daniela Zaharie, Raluca Horhat, Diana Lungeanu
Background: The triage process in emergency departments (EDs) is complex, and AI-based solutions have begun to target it. At this pivotal stage, the challenge lies less in designing smarter algorithms than in fostering trust and alignment among medical and technical stakeholders. We explored professional attitudes towards AI in ED triage, focusing on alignments and misalignments across backgrounds.
Methods: An anonymous online cross-sectional survey was distributed through professional networks of healthcare providers and IT professionals between May 2024 and February 2025. The questionnaire covered four areas: (a) the General Attitudes towards Artificial Intelligence Scale (GAAIS); (b) professional background and career level; (c) challenges and priorities for AI applications in triage; and (d) the AI Attitude Scale (AIAS-4). Constructs from the extended Unified Theory of Acceptance and Use of Technology (UTAUT2) were also applied. Cluster analysis (KMeans) was conducted based on GAAIS-positive, GAAIS-negative, and AIAS-4 scores.
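A minimal sketch of this clustering step follows; the toy data, column names, and number of clusters are assumptions drawn from the abstract, not the authors' published code.

```python
# Illustrative sketch of the Methods' clustering step: KMeans on
# GAAIS-positive, GAAIS-negative, and AIAS-4 scores. Toy data, column
# names, and k = 3 are assumptions based on the abstract.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "gaais_positive": rng.uniform(1, 5, 151),  # 151 respondents, as reported
    "gaais_negative": rng.uniform(1, 5, 151),
    "aias4": rng.uniform(1, 10, 151),
})

# Standardize so no single scale dominates the Euclidean distance.
X = StandardScaler().fit_transform(df)

# Three clusters, matching K0/K1/K2 in the Results.
km = KMeans(n_clusters=3, n_init=10, random_state=42)
df["cluster"] = km.fit_predict(X)
print(df["cluster"].value_counts())
```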
Results: From a total of 151 professionals, KMeans identified three clusters: K0 (cautious/critical, n = 39), K1 (enthusiastic/optimistic, n = 35), and K2 (balanced/pragmatic, n = 77). A majority of K2 (47/77; 61%) were healthcare providers. Six out of 20 (30%) medical professionals in K0 reported that AI could play no role in ED triage, whereas only 1/15 (7%) and 1/47 (2%) of healthcare providers gave this response in K1 and K2, respectively. Lack of knowledge of AI tools was also most frequent in K0 (14/39; 36%). Recognition of the necessity of constraints showed marked contrasts in mean ± SD scores: (a) for data availability/quality, 2.95 ± 1.98 (K0), 4.27 ± 1.1 (K1), and 4.20 ± 0.94 (K2); (b) for the integration of AI-based applications into existing workflows, 2.95 ± 1.05, 4.20 ± 0.94, and 3.47 ± 1.02 in K0, K1, and K2, respectively. Among the UTAUT2 constructs, hedonic motivation differed most significantly, with mean ± SD values of 3.41 ± 1.0 (K0), 6.86 ± 0.97 (K1), and 5.07 ± 1.08 (K2).
Conclusions: Stakeholders' perspectives on AI in ED triage are heterogeneous and not solely determined by professional background or role. Hedonic motivation emerged as a key driver of enthusiasm. Educational strategies should follow two directions: (a) structured AI programs for enthusiastic developers from diverse fields, and (b) AI literacy for all healthcare professionals to support competent use as consumers.
{"title":"Artificial intelligence in emergency department triage: perspective of human professionals.","authors":"Alina Petrica, Adina Maria Marza, Claudiu Barsac, Andreea Cebzan, Ioan Dragan, Daniela Zaharie, Raluca Horhat, Diana Lungeanu","doi":"10.3389/fdgth.2025.1693060","DOIUrl":"10.3389/fdgth.2025.1693060","url":null,"abstract":"<p><strong>Background: </strong>The triage process in emergency departments (EDs) is complex, and AI-based solutions have begun to target it. At this pivotal stage, the challenge lies less in designing smarter algorithms than in fostering trust and alignment among medical and technical stakeholders. We explored professional attitudes towards AI in ED triage, focusing on alignments and misalignments across backgrounds.</p><p><strong>Methods: </strong>An anonymous online cross-sectional survey was distributed through professional networks of healthcare providers and IT professionals, between May 2024 and February 2025. The questionnaire covered four areas: (a) the General Attitudes towards Artificial Intelligence Scale (GAAIS); (b) professional background and career level; (c) challenges and priorities for AI applications in triage; and (d) the AI Attitude Scale (AIAS-4). Constructs from the extended Unified Theory of Acceptance and Use of Technology (UTAUT2) were also applied. Cluster analysis (<i>KMeans</i>) was conducted based on GAAIS-positive, GAAIS-negative, and AIAS-4 scores.</p><p><strong>Results: </strong>From a total of 151 professionals, <i>Kmeans</i> identified three clusters: K0 (cautious/critical, <i>n</i> = 39), K1 (enthusiastic/optimistic, <i>n</i> = 35), and K2 (balanced/pragmatic, <i>n</i> = 77). Approximately two-thirds of K2 (47/77; 61%) were healthcare providers. Six out of 20 (30%) medical professionals in K0 reported that AI could play no role in ED triage, but only 1/15 (7%) and 1/47 (2%) of healthcare providers gave this response in K1 and K2, respectively. Lack of knowledge of AI tools was also most frequent in K0 (14/39; 36%). Recognition of necessity of constraints showed marked contrasts in their mea<i>n</i> ± SD scores: (a) for data availability/quality, 2.95 ± 1.98 (K0), 4.27 ± 1.1 (K1), and 4.20 ± 0.94 (K2); (b) for the integration of AI-based applications into existing workflows, 2.95 ± 1.05, 4.20 ± 0.94, and 3.47 ± 1.02 in K0, K1, and K2, respectively. Among the UTAUT2 constructs, hedonic motivation differed most significantly, with mean ± SD values of 3.41 ± 1.0 (K0), 6.86 ± 0.97 (K1), and 5.07 ± 1.08 (K2).</p><p><strong>Conclusions: </strong>Stakeholders' perspectives on AI in ED triage are heterogeneous and not solely determined by professional background or role. Hedonic motivation emerged as a key driver of enthusiasm. 
Educational strategies should follow two directions: (a) structured AI programs for enthusiastic developers from diverse fields, and (b) AI literacy for all healthcare professionals to support competent use as consumers.</p>","PeriodicalId":73078,"journal":{"name":"Frontiers in digital health","volume":"7 ","pages":"1693060"},"PeriodicalIF":3.2,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12816261/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146020858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-06 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdgth.2025.1732759
Donald C Wunsch III, Daniel B Hier
Large language models have the potential to transform neurology by augmenting diagnostic reasoning, streamlining documentation, and improving workflow efficiency. This Mini Review surveys emerging applications of large language models in Alzheimer's disease, Parkinson's disease, multiple sclerosis, and epilepsy, with emphasis on ambient documentation, multimodal data integration, and clinical decision support. Key barriers to adoption include bias, privacy, reliability, and regulatory alignment. Looking ahead, neurology-focused language models may develop greater fluency in biomedical ontologies and FHIR standards, improving data interoperability and supporting more seamless collaboration between clinicians and AI systems. Two future developments have the potential to be particularly impactful: (1) the integration of multi-omic and neuroimaging data with digital-twin simulations to advance precision neurology, and (2) broader adoption of ambient documentation and other language-model-based efficiencies that could reduce administrative and cognitive burden. Ultimately, the clinical success of large language models will depend on continued progress in model robustness, ethical governance, and careful implementation.
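To make the FHIR interoperability point concrete, the sketch below builds a minimal FHIR R4 Observation as a plain Python dictionary; the resource content and LOINC code are illustrative examples, not drawn from the reviewed studies.

```python
# Illustrative sketch: a minimal FHIR R4 Observation that an LLM-assisted
# documentation pipeline might emit for a neurology encounter. The code,
# value, and references are examples only, not from the reviewed studies.
import json

observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {
        "coding": [{
            "system": "http://loinc.org",
            "code": "72133-2",  # example LOINC code (MoCA total score)
            "display": "Montreal Cognitive Assessment [MoCA]",
        }]
    },
    "subject": {"reference": "Patient/example"},  # hypothetical patient reference
    "valueQuantity": {"value": 24, "unit": "score"},
}

print(json.dumps(observation, indent=2))
```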
{"title":"Large language models for neurology: a mini review.","authors":"Donald C Wunsch Iii, Daniel B Hier","doi":"10.3389/fdgth.2025.1732759","DOIUrl":"10.3389/fdgth.2025.1732759","url":null,"abstract":"<p><p>Large language models have the potential to transform neurology by augmenting diagnostic reasoning, streamlining documentation, and improving workflow efficiency. This Mini Review surveys emerging applications of large language models in Alzheimer's disease, Parkinson's disease, multiple sclerosis, and epilepsy, with emphasis on ambient documentation, multimodal data integration, and clinical decision support. Key barriers to adoption include bias, privacy, reliability, and regulatory alignment. Looking ahead, neurology-focused language models may develop greater fluency in biomedical ontologies and FHIR standards, improving data interoperability and supporting more seamless collaboration between clinicians and AI systems. Two future developments have the potential to be particularly impactful: (1) the integration of multi-omic and neuroimaging data with digital-twin simulations to advance precision neurology, and (2) broader adoption of ambient documentation and other language-model-based efficiencies that could reduce administrative and cognitive burden. Ultimately, the clinical success of large language models will depend on continued progress in model robustness, ethical governance, and careful implementation.</p>","PeriodicalId":73078,"journal":{"name":"Frontiers in digital health","volume":"7 ","pages":"1732759"},"PeriodicalIF":3.2,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12816337/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146020860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: The widespread use of the internet and digital devices has been accompanied by growing concern regarding harms associated with their excessive or problematic use. The World Health Organization has formally included some of these conditions in its latest classification system (ICD-11) under the category of "disorders due to addictive behaviours". However, a validated, comprehensive screening tool aligned with ICD-11 that screens for these potentially addictive behaviours is lacking. This study aimed to develop and validate the Screening Tool for Excessive and Problematic use of Internet and Digital Devices (STEPS-IDD), designed to assess multiple addictive behaviours based on ICD-11 criteria.
Methods: STEPS-IDD was developed based on the ICD-11 framework for disorders due to addictive behaviours. It was applied to assess well-established behavioural addictions such as gaming disorder and gambling disorder, as well as less-established but widely researched ones such as problematic use of social media, online shopping/buying, OTT content watching, and pornography watching. Face validity was established through expert review and feedback. Construct validity was evaluated through exploratory factor analysis (EFA), and Cronbach's alpha coefficients were estimated to assess internal consistency. To examine concurrent validity, correlations were assessed between scores on the newly developed STEPS-IDD sub-sections and the previously validated Gaming Disorder and Hazardous Gaming Scale (GDHGS), or the modified GDHGS for other behaviours. Receiver Operating Characteristic (ROC) analyses were conducted to determine optimal STEPS-IDD cut-off scores for the different behaviours.
Results: Data from 112 college students (64.3% female) with a mean age of 20.5 years were analyzed. STEPS-IDD demonstrated good construct validity, with EFA revealing a predominantly unidimensional factor structure for most behavioural domains. Internal consistency was excellent (Cronbach's α = 0.86-0.91 across sub-sections). Concurrent validity was supported by moderate to strong positive correlations (r = 0.44-0.76) between STEPS-IDD sub-sections and the corresponding GDHGS and modified GDHGS scores. ROC analyses yielded optimal cut-off scores with high sensitivity and acceptable specificity for the different behaviours, and fair to excellent overall diagnostic accuracy.
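A minimal sketch of two computations reported here, Cronbach's alpha and a Youden-based ROC cut-off, is shown below; the toy data and the binary reference variable are assumptions standing in for the real item responses and the GDHGS-based classification.

```python
# Sketch of two psychometric computations from the Results: Cronbach's
# alpha for one sub-section and an ROC-derived cut-off via Youden's J.
# The toy data below are stand-ins for the real item responses.
import numpy as np
from sklearn.metrics import roc_curve

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix of scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(0)
latent = rng.normal(size=(112, 1))                     # latent trait, 112 students
items = latent + rng.normal(scale=0.7, size=(112, 6))  # six correlated toy items
total = items.sum(axis=1)
reference = (latent.ravel() > 0.5).astype(int)         # stand-in for caseness

print(f"Cronbach's alpha = {cronbach_alpha(items):.2f}")

fpr, tpr, thresholds = roc_curve(reference, total)
best = np.argmax(tpr - fpr)                            # Youden's J = sens + spec - 1
print(f"optimal cut-off = {thresholds[best]:.2f}")
```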
Conclusion: STEPS-IDD is a psychometrically robust, brief yet comprehensive screening tool, grounded in the ICD-11 framework, for risk stratification of addictive behaviours related to the use of the internet and digital devices.
{"title":"Development and validation of screening tool for excessive and problematic use of internet and digital devices (STEPS-IDD) based on the WHO framework (ICD-11) for addictive behaviours.","authors":"Yatan Pal Singh Balhara, Swarndeep Singh, Ilika Guha Majumdar, Ayesha Ayoob, Aastha Singh","doi":"10.3389/fdgth.2025.1671623","DOIUrl":"10.3389/fdgth.2025.1671623","url":null,"abstract":"<p><strong>Background: </strong>The widespread use of internet and digital devices has been accompanied by growing concern regarding harms associated with their excessive or problematic use. The World Health Organization has also formally included some of these in its latest classificatory system (ICD-11) under the category of \"disorders due to addictive behaviours\". However, a validated, comprehensive screening tool aligned with ICD-11 that screens for these potentially addictive behaviours is lacking. This study aimed to develop and validate the Screening Tool for Excessive and Problematic use of Internet and Digital Devices (STEPS-IDD), designed to assess multiple addictive behaviours based on ICD-11 criteria.</p><p><strong>Methods: </strong>STEPS-IDD was developed based on the ICD-11 framework for disorders due to addictive behaviours It was applied to assess well-established behavioural addictions like gaming and gambling disorder, as well as less-established but widely researched ones such as problematic use of social media, online shopping/buying, OTT content watching, and pornography watching. Face validity was established through expert review and feedback. Construct validity was evaluated through exploratory factor analysis (EFA), and Cronbach's alpha coefficients were estimated to assess internal consistency. To examine concurrent validity, correlations between scores obtained on the newly developed STEPS-IDD sub-sections and the previously validated Gaming Disorder and Hazardous Gaming Scale (GDHGS) and modified GDHGS for other behaviours were assessed. Receiver Operating Characteristic (ROC) analyses were conducted to determine optimal STEPS-IDD cut-off scores for different behaviours.</p><p><strong>Results: </strong>Data from a total of 112 college students (64.3% female) with a mean age of 20.5 years were analyzed. STEPS-IDD demonstrated good construct validity, with EFA revealing predominantly unidimensional factor structure for most behavioural domains. Internal consistency was excellent (Cronbach's α = 0.86-0.91 across sub-sections). Concurrent validity was supported by moderate to strong positive correlations (r = 0.44-0.76) of STEPS-IDD sub-sections with corresponding GDHGS and modified GDHGS scores. 
ROC analyses yielded optimal cut-off scores with high sensitivity and acceptable specificity for different behaviours, and fair to excellent overall diagnostic accuracy.</p><p><strong>Conclusion: </strong>STEPS-IDD is a psychometrically robust, brief yet comprehensive screening tool grounded in the ICD-11 framework, for the risk stratification in the context of addictive behaviours related to the use of the internet and digital devices.</p>","PeriodicalId":73078,"journal":{"name":"Frontiers in digital health","volume":"7 ","pages":"1671623"},"PeriodicalIF":3.2,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12816239/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146020832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Introduction: For neural prosthetic devices, accurate classification of high-dimensional electroencephalography (EEG) signals is significantly impaired by redundant and irrelevant features that degrade classifier generalization and computational efficiency. This work presents a new, unified optimization-driven framework to address these issues and improve EEG-based motor imagery (MI) signal decoding.
Methods: The proposed method combines a modified coati optimization algorithm (COA) for feature selection with several machine learning and deep learning classifiers. The novelty of the COA lies in its dynamic, parameter-free adaptation mechanism, which, combined with opposition-based learning, maintains a better exploration-exploitation balance in the high-dimensional feature space. The optimized feature subsets are then used to train a battery of classifiers, including support vector machines (SVM), random forests (RF), convolutional neural networks (CNN), and recurrent neural networks (RNN), for motor imagery task recognition. In experiments, the proposed framework is verified on commonly used benchmark EEG datasets, including the PhysioNet Motor Movement/Imagery dataset.
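The opposition-based learning idea can be sketched as follows; this is a simplified illustration of opposition-based initialization for wrapper-style binary feature selection, not the authors' modified COA, whose update equations are omitted here.

```python
# Sketch of opposition-based initialization for a binary feature-selection
# metaheuristic. Only the opposition idea and a wrapper-style fitness
# (classifier accuracy) are shown; the COA update rules are not reproduced.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y):
    """Wrapper fitness: CV accuracy of a classifier on the selected features."""
    if mask.sum() == 0:
        return 0.0
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 30))                   # toy EEG feature matrix
y = (X[:, 0] + X[:, 3] > 0).astype(int)          # toy labels driven by 2 features

pop = rng.integers(0, 2, size=(10, X.shape[1]))  # random binary population
opp = 1 - pop                                    # opposition-based counterparts

# Keep whichever of each solution/opposite pair scores higher; the result
# would seed the metaheuristic's iterative search.
scores_pop = np.array([fitness(m, X, y) for m in pop])
scores_opp = np.array([fitness(m, X, y) for m in opp])
init = np.where((scores_opp > scores_pop)[:, None], opp, pop)
print("best initial fitness:", max(scores_pop.max(), scores_opp.max()))
```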
Results: The experimental results showed that the COA + CNN model achieved the best classification performance, with an accuracy of 96.8%, precision of 96.4%, recall of 96.9%, and F1-score of 96.6%. This represents a 6.5% gain in classification accuracy over the best rival feature selection technique, and significantly outperforms conventional metaheuristic algorithms such as PSO (90.3% accuracy) and GA (89.7% accuracy), as well as filter-type techniques such as mRMR (86.8%) and ReliefF (84.3%).
Discussion/conclusion: Combining metaheuristic feature subset selection with deep learning architectures is a powerful approach for accurate EEG signal classification. The findings confirm that the COA-based approach provides a robust, computationally efficient, and scalable method for achieving the high-accuracy classification essential to the reliability and real-time operation of future neural prosthetic control systems.
{"title":"Advanced EEG signal classification for neural prosthetic devices using metaheuristic and deep learning techniques.","authors":"Thippagudisa Kishore Babu, Damodar Reddy Edla, Suresh Dara, Mohan Allam","doi":"10.3389/fdgth.2025.1706660","DOIUrl":"10.3389/fdgth.2025.1706660","url":null,"abstract":"<p><strong>Introduction: </strong>For neural prosthetic devices, accurate classification of high dimensional electroencephalography (EEG) signals is significantly impaired by the existence of redundant and irrelevant features that deteriorate the classifier generalization and computation efficiency. This work presents a new and unified optimal-driven framework to challenge these issues and improve EEG-based MI signal decoding.</p><p><strong>Methods: </strong>The proposed method combines a modified feature selection model of coati optimization algorithm (COA) and different machine/deep learning classifiers. The novelty of the COA is its dynamic and parameter-free adaptation mechanism, in association with opposition-based learning a better exploration exploitation balance can be maintained in high-dimensional feature space. The generated optimized feature subsets are then employed to train a battery of classifiers such as support vector machines (SVM), random forests (RF), convolutional neural networks (CNN) and recurrent neural networks (RNN) for motor imagery task recognition. In experiments, we verify SSRC on commonly used benchmark EEG datasets such as the PhysioNet Motor Movement/Imagery dataset.</p><p><strong>Results: </strong>The experimental results showed that the COA + CNN model had the best performance of classification. The model demonstrated a classification accuracy of 96.8% of prediction, with precision at moderate AH hour and predicted as either being more likely to discharge or remain in care = 96.4%, recall = 96.9% and F1-score = 96.6%. This presents a remarkable 6.5% gain in classification accuracy over the best rival feature selection technique and significantly outperformed conventional metaheuristic algorithms such as PSO (90.3% accuracy) and GA (89.7% accuracy) as well as filter-type techniques such as mRMR (86.8%) and ReliefF (84.3%).</p><p><strong>Discussion/conclusion: </strong>The combined evolved metaheatistic for feature subset selection with deep learning architectures is a powerful approach for an accurate classification EEG signals. The findings confirm that the COA-based approach provides a robust, computationally-efficient, and scalable method for achieving high-accuracy classification-essential for promoting the reliability and real-time operation of future neural prosthetic control systems.</p>","PeriodicalId":73078,"journal":{"name":"Frontiers in digital health","volume":"7 ","pages":"1706660"},"PeriodicalIF":3.2,"publicationDate":"2026-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12815840/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146020814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-05 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdgth.2025.1667423
Alora Brown, Joeri Tulkens, Maxime Mattelin, Tanguy Sanglet, Brecht Dhuyvetters
Background: Remote photoplethysmography (rPPG) is a non-invasive method that accurately measures clinical biomarkers, including heart rate, respiration rate, heart rate variability, blood pressure, and oxygen saturation. The contactless technique relies on standard cameras and ambient light, making it highly accessible and valuable for general health assessment. Despite its potential, comprehensive research on rPPG applications for health assessment is scarce.
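To make the underlying computation concrete, the sketch below estimates heart rate as the dominant spectral peak of a synthetic camera-derived intensity trace; real rPPG pipelines add face detection, detrending, and motion/illumination artifact handling, so this is illustrative only.

```python
# Sketch of the core rPPG computation: estimate heart rate as the dominant
# frequency of a (synthetic) green-channel intensity trace within the
# plausible cardiac band. Signal and parameters are illustrative.
import numpy as np
from scipy.signal import welch

fs = 30.0                               # camera frame rate (Hz)
t = np.arange(0, 30, 1 / fs)            # 30 s of frames
hr_hz = 72 / 60                         # simulated 72 bpm pulse
rng = np.random.default_rng(7)
trace = 0.02 * np.sin(2 * np.pi * hr_hz * t) + rng.normal(0, 0.01, t.size)

freqs, psd = welch(trace, fs=fs, nperseg=512)
band = (freqs >= 0.7) & (freqs <= 4.0)  # 42-240 bpm cardiac band
hr_bpm = freqs[band][np.argmax(psd[band])] * 60
print(f"estimated heart rate: {hr_bpm:.0f} bpm")
```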
Objective: This review summarizes the current state of knowledge on rPPG health assessments, covering both fundamental physiological monitoring and higher-level health insights. The paper draws on the rPPG-based HealthTech company IntelliProve as a real-world example to identify relevant outputs currently applied in everyday settings.
Methods: A literature review was performed to identify validated physiological biomarkers and emerging health metrics in rPPG research, using Google Scholar, PubMed and Scopus.
Results: The search identified 96 relevant studies, of which 54 directly investigated rPPG-related technologies. The remaining papers provided theoretical context and complementary support relevant to rPPG-based health metrics. Similarly to IntelliProve's approach, several studies combined rPPG with additional inputs to enhance the accuracy of complex health assessments, such as sleep quality evaluation. The review identified well-established health outputs, including heart rate, respiratory rate, heart rate variability, hypertension risk, and mental stress detection, as well as exploratory health metrics, including the assessment of mental health risk, energy levels, sleep quality, and resonant breathing state. To the authors' knowledge, the existing literature focuses heavily on basic vitals derivation, with limited research into rPPG's broader health applications.
Conclusions: This review synthesizes rPPG-based health applications, demonstrating strong evidence for fundamental physiological monitoring and an increasing interest in higher-level health metrics. Overall, this paper establishes the groundwork for continued research into the growing application of rPPG for health assessments.
{"title":"Remote photoplethysmography for health assessment: a review informed by IntelliProve technology.","authors":"Alora Brown, Joeri Tulkens, Maxime Mattelin, Tanguy Sanglet, Brecht Dhuyvetters","doi":"10.3389/fdgth.2025.1667423","DOIUrl":"10.3389/fdgth.2025.1667423","url":null,"abstract":"<p><strong>Background: </strong>Remote photoplethysmography (rPPG) is a non-invasive method that accurately measures clinical biomarkers, including heart rate, respiration rate, heart rate variability, blood pressure and oxygen saturation. The contactless technique relies on standard cameras and ambient light, proving highly accessible and significant for the assessment of general health. Despite its potential, comprehensive research on rPPG applications for health assessment is scarce.</p><p><strong>Objective: </strong>This review summarizes the current state of knowledge on rPPG health assessments, covering both fundamental physiological monitoring and higher-level health insights. The paper consults the rPPG-based HealthTech company, IntelliProve, as a real-world example to identify relevant outputs that are currently applied in everyday settings.</p><p><strong>Methods: </strong>A literature review was performed to identify validated physiological biomarkers and emerging health metrics in rPPG research, using Google Scholar, PubMed and Scopus.</p><p><strong>Results: </strong>The search identified 96 relevant studies, of which 54 directly investigated rPPG-related technologies. The remaining papers provided theoretical context and complementary support relevant to rPPG-based health metrics. Similarly to IntelliProve's approach, several studies combined rPPG with additional inputs to enhance the accuracy of complex health assessments, such as sleep quality evaluation. The review identified well-established health outputs, including heart rate, respiratory rate, heart rate variability, hypertension risk and mental stress detection, as well as exploratory health metrics, including the assessment of mental health risk energy levels, sleep quality and resonant breathing state. To the author's knowledge, existing literature heavily focuses on basic vitals derivation, with limited research into rPPG's broader health applications.</p><p><strong>Conclusions: </strong>This review synthesizes rPPG-based health applications, demonstrating strong evidence for fundamental physiological monitoring and an increasing interest in higher-level health metrics. Overall, this paper establishes the groundwork for continued research into the growing application of rPPG for health assessments.</p>","PeriodicalId":73078,"journal":{"name":"Frontiers in digital health","volume":"7 ","pages":"1667423"},"PeriodicalIF":3.2,"publicationDate":"2026-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12812591/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146013251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-05 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdgth.2025.1687820
Teerachate Nantakeeratipat
Background: The use of Large Language Models (LLMs) in healthcare holds immense promise yet carries the risk of perpetuating social biases. While artificial intelligence (AI) fairness is a growing concern, a gap exists in understanding how these models perform under conditions of clinical ambiguity, a common feature of real-world practice.
Methods: We conducted a study using an ambiguity-probe methodology with a set of 42 sociodemographic personas and 15 clinical vignettes based on the 2018 classification of periodontal diseases. Ten were clear-cut scenarios with established ground truths, while five were intentionally ambiguous. OpenAI's GPT-4o and Google's Gemini 2.5 Pro were prompted to provide periodontal stage and grade assessments using 630 vignette-persona combinations per model.
Results: In clear-cut scenarios, GPT-4o demonstrated significantly higher combined (stage and grade) accuracy (70.5%) than Gemini Pro (33.3%). However, a robust fairness analysis using cumulative link models with false discovery rate correction revealed no statistically significant sociodemographic bias in either model. This finding held true across both clear-cut and ambiguous clinical scenarios.
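The correction step of such a fairness audit can be sketched as follows: one p-value per sociodemographic attribute (e.g., from cumulative link model coefficients) passed through Benjamini-Hochberg FDR adjustment; the attribute names and p-values below are placeholders, not study data.

```python
# Sketch of the fairness-audit correction step: given one p-value per
# sociodemographic attribute, apply Benjamini-Hochberg FDR correction
# before declaring bias. All values here are hypothetical placeholders.
from statsmodels.stats.multitest import multipletests

attributes = ["age", "sex", "ethnicity", "income", "insurance", "language"]
p_values = [0.04, 0.30, 0.12, 0.55, 0.008, 0.21]  # hypothetical raw p-values

reject, p_adj, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
for attr, p, padj, sig in zip(attributes, p_values, p_adj, reject):
    print(f"{attr:10s} raw p={p:.3f}  adj p={padj:.3f}  significant={sig}")
```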
Conclusion: To our knowledge, this is among the first studies to use simulated clinical ambiguity to reveal the distinct ethical fingerprints of LLMs in a dental context. While LLM performance gaps exist, our analysis decouples accuracy from fairness, demonstrating that both models maintain sociodemographic neutrality. We find that the observed errors reflect not bias but diagnostic boundary instability. This highlights a critical need for future research to differentiate between these two distinct types of model failure in order to build genuinely reliable AI.
{"title":"Large language model bias auditing for periodontal diagnosis using an ambiguity-probe methodology: a pilot study.","authors":"Teerachate Nantakeeratipat","doi":"10.3389/fdgth.2025.1687820","DOIUrl":"10.3389/fdgth.2025.1687820","url":null,"abstract":"<p><strong>Background: </strong>Large Language Models (LLMs) in healthcare holds immense promise yet carries the risk of perpetuating social biases. While artificial intelligence (AI) fairness is a growing concern, a gap exists in understanding how these models perform under conditions of clinical ambiguity, a common feature in real-world practice.</p><p><strong>Methods: </strong>We conducted a study using an ambiguity-probe methodology with a set of 42 sociodemographic personas and 15 clinical vignettes based on the 2018 classification of periodontal diseases. Ten were clear-cut scenarios with established ground truths, while five were intentionally ambiguous. OpenAI's GPT-4o and Google's Gemini 2.5 Pro were prompted to provide periodontal stage and grade assessments using 630 vignette-persona combinations per model.</p><p><strong>Results: </strong>In clear-cut scenarios, GPT-4o demonstrated significantly higher combined (stage and grade) accuracy (70.5%) than Gemini Pro (33.3%). However, a robust fairness analysis using cumulative link models with false discovery rate correction revealed no statistically significant sociodemographic bias in either model. This finding held true across both clear-cut and ambiguous clinical scenarios.</p><p><strong>Conclusion: </strong>To our knowledge, this is among the first study to use simulated clinical ambiguity to reveal the distinct ethical fingerprints of LLMs in a dental context. While LLM performance gaps exist, our analysis decouples accuracy from fairness, demonstrating that both models maintain sociodemographic neutrality. We identify that the observed errors are not bias, but rather diagnostic boundary instability. This highlights a critical need for future research to differentiate between these two distinct types of model failure to build genuinely reliable AI.</p>","PeriodicalId":73078,"journal":{"name":"Frontiers in digital health","volume":"7 ","pages":"1687820"},"PeriodicalIF":3.2,"publicationDate":"2026-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12812596/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146013141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-05 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdgth.2025.1676631
Fernanda Peron Gaspary, Daniel Baia Amaral, Cristian Vinicius Fagundes, Luis Felipe Dias Lopes, João Francisco Pollo Gaspary
Background: Electronic Health Record (EHR) systems are central to digital health transformation, yet usability challenges continue to constrain their effectiveness, particularly in mental healthcare contexts.
Objectives: To develop and describe a structured, user-centered framework for improving EHR usability based on a Brazilian outpatient mental health case study.
Methods: This qualitative design research study, guided by the Double Diamond design methodology, followed four iterative phases (Discover, Define, Develop, Deliver) and conducted qualitative interviews with 21 healthcare professionals. Data were organized using the Certainties, Suppositions, and Doubts (CSD) matrix and triangulated through heuristic evaluation and prototype testing.
Results: Key barriers included non-standardized navigation flows, limited integration with external systems, and inflexible documentation structures. Based on these findings, the study proposes design-driven improvements such as customizable templates, real-time validation features, and workflow-specific interface adjustments.
Conclusions: By integrating service design logic with usability-driven interface adaptations and addressing both systemic usability gaps and contextual demands, this research contributes actionable insights for advancing human-centered EHR innovation, with particular relevance to complex mental healthcare workflows.
{"title":"A structured framework to improve usability in EHR implementation: a user-centered case study in Brazilian mental healthcare.","authors":"Fernanda Peron Gaspary, Daniel Baia Amaral, Cristian Vinicius Fagundes, Luis Felipe Dias Lopes, João Francisco Pollo Gaspary","doi":"10.3389/fdgth.2025.1676631","DOIUrl":"10.3389/fdgth.2025.1676631","url":null,"abstract":"<p><strong>Background: </strong>Electronic Health Record (EHR) systems are central to digital health transformation, yet usability challenges continue to constrain their effectiveness, particularly in mental healthcare contexts.</p><p><strong>Objectives: </strong>To develop and describe a structured, user-centered framework for improving EHR usability based on a Brazilian outpatient mental health case study.</p><p><strong>Methods: </strong>This qualitative design research study, guided by the Double Diamond design methodology, followed four iterative phases (Discover, Define, Develop, Deliver) and conducted qualitative interviews with 21 healthcare professionals. Data were organized using the Certainties, Suppositions, and Doubts (CSD) matrix and triangulated through heuristic evaluation and prototype testing.</p><p><strong>Results: </strong>Key barriers included non-standardized navigation flows, limited integration with external systems, and inflexible documentation structures. Based on these findings, the study proposes design-driven improvements such as customizable templates, real-time validation features, and workflow-specific interface adjustments.</p><p><strong>Conclusions: </strong>By integrating service design logic with usability-driven interface adaptations and addressing both systemic usability gaps and contextual demands, this research contributes actionable insights for advancing human-centered EHR innovation, with particular relevance to complex mental healthcare workflows.</p>","PeriodicalId":73078,"journal":{"name":"Frontiers in digital health","volume":"7 ","pages":"1676631"},"PeriodicalIF":3.2,"publicationDate":"2026-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12812964/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146013115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-05 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdgth.2025.1737882
Mohammad Almomani, Vijaya Valaparla, James Weatherhead, Xiang Fang, Alok Dabi, Chih-Ying Li, Peter McCaffrey, Dan Hier, Jorge Mario Rodríguez-Fernández
Objective: To compare the performance of eight large language models (LLMs) with neurology residents on board-style multiple-choice questions across seven subspecialties and two cognitive levels.
Methods: In a cross-sectional benchmarking study, we evaluated Bard, Claude, Gemini v1, Gemini 2.5, ChatGPT-3.5, ChatGPT-4, ChatGPT-4o, and ChatGPT-5 using 107 text-only items spanning movement disorders, vascular neurology, neuroanatomy, neuroimmunology, epilepsy, neuromuscular disease, and neuro-infectious disease. Items were labeled as lower- or higher-order per Bloom's taxonomy by two neurologists. Models answered each item in a fresh session and reported confidence and a Bloom classification. Residents completed the same set under exam-like conditions. Outcomes included overall and domain accuracies, guessing-adjusted accuracy, confidence-accuracy calibration (Spearman ρ), agreement with expert Bloom labels (Cohen κ), and inter-generation scaling (linear regression of topic-level accuracies). Group differences were assessed using Fisher exact or χ² tests with Bonferroni correction.
Results: Residents scored 64.9%. ChatGPT-5 achieved 84.1% and ChatGPT-4o 81.3%, followed by Gemini 2.5 at 77.6% and ChatGPT-4 at 68.2%; Claude (56.1%), Bard (54.2%), ChatGPT-3.5 (53.3%), and Gemini v1 (39.3%) underperformed the residents. On higher-order items, ChatGPT-5 (86%) and ChatGPT-4o (82.5%) maintained their superiority, with Gemini 2.5 matching ChatGPT-4o at 82.5%. Guessing-adjusted accuracy preserved the rank order (ChatGPT-5 78.8%, ChatGPT-4o 75.1%, Gemini 2.5 70.1%). Confidence-accuracy calibration was weak across models. Inter-generation scaling was strong within the ChatGPT lineage (ChatGPT-4 to 4o: R² = 0.765, p = 0.010; 4o to 5: R² = 0.908, p < 0.001) but absent from Gemini v1 to 2.5 (R² = 0.002, p = 0.918), suggesting discontinuous improvements.
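Two of these metrics are easy to make precise. Assuming four-option items, the standard correction-for-guessing formula reproduces the reported adjusted accuracies, and calibration is the Spearman correlation between per-item confidence and correctness; the sketch below uses the published accuracies plus placeholder confidence data.

```python
# Sketch of two reported metrics, assuming four-option items. The guessing
# adjustment (p - 1/k) / (1 - 1/k) reproduces the reported values, e.g.
# 84.1% raw -> 78.8% adjusted for ChatGPT-5 with k = 4.
from scipy.stats import spearmanr

def guessing_adjusted(p: float, k: int = 4) -> float:
    """Rescale raw accuracy p so chance (1/k) maps to 0 and 1.0 maps to 1."""
    return (p - 1 / k) / (1 - 1 / k)

for model, acc in [("ChatGPT-5", 0.841), ("ChatGPT-4o", 0.813), ("Gemini 2.5", 0.776)]:
    print(f"{model:11s} adjusted = {guessing_adjusted(acc):.1%}")

# Confidence-accuracy calibration: Spearman rho between per-item confidence
# and correctness. The values below are placeholders, not study data.
confidence = [0.9, 0.8, 0.95, 0.6, 0.7, 0.85]
correct = [1, 1, 1, 0, 1, 0]
rho, p_value = spearmanr(confidence, correct)
print(f"calibration rho = {rho:.2f} (p = {p_value:.2f})")
```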
Conclusions: LLMs, particularly ChatGPT-5 and ChatGPT-4o, exceeded resident performance on text-based neurology board-style questions across subspecialties and cognitive levels. Gemini 2.5 showed substantial gains over v1, but with domain-uneven scaling. Given weak confidence calibration, LLMs should be integrated as supervised educational adjuncts with ongoing validation, version governance, and transparent metadata to support safe use in neurology education.
{"title":"Evaluation of multiple generative large language models on neurology board-style questions.","authors":"Mohammad Almomani, Vijaya Valaparla, James Weatherhead, Xiang Fang, Alok Dabi, Chih-Ying Li, Peter McCaffrey, Dan Hier, Jorge Mario Rodríguez-Fernández","doi":"10.3389/fdgth.2025.1737882","DOIUrl":"10.3389/fdgth.2025.1737882","url":null,"abstract":"<p><strong>Objective: </strong>To compare the performance of eight large language models (LLMs) with neurology residents on board-style multiple-choice questions across seven subspecialties and two cognitive levels.</p><p><strong>Methods: </strong>In a cross-sectional benchmarking study, we evaluated Bard, Claude, Gemini v1, Gemini 2.5, ChatGPT-3.5, ChatGPT-4, ChatGPT-4o, and ChatGPT-5 using 107 text-only items spanning movement disorders, vascular neurology, neuroanatomy, neuroimmunology, epilepsy, neuromuscular disease, and neuro-infectious disease. Items were labeled as lower- or higher-order per Bloom's taxonomy by two neurologists. Models answered each item in a fresh session and reported confidence and Bloom classification. Residents completed the same set under exam-like conditions. Outcomes included overall and domain accuracies, guessing-adjusted accuracy, confidence-accuracy calibration (Spearman <i>ρ</i>), agreement with expert Bloom labels (Cohen <i>κ</i>), and inter-generation scaling (linear regression of topic-level accuracies). Group differences used Fisher exact or <i>χ</i> <sup>2</sup> tests with Bonferroni correction.</p><p><strong>Results: </strong>Residents scored 64.9%. ChatGPT-5 achieved 84.1% and ChatGPT-4o 81.3%, followed by Gemini 2.5 at 77.6% and ChatGPT-4 at 68.2%; Claude (56.1%), Bard (54.2%), ChatGPT-3.5 (53.3%), and Gemini v1 (39.3%) underperformed residents. On higher-order items, ChatGPT-5 (86%) and ChatGPT-4o (82.5%) maintained superiority; Gemini 2.5 matched 82.5%. Guessing-adjusted accuracy preserved rank order (ChatGPT-5 78.8%, ChatGPT-4o 75.1%, Gemini 2.5 70.1%). Confidence-accuracy calibration was weak across models. Inter-generation scaling was strong within the ChatGPT lineage (ChatGPT-4 to 4o <i>R</i> <sup>2</sup> = 0.765, <i>p</i> = 0.010; 4o to 5 <i>R</i> <sup>2</sup> = 0.908, <i>p</i> < 0.001) but absent for Gemini v1 to 2.5 (R<sup>2</sup> = 0.002, <i>p</i> = 0.918), suggesting discontinuous improvements.</p><p><strong>Conclusions: </strong>LLMs-particularly ChatGPT-5 and ChatGPT-4o-exceeded resident performance on text-based neurology board-style questions across subspecialties and cognitive levels. Gemini 2.5 showed substantial gains over v1 but with domain-uneven scaling. Given weak confidence calibration, LLMs should be integrated as supervised educational adjuncts with ongoing validation, version governance, and transparent metadata to support safe use in neurology education.</p>","PeriodicalId":73078,"journal":{"name":"Frontiers in digital health","volume":"7 ","pages":"1737882"},"PeriodicalIF":3.2,"publicationDate":"2026-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12813092/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146013161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-05 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdgth.2025.1715440
Colin John Greengrass
Clinical reasoning is foundational to medical practice, requiring clinicians to synthesise complex information, recognise patterns, and apply causal reasoning to reach accurate diagnoses and guide patient management. However, human cognition is inherently limited by factors such as working memory capacity, cognitive load constraints, and a general reliance on heuristics, with an inherent vulnerability to biases including anchoring, availability bias, and premature closure. Cognitive fatigue and cognitive overload, particularly apparent in high-pressure environments, further compromise diagnostic accuracy and efficiency. Artificial intelligence (AI) presents a transformative opportunity to overcome these limitations by supplementing and supporting decision-making. With their advanced computational capabilities, AI systems can analyse large datasets, detect subtle or atypical patterns, and provide accurate evidence-based diagnoses. Furthermore, by leveraging machine learning and probabilistic modelling, AI reduces dependence on incomplete heuristics and potentially mitigates cognitive biases. It also ensures consistent performance, unaffected by fatigue or information overload. These attributes make AI a potentially invaluable tool for enhancing the accuracy and efficiency of diagnostic reasoning. Through a narrative review, this article examines the cognitive limitations inherent in diagnostic reasoning and considers how AI can be positioned as a collaborative partner in addressing them. Drawing on the concept of Mutual Theory of Mind, the author identifies a set of indicators that should inform the design of future frameworks for human-AI interaction in clinical decision-making. These highlight how AI could dynamically adapt to human reasoning states, reduce bias, and promote more transparent and adaptive diagnostic support in high-stakes clinical environments.
{"title":"Transforming clinical reasoning-the role of AI in supporting human cognitive limitations.","authors":"Colin John Greengrass","doi":"10.3389/fdgth.2025.1715440","DOIUrl":"10.3389/fdgth.2025.1715440","url":null,"abstract":"<p><p>Clinical reasoning is foundational to medical practice, requiring clinicians to synthesise complex information, recognise patterns, and apply causal reasoning to reach accurate diagnoses and guide patient management. However, human cognition is inherently limited by factors such as limitations in working memory capacity, constraints in cognitive load, a general reliance on heuristics; with an inherent vulnerability to biases including anchoring, availability bias, and premature closure. Cognitive fatigue and cognitive overload, particularly apparent in high-pressure environments, further compromise diagnostic accuracy and efficiency. Artificial intelligence (AI) presents a transformative opportunity to overcome these limitations by supplementing and supporting decision-making. With AI's advanced computational capabilities, these systems can analyse large datasets, detect subtle or atypical patterns, and provide accurate evidence-based diagnoses. Furthermore, by leveraging machine learning and probabilistic modelling, AI reduces dependence on incomplete heuristics and potentially mitigates cognitive biases. It also ensures consistent performance, unaffected by fatigue or information overload. These attributes likely make AI an invaluable tool for enhancing the accuracy and efficiency of diagnostic reasoning. Through a narrative review, this article examines the cognitive limitations inherent in diagnostic reasoning and considers how AI can be positioned as a collaborative partner in addressing them. Drawing on the concept of <i>Mutual Theory of Mind</i>, the author identifies a set of indicators that should inform the design of future frameworks for human-AI interaction in clinical decision-making. These highlight how AI could dynamically adapt to human reasoning states, reduce bias, and promote more transparent and adaptive diagnostic support in high-stakes clinical environments.</p>","PeriodicalId":73078,"journal":{"name":"Frontiers in digital health","volume":"7 ","pages":"1715440"},"PeriodicalIF":3.2,"publicationDate":"2026-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12813117/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146013216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-02 | eCollection Date: 2025-01-01 | DOI: 10.3389/fdgth.2025.1702339
Luis Fabián Salazar-Garcés, Elizabeth Morales-Urrutia, Franklin Cashabamba, Ricardo Xavier Proaño Alulema, Lizette Elena Leiva Suero
Background: Artificial intelligence (AI) systems are increasingly used to support treatment decision-making in breast cancer, yet their performance and feasibility in low- and middle-income countries (LMICs) remain incompletely defined. Many high-performing models, particularly genomic and multimodal systems trained on The Cancer Genome Atlas (TCGA), raise questions about cross-domain generalizability and equity.
Methods: We conducted an AI-assisted scoping review combining Boolean database searches with semantic retrieval tools (Elicit, Semantic Scholar, Connected Papers). From 497 unique records, 43 studies met inclusion criteria and 34 reported quantitative metrics. Data extraction included study design, AI model type (treatment-recommendation, prognostic, or diagnostic/subtyping), input modalities, and validation strategies. Risk of bias was assessed using a hybrid PROBAST-AI/QUADAS-AI framework.
Results: Treatment-recommendation systems (e.g., WFO, Navya) showed concordance ranges of 67%-97% in early-stage settings but markedly lower performance in metastatic disease. Prognostic and multimodal models frequently achieved AUCs of 0.90-0.99. Genomic models trained in high-income countries (HICs) demonstrated consistent declines during external LMIC validation (e.g., a CDK4/6 response model: AUC 0.9956 → 0.9795). LMIC implementations reported reduced time-to-treatment and improved adherence to guidelines, but these gains were constrained by gaps in electronic health records, limited digital pathology, and insufficient local genomic testing capacity.
Conclusions: AI-enabled systems show promise for improving breast cancer treatment planning, especially in early-stage disease and resource-limited settings. However, the evidence base remains dominated by HIC-derived datasets and retrospective analyses, with persistent challenges related to domain shift, data representativeness, and genomic governance. Advancing equitable AI-driven oncology will require prospective multicenter validation, expanded LMIC-based data generation, and context-specific implementation strategies.
{"title":"Evaluating AI-driven precision oncology for breast cancer in low- and middle-income countries: a review of machine learning performance, genomic data use, and clinical feasibility.","authors":"Luis Fabián Salazar-Garcés, Elizabeth Morales-Urrutia, Franklin Cashabamba, Ricardo Xavier Proaño Alulema, Lizette Elena Leiva Suero","doi":"10.3389/fdgth.2025.1702339","DOIUrl":"10.3389/fdgth.2025.1702339","url":null,"abstract":"<p><strong>Background: </strong>Artificial intelligence (AI) systems are increasingly used to support treatment decision-making in breast cancer, yet their performance and feasibility in low- and middle-income countries (LMICs) remain incompletely defined. Many high-performing models, particularly genomic and multimodal systems trained on The Cancer Genome Atlas (TCGA), raise questions about cross-domain generalizability and equity.</p><p><strong>Methods: </strong>We conducted an AI-assisted scoping review combining Boolean database searches with semantic retrieval tools (Elicit, Semantic Scholar, Connected Papers). From 497 unique records, 43 studies met inclusion criteria and 34 reported quantitative metrics. Data extraction included study design, AI model type (treatment-recommendation, prognostic, or diagnostic/subtyping), input modalities, and validation strategies. Risk of bias was assessed using a hybrid PROBAST-AI/QUADAS-AI framework.</p><p><strong>Results: </strong>Treatment-recommendation systems (e.g., WFO, Navya) showed concordance ranges of 67%-97% in early-stage settings but markedly lower performance in metastatic disease. Prognostic and multimodal models frequently achieved AUCs of 0.90-0.99. HIC-trained genomic models demonstrated consistent declines during external LMIC validation (e.g., CDK4/6 response model: AUC 0.9956 → 0.9795). LMIC implementations reported reduced time-to-treatment and improved adherence to guidelines, but these gains were constrained by gaps in electronic health records, limited digital pathology, and insufficient local genomic testing capacity.</p><p><strong>Conclusions: </strong>AI-enabled systems show promise for improving breast cancer treatment planning, especially in early-stage disease and resource-limited settings. However, the evidence base remains dominated by HIC-derived datasets and retrospective analyses, with persistent challenges related to domain shift, data representativeness, and genomic governance. Advancing equitable AI-driven oncology will require prospective multicenter validation, expanded LMIC-based data generation, and context-specific implementation strategies.</p>","PeriodicalId":73078,"journal":{"name":"Frontiers in digital health","volume":"7 ","pages":"1702339"},"PeriodicalIF":3.2,"publicationDate":"2026-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12808440/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145999823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}