Modelling societal preferences for automated vehicle behaviour with ethical goal functions
Pub Date: 2025-12-10 · eCollection Date: 2025-01-01 · DOI: 10.3389/frai.2025.1676225
Chloe Gros, Leon Kester, Marieke Martens, Peter Werkhoven
Introduction: As automated vehicles (AVs) assume increasing decision-making responsibilities, ensuring their alignment with societal values becomes essential. Existing ethical frameworks for AVs have primarily remained conceptual, lacking empirical operationalization. To address this gap, this study develops an Ethical Goal Function (EGF), a quantitative model that encodes societal moral preferences for AV decision-making, within the theoretical framework of Augmented Utilitarianism (AU). AU integrates consequentialist, deontological, and virtue-ethical principles while remaining adaptable to evolving societal values. This work also proposes embedding the EGF into a Socio-Technological Feedback (SOTEF) Loop, enabling continuous refinement of AV decision systems through stakeholder input.
Methods: The EGF was constructed using discrete choice experiments (DCEs) conducted with Dutch university students (N = 89). Participants evaluated AV-relevant moral scenarios characterized by six ethically salient attributes: physical harm, psychological harm, moral responsibility, fair innings, legality, and environmental harm. These attributes were derived from biomedical ethics and moral psychology and validated in prior AV ethics research. Using participants' choices, a multinomial logit (MNL) model was estimated to derive attribute weights representing aggregate societal moral preferences. Model performance was evaluated using 5-fold cross-validation.
Results: The MNL model produced stable attribute weights across folds, achieving an average predictive accuracy of 63.8% (SD = 3.3%). These results demonstrate that the selected attributes and underlying AU-based framework can meaningfully predict participants' ethical preferences in AV decision scenarios. The EGF thus represents a data-driven, empirically grounded method for translating societal moral judgments into computationally usable parameters for AV decision-making systems.
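To make the estimation and evaluation steps above concrete, the following sketch fits a linear-in-attributes multinomial logit model to synthetic choice data over the six stated attributes and reports 5-fold predictive accuracy. It is an illustration only, not the authors' code: the choice-set design, attribute scales, and weights are placeholder assumptions.

```python
# Hypothetical sketch: MNL attribute-weight estimation with 5-fold accuracy on
# synthetic discrete-choice data. Attribute names follow the abstract; sample
# sizes, scales, and "true" weights are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize
from sklearn.model_selection import KFold

ATTRS = ["physical_harm", "psychological_harm", "moral_responsibility",
         "fair_innings", "legality", "environmental_harm"]

rng = np.random.default_rng(0)
n_sets, n_alts, n_attrs = 600, 2, len(ATTRS)                # choice sets of 2 alternatives
X = rng.uniform(0.0, 1.0, size=(n_sets, n_alts, n_attrs))   # attribute levels per alternative
w_true = np.array([-2.0, -1.2, -0.8, 0.6, 1.0, -0.5])       # placeholder "societal" weights
utility = X @ w_true + rng.gumbel(size=(n_sets, n_alts))    # random-utility model
choice = utility.argmax(axis=1)                             # chosen alternative per set

def fit_mnl(Xc, yc):
    """Maximum-likelihood estimate of linear-in-attributes MNL weights."""
    def nll(w):
        v = Xc @ w
        v = v - v.max(axis=1, keepdims=True)                # numerical stability
        logp = v - np.log(np.exp(v).sum(axis=1, keepdims=True))
        return -logp[np.arange(len(yc)), yc].sum()
    return minimize(nll, np.zeros(Xc.shape[2]), method="BFGS").x

accs = []
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    w_hat = fit_mnl(X[train], choice[train])
    pred = (X[test] @ w_hat).argmax(axis=1)                 # pick highest predicted utility
    accs.append((pred == choice[test]).mean())

print(dict(zip(ATTRS, np.round(fit_mnl(X, choice), 2))))
print(f"5-fold accuracy: {np.mean(accs):.3f} +/- {np.std(accs):.3f}")
```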
Discussion: This study contributes the first empirical operationalization of ethical frameworks for AVs through the development of an Ethical Goal Function and demonstrates how it can be embedded in a Socio-Technological Feedback (SOTEF) Loop for continuous societal alignment. The dual contribution advances both the theoretical grounding and practical implementation of human-centered ethics in automated decision-making. However, several limitations remain. The reliance on a Dutch university sample restricts cultural generalizability, and textual presentation may limit ecological validity. Future work should expand the cultural diversity of participants and compare alternative presentation modalities (e.g., visual, immersive) to better capture real-world decision contexts.
{"title":"Modelling societal preferences for automated vehicle behaviour with ethical goal functions.","authors":"Chloe Gros, Leon Kester, Marieke Martens, Peter Werkhoven","doi":"10.3389/frai.2025.1676225","DOIUrl":"10.3389/frai.2025.1676225","url":null,"abstract":"<p><strong>Introduction: </strong>As automated vehicles (AVs) assume increasing decision-making responsibilities, ensuring their alignment with societal values becomes essential. Existing ethical frameworks for AVs have primarily remained conceptual, lacking empirical operationalization. To address this gap, this study develops an Ethical Goal Function (EGF)-a quantitative model that encodes societal moral preferences for AV decision-making-within the theoretical framework of Augmented Utilitarianism (AU). AU integrates consequentialist, deontological, and virtue-ethical principles while remaining adaptable to evolving societal values. This work also proposes embedding the EGF into a Socio-Technological Feedback (SOTEF) Loop, enabling continuous refinement of AV decision systems through stakeholder input.</p><p><strong>Methods: </strong>The EGF was constructed using discrete choice experiments (DCEs) conducted with Dutch university students (N = 89). Participants evaluated AV-relevant moral scenarios characterized by six ethically salient attributes: physical harm, psychological harm, moral responsibility, fair innings, legality, and environmental harm. These attributes were derived from biomedical ethics and moral psychology and validated in prior AV ethics research. Using participants' choices, a multinomial logit (MNL) model was estimated to derive attribute weights representing aggregate societal moral preferences. Model performance was evaluated using 5-fold cross-validation.</p><p><strong>Results: </strong>The MNL model produced stable attribute weights across folds, achieving an average predictive accuracy of 63.8% (SD = 3.3%). These results demonstrate that the selected attributes and underlying AU-based framework can meaningfully predict participants' ethical preferences in AV decision scenarios. The EGF thus represents a data-driven, empirically grounded method for translating societal moral judgments into computationally usable parameters for AV decision-making systems.</p><p><strong>Discussion: </strong>This study contributes the first empirical operationalization of ethical frameworks for AVs through the development of an Ethical Goal Function and demonstrates how it can be embedded in a Socio-Technological Feedback (SOTEF) Loop for continuous societal alignment. The dual contribution advances both the theoretical grounding and practical implementation of human-centered ethics in automated decision-making. However, several limitations remain. The reliance on a Dutch university sample restricts cultural generalizability, and textual presentation may limit ecological validity. 
Future work should expand the cultural diversity of participants and compare alternative presentation modalities (e.g., visual, immersive) to better capture real-world decision contexts.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"8 ","pages":"1676225"},"PeriodicalIF":4.7,"publicationDate":"2025-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12727885/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145834908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploring the use and perceived impact of artificial intelligence in medical internship: a cross-sectional study of Palestinian doctors
Pub Date: 2025-12-10 · eCollection Date: 2025-01-01 · DOI: 10.3389/frai.2025.1738782
Abdallah Qawasmeh, Salahaldeen Deeb, Alhareth M Amro, Khaled Alhashlamon, Ibrahim Althaher, Nour Yaser Mohammad Shadeed, Khadija Mohammad, Farid K Abu Shama
Background: Artificial intelligence (AI) is increasingly used in medical education to support academic learning, clinical competence, and efficiency. However, the extent and impact of AI usage among medical interns, particularly in Palestine, remain underexplored.
Objective: This study aimed to assess the prevalence of AI usage among internship doctors in Palestine and evaluate its perceived impact on their academic performance, clinical competence, time management, and research skills.
Methods: A cross-sectional survey was conducted with 307 internship doctors in Palestine. The survey collected data on the frequency and types of AI tools used, including ChatGPT, and interns' perceptions of AI's impact on their training. Demographic information, such as age, gender, and university affiliation, was also gathered to explore potential associations with AI usage patterns.
Results: The study found that 76.9% of interns used AI regularly, with ChatGPT being the most popular tool (76.2%). Despite frequent use, only 3.3% reported formal AI training. The majority of interns perceived AI as beneficial in improving academic performance (61%), clinical competence (67%), and time management (74%). Notably, time management showed the highest perceived improvement. However, 75.9% expressed concerns about becoming overly reliant on AI, fearing it could diminish critical thinking and clinical judgment. Age and university affiliation were associated with differences in AI usage patterns and perceived benefits, with older interns and those from international universities reporting greater perceived improvements.
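As a minimal illustration of how an association like the reported university-related differences could be tested, the snippet below runs a chi-square test on a small contingency table; the counts are invented for demonstration and do not come from the study.

```python
# Hypothetical sketch: testing whether AI-usage frequency is associated with a
# demographic factor (here, university type). The contingency counts are made up.
import numpy as np
from scipy.stats import chi2_contingency

# rows: local vs. international university; columns: regular vs. occasional AI use
counts = np.array([[150, 60],
                   [ 80, 17]])
chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2={chi2:.2f}, dof={dof}, p={p:.3f}")
```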
Conclusion: This cross-sectional study highlights the widespread use of AI among internship doctors in Palestine and generally positive perceptions of its educational value, particularly for academic performance and clinical competence. However, it also reveals a substantial gap in formal AI training, suggesting a need for structured, ethically grounded AI education in medical curricula. Because the study is exploratory and cross-sectional, these findings should be interpreted as perceived associations rather than evidence that AI use or training causes improved outcomes; future longitudinal and interventional studies are needed to clarify long-term effects.
{"title":"Exploring the use and perceived impact of artificial intelligence in medical internship: a cross-sectional study of Palestinian doctors.","authors":"Abdallah Qawasmeh, Salahaldeen Deeb, Alhareth M Amro, Khaled Alhashlamon, Ibrahim Althaher, Nour Yaser Mohammad Shadeed, Khadija Mohammad, Farid K Abu Shama","doi":"10.3389/frai.2025.1738782","DOIUrl":"10.3389/frai.2025.1738782","url":null,"abstract":"<p><strong>Background: </strong>Artificial intelligence (AI) is increasingly used in medical education to support academic learning, clinical competence, and efficiency. However, the extent and impact of AI usage among medical interns, particularly in Palestine, remain underexplored.</p><p><strong>Objective: </strong>This study aimed to assess the prevalence of AI usage among internship doctors in Palestine and evaluate its perceived impact on their academic performance, clinical competence, time management, and research skills.</p><p><strong>Methods: </strong>A cross-sectional survey was conducted with 307 internship doctors in Palestine. The survey collected data on the frequency and types of AI tools used, including ChatGPT, and interns' perceptions of AI's impact on their training. Demographic information, such as age, gender, and university affiliation, was also gathered to explore potential associations with AI usage patterns.</p><p><strong>Results: </strong>The study found that 76.9% of interns used AI regularly, with ChatGPT being the most popular tool (76.2%). Despite frequent use, only 3.3% reported formal AI training. The majority of interns perceived AI as beneficial in improving academic performance (61%), clinical competence (67%), and time management (74%). Notably, time management showed the highest perceived improvement. However, 75.9% expressed concerns about becoming overly reliant on AI, fearing it could diminish critical thinking and clinical judgment. Age and university affiliation were associated with differences in AI usage patterns and perceived benefits, with older interns and those from international universities reporting greater perceived improvements.</p><p><strong>Conclusion: </strong>This cross-sectional study highlights the widespread use of AI among internship doctors in Palestine and generally positive perceptions of its educational value, particularly for academic performance and clinical competence. However, it also reveals a substantial gap in formal AI training, suggesting a need for structured, ethically grounded AI education in medical curricula. Because the study is exploratory and cross-sectional, these findings should be interpreted as perceived associations rather than evidence that AI use or training causes improved outcomes; future longitudinal and interventional studies are needed to clarify long term effects.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"8 ","pages":"1738782"},"PeriodicalIF":4.7,"publicationDate":"2025-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12727930/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145834954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine learning techniques for improved prediction of cardiovascular diseases using integrated healthcare data
Pub Date: 2025-12-09 · eCollection Date: 2025-01-01 · DOI: 10.3389/frai.2025.1694450
Abdulgani Kahraman
Cardiovascular disease continues to pose a major global health challenge, highlighting the critical importance of early detection in mitigating cardiac-related issues. There is a significant demand for reliable diagnostic alternatives. Leveraging health data through diverse machine learning algorithms may offer a more precise diagnostic approach. Machine learning-based decision support systems that utilize patients' clinical parameters present a promising solution for diagnosing cardiovascular disease. In this research, we collected extensive publicly available healthcare records. We integrated medical datasets based on common features to implement several machine learning models aimed at exploring the potential for more robust predictions of cardiovascular disease (CVD). The merged dataset initially contained 323,680 samples sourced from multiple databases. Following data preprocessing steps, including cleaning, feature alignment, and removal of missing values, the final dataset consisted of 311,710 samples used for model training and evaluation. In our experiments, the CatBoost model achieved the highest area under the curve (AUC), reaching up to 94.1%.
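The sketch below illustrates the general shape of such a pipeline: training a CatBoost classifier on tabular clinical-style data and scoring it by AUC. It is not the paper's code; the dataset is synthetic, and the hyperparameters and column names are assumptions.

```python
# Minimal sketch (not the paper's pipeline): CatBoost classification of a
# tabular CVD-style dataset with AUC evaluation on a held-out split.
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=5000, n_features=12, n_informative=6, random_state=0)
cols = [f"clinical_feature_{i}" for i in range(X.shape[1])]   # placeholder clinical parameters
X = pd.DataFrame(X, columns=cols)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
model = CatBoostClassifier(iterations=300, depth=6, learning_rate=0.1, verbose=False)
model.fit(X_tr, y_tr)

auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"Test AUC: {auc:.3f}")
```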
{"title":"Machine learning techniques for improved prediction of cardiovascular diseases using integrated healthcare data.","authors":"Abdulgani Kahraman","doi":"10.3389/frai.2025.1694450","DOIUrl":"10.3389/frai.2025.1694450","url":null,"abstract":"<p><p>Cardiovascular disease continues to cause an important global health challenge, highlighting the critical importance of early detection in mitigating cardiac-related issues. There is a significant demand for reliable diagnostic alternatives. Taking advantage of health data through diverse machine learning algorithms may offer a more precise diagnostic approach. Machine learning-based decision support systems that utilize patients' clinical parameters present a promising solution for diagnosing cardiovascular disease. In this research, we collected extensive publicly available healthcare records. We integrated medical datasets based on common features to implement several machine learning models aimed at exploring the potential for more robust predictions of cardiovascular disease (CVD). The merged dataset initially contained 323,680 samples sourced from multiple databases. Following data preprocessing steps including cleaning, alignment of features, and removal of missing values, the final dataset consisted of 311,710 samples used for model training and evaluation. In our experiments, the CatBoost model achieved the highest area under the curve (AUC) of up to 94.1%.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"8 ","pages":"1694450"},"PeriodicalIF":4.7,"publicationDate":"2025-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12723862/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145828488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AI for evidence-based treatment recommendation in oncology: a blinded evaluation of large language models and agentic workflows
Pub Date: 2025-12-09 · eCollection Date: 2025-01-01 · DOI: 10.3389/frai.2025.1683322
Guannan Zhai, Merav Bar, Andrew J Cowan, Samuel Rubinstein, Qian Shi, Ningjie Zhang, En Xie, Will Ma
Background: Evidence-based medicine is crucial for clinical decision-making, yet studies suggest that a significant proportion of treatment decisions do not fully incorporate the latest evidence. Large Language Models (LLMs) show promise in bridging this gap, but their reliability for medical recommendations remains uncertain.
Methods: We conducted an evaluation study comparing five LLMs' recommendations across 50 clinical scenarios related to multiple myeloma diagnosis, staging, treatment, and management, using a unified evidence cutoff of June 2024. The evaluation included three general-purpose LLMs (OpenAI o1-preview, Claude 3.5 Sonnet, Gemini 1.5 Pro), one retrieval-augmented generation (RAG) system (Myelo), and one agentic workflow-based system (HopeAI). General-purpose LLMs generated responses based solely on their internal knowledge, while the RAG system enhanced these capabilities by incorporating external knowledge retrieval. The agentic workflow system extended the RAG approach by implementing multi-step reasoning and coordinating with multiple tools and external systems for complex task execution. Three independent hematologist-oncologists evaluated the LLM-generated responses using standardized scoring criteria developed specifically for this study. Performance assessment encompassed five dimensions: accuracy, relevance, comprehensiveness, hallucination rate, and clinical use readiness.
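Since the abstract does not specify the scoring rubric, the following sketch shows one plausible way per-dimension percentages could be aggregated from three raters scoring 50 scenarios per system; the 0-1 scale, the random scores, and the averaging scheme are illustrative assumptions only.

```python
# Hypothetical aggregation of blinded rater scores into per-dimension percentages.
# Scores, systems list, and dimension names are placeholders for illustration.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
systems = ["HopeAI", "o1-preview", "Claude-3.5-Sonnet", "Gemini-1.5-Pro", "Myelo"]
dims = ["accuracy", "relevance", "comprehensiveness", "hallucination", "clinical_readiness"]

rows = []
for system in systems:
    for scenario in range(50):          # 50 clinical scenarios
        for rater in range(3):          # 3 independent hematologist-oncologists
            rows.append({"system": system, "scenario": scenario, "rater": rater,
                         **{d: rng.random() for d in dims}})   # scores in [0, 1]

scores = pd.DataFrame(rows)
# Mean over raters and scenarios, expressed as a percentage per dimension.
summary = scores.groupby("system")[dims].mean().mul(100).round(1)
print(summary)
```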
Results: HopeAI demonstrated superior performance across accuracy (82.0%), relevance (85.3%), and comprehensiveness (74.0%), compared to OpenAI o1-preview (64.7, 57.3, 36.0%), Claude 3.5 Sonnet (50.0, 51.3, 29.3%), Gemini 1.5 Pro (48.0, 46.0, 30.0%), and Myelo (58.7, 56.0, 32.7%). Hallucination rates were consistently low across all systems: HopeAI (5.3%), OpenAI o1-preview (3.3%), Claude 3.5 Sonnet (10.0%), Gemini 1.5 Pro (8.0%), and Myelo (5.3%). Clinical use readiness scores were relatively low for all systems: HopeAI (25.3%), OpenAI o1-preview (6.0%), Claude 3.5 Sonnet (2.7%), Gemini 1.5 Pro (4.0%), and Myelo (4.0%).
Conclusion: This study demonstrates that while current LLMs show promise in medical decision support, their recommendations require careful clinical supervision to ensure patient safety and optimal care. Further research is needed to improve their clinical use readiness before integration into oncology workflows. These findings provide valuable insights into the capabilities and limitations of LLMs in oncology, guiding future research and development efforts toward integrating AI into clinical workflows.
{"title":"AI for evidence-based treatment recommendation in oncology: a blinded evaluation of large language models and agentic workflows.","authors":"Guannan Zhai, Merav Bar, Andrew J Cowan, Samuel Rubinstein, Qian Shi, Ningjie Zhang, En Xie, Will Ma","doi":"10.3389/frai.2025.1683322","DOIUrl":"10.3389/frai.2025.1683322","url":null,"abstract":"<p><strong>Background: </strong>Evidence-based medicine is crucial for clinical decision-making, yet studies suggest that a significant proportion of treatment decisions do not fully incorporate the latest evidence. Large Language Models (LLMs) show promise in bridging this gap, but their reliability for medical recommendations remains uncertain.</p><p><strong>Methods: </strong>We conducted an evaluation study comparing five LLMs' recommendations across 50 clinical scenarios related to multiple myeloma diagnosis, staging, treatment, and management, using a unified evidence cutoff of June 2024. The evaluation included three general-purpose LLMs (OpenAI o1-preview, Claude 3.5 Sonnet, Gemini 1.5 Pro), one retrieval-augmented generation (RAG) system (Myelo), and one agentic workflow-based system (HopeAI). General-purpose LLMs generated responses based solely on their internal knowledge, while the RAG system enhanced these capabilities by incorporating external knowledge retrieval. The agentic workflow system extended the RAG approach by implementing multi-step reasoning and coordinating with multiple tools and external systems for complex task execution. Three independent hematologist-oncologists evaluated the LLM-generated responses using standardized scoring criteria developed specifically for this study. Performance assessment encompassed five dimensions: accuracy, relevance, comprehensiveness, hallucination rate, and clinical use readiness.</p><p><strong>Results: </strong>HopeAI demonstrated superior performance across accuracy (82.0%), relevance (85.3%), and comprehensiveness (74.0%), compared to OpenAI o1-preview (64.7, 57.3, 36.0%), Claude 3.5 Sonnet (50.0, 51.3, 29.3%), Gemini 1.5 Pro (48.0, 46.0, 30.0%), and Myelo (58.7, 56, 32.7%). Hallucination rates were consistently low across all systems: HopeAI (5.3%), OpenAI o1-preview (3.3%), Claude 3.5 Sonnet (10.0%), Gemini 1.5 Pro (8.0%), and Myelo (5.3%). Clinical use readiness scores were relatively low for all systems: HopeAI (25.3%), OpenAI o1-preview (6.0%), Claude 3.5 Sonnet (2.7%), Gemini 1.5 Pro (4.0%), and Myelo (4.0%).</p><p><strong>Conclusion: </strong>This study demonstrates that while current LLMs show promise in medical decision support, their recommendations require careful clinical supervision to ensure patient safety and optimal care. Further research is needed to improve their clinical use readiness before integration into oncology workflows. 
These findings provide valuable insights into the capabilities and limitations of LLMs in oncology, guiding future research and development efforts toward integrating AI into clinical workflows.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"8 ","pages":"1683322"},"PeriodicalIF":4.7,"publicationDate":"2025-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12722510/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145828502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multimodal observable cues in mood, anxiety, and borderline personality disorders: a review of reviews to inform explainable AI in mental health
Pub Date: 2025-12-09 · eCollection Date: 2025-01-01 · DOI: 10.3389/frai.2025.1696448
Grega Močnik, Ana Rehberger, Žan Smogavc, Izidor Mlakar, Urška Smrke, Sara Močnik
Mental health disorders, such as depression, anxiety, and borderline personality disorder (BPD), are common, often begin early, and can cause profound impairment. Traditional assessments rely heavily on subjective reports and clinical observation, which can be inconsistent and biased. Recent advances in AI offer a promising complement by analyzing objective, observable cues from speech, language, facial expressions, physiological signals, and digital behavior. Explainable AI ensures these patterns remain interpretable and clinically meaningful. A synthesis of 24 recent systematic and scoping reviews shows that depression is linked to self-focused negative language, slowed and monotonous speech, reduced facial expressivity, disrupted sleep and activity, and altered phone or online behavior. Anxiety disorders present with negative language bias, monotone speech with pauses, physiological hyperarousal, and avoidance-related behaviors. BPD exhibits more complex patterns, including impersonal or externally focused language, speech dysregulation, paradoxical facial expressions, autonomic dysregulation, and socially ambivalent behaviors. Some cues, like reduced heart rate variability and flattened speech, appear across conditions, suggesting shared transdiagnostic mechanisms, while BPD's interpersonal and emotional ambivalence stands out. These findings highlight the potential of observable, digitally measurable cues to complement traditional assessments, enabling earlier detection, ongoing monitoring, and more personalized interventions in psychiatry.
{"title":"Multimodal observable cues in mood, anxiety, and borderline personality disorders: a review of reviews to inform explainable AI in mental health.","authors":"Grega Močnik, Ana Rehberger, Žan Smogavc, Izidor Mlakar, Urška Smrke, Sara Močnik","doi":"10.3389/frai.2025.1696448","DOIUrl":"10.3389/frai.2025.1696448","url":null,"abstract":"<p><p>Mental health disorders, such as depression, anxiety, and borderline personality disorder (BPD), are common, often begin early, and can cause profound impairment. Traditional assessments rely heavily on subjective reports and clinical observation, which can be inconsistent and biased. Recent advances in AI offer a promising complement by analyzing objective, observable cues from speech, language, facial expressions, physiological signals, and digital behavior. Explainable AI ensures these patterns remain interpretable and clinically meaningful. A synthesis of 24 recent systematic and scoping reviews shows that depression is linked to self-focused negative language, slowed and monotonous speech, reduced facial expressivity, disrupted sleep and activity, and altered phone or online behavior. Anxiety disorders present with negative language bias, monotone speech with pauses, physiological hyperarousal, and avoidance-related behaviors. BPD exhibits more complex patterns, including impersonal or externally focused language, speech dysregulation, paradoxical facial expressions, autonomic dysregulation, and socially ambivalent behaviors. Some cues, like reduced heart rate variability and flattened speech, appear across conditions, suggesting shared transdiagnostic mechanisms, while BPD's interpersonal and emotional ambivalence stands out. These findings highlight the potential of observable, digitally measurable cues to complement traditional assessments, enabling earlier detection, ongoing monitoring, and more personalized interventions in psychiatry.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"8 ","pages":"1696448"},"PeriodicalIF":4.7,"publicationDate":"2025-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12723009/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145828518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptation of convolutional neural networks for real-time abdominal ultrasound interpretation
Pub Date: 2025-12-09 · eCollection Date: 2025-01-01 · DOI: 10.3389/frai.2025.1718503
Austin J Ruiz, Sofía I Hernández Torres, Eric J Snider
Point of care ultrasound (POCUS) is commonly used for diagnostic triage of internal injuries in both civilian and military trauma. In resource-constrained environments, such as mass-casualty situations on the battlefield, POCUS allows medical providers to rapidly and noninvasively assess for free fluid or hemorrhage induced by trauma. A major disadvantage of POCUS diagnostics is the skill threshold needed to acquire and interpret ultrasound scans. For this purpose, AI has been shown to be an effective tool to aid the caregiver when interpreting medical imaging. Here, we focus on sophisticated AI training methodologies to improve the blind, real-time diagnostic accuracy of AI models for detection of hemorrhage in two major abdominal scan sites. In this work, we used a retrospective dataset of over 60,000 swine ultrasound images to train binary classification models, exploring frame-pooling methods that use the backbone of a pre-existing model architecture to handle multi-channel inputs for detecting free fluid in the pelvic and right-upper-quadrant regions. Earlier classification models had achieved 0.59 and 0.70 accuracy in blind predictions, respectively. After implementing this novel training technique, performance accuracy improved to over 0.90 for both scan sites. These are promising results demonstrating a significant diagnostic improvement, which encourages further optimization to achieve similar results using clinical data. Furthermore, these results show how AI-informed diagnostics can offload cognitive burden in situations where casualties may benefit from rapid triage decision making.
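A minimal sketch of the frame-pooling idea is given below: several consecutive ultrasound frames are stacked as input channels and passed through a standard CNN backbone for binary free-fluid classification. The backbone (ResNet-18), the number of pooled frames, and the input size are assumptions; the paper adapts its own pre-existing architecture.

```python
# Hypothetical frame-pooling classifier: k consecutive grayscale ultrasound
# frames become the input channels of a standard CNN backbone, whose head is
# replaced by a 2-class (fluid / no-fluid) output. Backbone and k=5 are assumed.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class FramePooledClassifier(nn.Module):
    def __init__(self, n_frames: int = 5, n_classes: int = 2):
        super().__init__()
        self.backbone = resnet18(weights=None)   # use pretrained weights in practice
        # Widen the first conv so it accepts n_frames stacked frames as channels.
        self.backbone.conv1 = nn.Conv2d(n_frames, 64, kernel_size=7, stride=2,
                                        padding=3, bias=False)
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, n_classes)

    def forward(self, x):          # x: (batch, n_frames, H, W)
        return self.backbone(x)

model = FramePooledClassifier(n_frames=5)
clip = torch.randn(4, 5, 224, 224)          # 4 clips of 5 pooled frames each
logits = model(clip)                        # (4, 2) free-fluid vs. no-fluid logits
print(logits.shape)
```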
{"title":"Adaptation of convolutional neural networks for real-time abdominal ultrasound interpretation.","authors":"Austin J Ruiz, Sofía I Hernández Torres, Eric J Snider","doi":"10.3389/frai.2025.1718503","DOIUrl":"10.3389/frai.2025.1718503","url":null,"abstract":"<p><p>Point of care ultrasound (POCUS) is commonly used for diagnostic triage of internal injuries in both civilian and military trauma. In resource constrained environments, such as mass-casualty situations on the battlefield, POCUS allows medical providers to rapidly and noninvasively assess for free fluid or hemorrhage induced by trauma. A major disadvantage of POCUS diagnostics is the skill threshold needed to acquire and interpret ultrasound scans. For this purpose, AI has been shown to be an effective tool to aid the caregiver when interpreting medical imaging. Here, we focus on sophisticated AI training methodologies to improve the blind, real-time diagnostic accuracy of AI models for detection of hemorrhage in two major abdominal scan sites. In this work, we used a retrospective dataset of over 60,000 swine ultrasound images to train binary classification models exploring frame-pooling methods using the backbone of a pre-existing model architecture to handle multi-channel inputs for detecting free fluid in the pelvic and right-upper-quadrant regions. Earlier classifications models had achieved 0.59 and 0.70 accuracy metrics in blind predictions, respectively. After implementing this novel training technique, performance accuracy improved to over 0.90 for both scan sites. These are promising results demonstrating a significant diagnostic improvement which encourages further optimization to achieve similar results using clinical data. Furthermore, these results show how AI-informed diagnostics can offload cognitive burden in situations where casualties may benefit from rapid triage decision making.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"8 ","pages":"1718503"},"PeriodicalIF":4.7,"publicationDate":"2025-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12722998/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145828523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gender-based Alzheimer's detection using ResNet-50 and binary dragonfly algorithm on neuroimaging
Pub Date: 2025-12-08 · eCollection Date: 2025-01-01 · DOI: 10.3389/frai.2025.1717913
Muhammad Ikram Ul Haq, Waqas Haider Bangyal, Arfan Jaffar, Asma Abdullah Alfayez, Adnan Ashraf, Meshari Alazmi, Mubbashar Hussain
Alzheimer's disease (AD) is an incurable, progressive neurodegenerative disorder. It is characterized by a gradual decline in memory, cognition, and behavior, which ultimately results in severe dementia and functional dependence. AD begins to develop in the brain at an early stage, while its symptoms appear gradually over time. Early diagnosis and classification of Alzheimer's is a critical research focus due to its silent progression. The current literature highlights a gap in gender-based studies, revealing that the risk of AD varies by gender, age, race, and ethnicity. The nature of the association between AD and these factors requires further exploration to better understand their impact on disease risk and progression. Effectively employing multiple algorithms is essential for accurate diagnosis of Alzheimer's development. This study proposed the GRDN model, which explores a critical aspect of gender-based Alzheimer's detection. To detect subtle changes in the brain, functional magnetic resonance imaging (fMRI) scans were acquired from the ADNI dataset. To balance class distribution and enhance classifier performance on underrepresented groups, a generative adversarial network (GAN) was applied. The balanced dataset was provided to the ResNet-50 architecture for feature extraction, resulting in feature matrices of 100, 250, and 450 features. These feature matrices were then fed to a swarm intelligence-based approach, the binary dragonfly algorithm (BDA), for feature selection, which identified the most informative features. After feature engineering, the resulting feature-selection matrices were provided to five machine learning (ML) classification algorithms for data classification. The results show that as the feature-set size increases, classification accuracy improves. The simulation results demonstrated that the fineKNN classifier achieved strong performance, with an accuracy of 94.8% on the male group with a feature set of 450, and consistently outperformed other models across all study groups.
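The sketch below illustrates the downstream stages of such a pipeline (deep-feature extraction, feature selection, and classification) on random stand-in data. The GAN balancing step is omitted, and the binary dragonfly algorithm is replaced by a simple mutual-information selector purely for brevity; none of this reproduces the paper's GRDN model.

```python
# Hypothetical sketch: ResNet-50 embeddings -> feature selection -> KNN.
# The GAN balancing step is omitted and the binary dragonfly algorithm is
# swapped for SelectKBest (mutual information); images and labels are random.
import numpy as np
import torch
import torch.nn as nn
from torchvision.models import resnet50
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# ResNet-50 with the classification head removed -> 2048-d features per image.
# weights=None avoids a download here; use ImageNet weights in practice.
backbone = resnet50(weights=None)
backbone.fc = nn.Identity()
backbone.eval()

images = torch.randn(60, 3, 224, 224)                         # stand-in for fMRI-derived slices
labels = np.random.default_rng(0).integers(0, 2, size=60)     # AD vs. control (random)

with torch.no_grad():
    feats = backbone(images).numpy()                          # (60, 2048) feature matrix

# Keep the k most informative features (450 mirrors the largest set in the paper).
selected = SelectKBest(mutual_info_classif, k=450).fit_transform(feats, labels)

knn = KNeighborsClassifier(n_neighbors=5)
print("CV accuracy:", cross_val_score(knn, selected, labels, cv=5).mean())
```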
{"title":"Gender-based Alzheimer's detection using ResNet-50 and binary dragonfly algorithm on neuroimaging.","authors":"Muhammad Ikram Ul Haq, Waqas Haider Bangyal, Arfan Jaffar, Asma Abdullah Alfayez, Adnan Ashraf, Meshari Alazmi, Mubbashar Hussain","doi":"10.3389/frai.2025.1717913","DOIUrl":"10.3389/frai.2025.1717913","url":null,"abstract":"<p><p>Alzheimer's disease (AD) is an incurable, progressive neurodegenerative disorder. It is characterized by a gradual decline in memory, cognition, and behavior, which ultimately results in severe dementia and functional dependence. AD begins to develop in the brain at an early stage, while its symptoms appear gradually over time. Early diagnosis and classification of Alzheimer's is a critical research focus due to its silent progression. The current literature highlights a gap in gender-based studies, revealing that the risk of AD varies by gender, age, race, and ethnicity. The nature of the association between AD and these factors requires further exploration to better understand their impact on disease risk and progression. Effectively employing multiple algorithms is essential for accurate diagnosis of Alzheimer's development. This study proposed the GRDN model, which explored a critical aspect of gender-based Alzheimer's detection. To detect subtle changes in the brain, functional magnetic resonance imaging (fMRI) scans have been acquired from the ADNI dataset. In order to balance class distribution and enhance classifier performance on underrepresented groups, a generative adversarial network (GAN) is applied. A balanced dataset is provided to the ResNet-50 architecture for feature extraction, resulting in feature matrices set with a range of 100, 250, and 450. These feature set matrices were then fed to a swarm intelligence-based approach, the binary dragonfly algorithm (BDA), for feature selection, which identified the most informative features. After feature engineering, the resultant matrices of feature selection were provided to the five machine learning (ML) classification algorithms for data classification. The results show that as the size of the features set increases and the accuracy of the classification improves. The simulation results demonstrated that the fineKNN achieved strong performance, with an accuracy of 94.8% on the male group on a feature set of 450, and consistently outperformed other models across all study groups.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"8 ","pages":"1717913"},"PeriodicalIF":4.7,"publicationDate":"2025-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12719442/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LC-YOLOmatch: a novel scene segmentation approach based on YOLO for laparoscopic cholecystectomy
Pub Date: 2025-12-08 · eCollection Date: 2025-01-01 · DOI: 10.3389/frai.2025.1706021
Hong Long, Yuancheng Shao, Mini Han Wang, Fengshi Jing, Yuqiao Chen, Shuai Xiao, Jia Gu
Introduction: Laparoscopy is a visual biosensor that can obtain real-time images of the body cavity, assisting in minimally invasive surgery. Laparoscopic cholecystectomy is one of the most frequently performed endoscopic surgeries and the most fundamental modular surgery. However, many iatrogenic complications still occur each year, mainly due to the anatomical recognition errors of surgeons. Therefore, the development of artificial intelligence (AI)-assisted recognition is of great significance.
Methods: This study proposes a method based on the lightweight YOLOv11n model. Introducing the efficient multi-scale feature extraction module DWR enhances the model's real-time performance. Additionally, the bidirectional feature pyramid network (BiFPN) is incorporated to strengthen multi-scale feature fusion. Finally, we developed the LC-YOLOmatch semi-supervised learning framework, which effectively addresses the scarcity of labeled data in the medical field.
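For orientation, a confidence-thresholded pseudo-labelling loop in the spirit of such a semi-supervised setup might look as follows, using the off-the-shelf Ultralytics YOLO11n detector. The DWR and BiFPN modifications are not shown, and the dataset YAML files, paths, and threshold are assumptions rather than the authors' configuration.

```python
# Hypothetical pseudo-labelling sketch with an off-the-shelf YOLO11n detector.
# Dataset YAMLs, directory layout, and the 0.6 confidence threshold are assumed.
from pathlib import Path
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
model.train(data="cholec80_labeled.yaml", epochs=50, imgsz=640)   # supervised warm-up

unlabeled_dir = Path("cholec80_unlabeled/images")
pseudo_dir = Path("cholec80_unlabeled/labels")
pseudo_dir.mkdir(parents=True, exist_ok=True)

# Write YOLO-format pseudo-labels for confident predictions only.
for result in model.predict(source=str(unlabeled_dir), conf=0.6, stream=True):
    lines = []
    for cls, box in zip(result.boxes.cls.tolist(), result.boxes.xywhn.tolist()):
        lines.append(f"{int(cls)} " + " ".join(f"{v:.6f}" for v in box))
    out = pseudo_dir / (Path(result.path).stem + ".txt")
    out.write_text("\n".join(lines))

# Retrain on labeled + pseudo-labeled data (combined YAML assumed to exist).
model.train(data="cholec80_labeled_plus_pseudo.yaml", epochs=50, imgsz=640)
```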
Results: Experimental results on the publicly available Cholec80 dataset show that this method achieves 70% mAP50 and 40.8% mAP50-95, reaching a new technical level and reducing the reliance on manual annotations.
Discussion: These improvements not only highlight its potential in automated surgeries but also significantly enhance assistance in laparoscopic procedures while effectively reducing the incidence of complications.
{"title":"LC-YOLOmatch: a novel scene segmentation approach based on YOLO for laparoscopic cholecystectomy.","authors":"Hong Long, Yuancheng Shao, Mini Han Wang, Fengshi Jing, Yuqiao Chen, Shuai Xiao, Jia Gu","doi":"10.3389/frai.2025.1706021","DOIUrl":"10.3389/frai.2025.1706021","url":null,"abstract":"<p><strong>Introduction: </strong>Laparoscopy is a visual biosensor that can obtain real-time images of the body cavity, assisting in minimally invasive surgery. Laparoscopic cholecystectomy is one of the most frequently performed endoscopic surgeries and the most fundamental modular surgery. However, many iatrogenic complications still occur each year, mainly due to the anatomical recognition errors of surgeons. Therefore, the development of artificial intelligence (AI)-assisted recognition is of great significance.</p><p><strong>Methods: </strong>This study proposes a method based on the lightweight YOLOv11n model. By introducing the efficient multi-scale feature extraction module, DWR, the real-time performance of the model is enhanced. Additionally, the bidirectional feature pyramid network (BiFPN) is incorporated to strengthen the capability of multi-scale feature fusion. Finally, we developed the LC-YOLOmatch semi-supervised learning framework, which effectively addresses the issue of scarce labeled data in the medical field.</p><p><strong>Results: </strong>Experimental results on the publicly available Cholec80 dataset show that this method achieves 70% mAP50 and 40.8% mAP50-95, reaching a new technical level and reducing the reliance on manual annotations.</p><p><strong>Discussion: </strong>These improvements not only highlight its potential in automated surgeries but also significantly enhance assistance in laparoscopic procedures while effectively reducing the incidence of complications.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"8 ","pages":"1706021"},"PeriodicalIF":4.7,"publicationDate":"2025-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12719465/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145821316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Crowdsourcing lexical diversity
Pub Date: 2025-12-05 · eCollection Date: 2025-01-01 · DOI: 10.3389/frai.2025.1648073
Hadi Khalilia, Jahna Otterbacher, Gábor Bella, Shandy Darma, Fausto Giunchiglia
Lexical-semantic resources (LSRs), such as online lexicons and wordnets, are fundamental to natural language processing applications as well as to fields such as linguistic anthropology and language preservation. In many languages, however, such resources suffer from quality issues: incorrect entries and incompleteness, but also the rarely addressed issue of bias toward the English language and Anglo-Saxon culture. Such bias manifests itself in the absence of concepts specific to the language or culture at hand, in the presence of foreign (Anglo-Saxon) concepts, and in the lack of an explicit indication of untranslatability, also known as cross-lingual lexical gaps, when a term has no equivalent in another language. This paper proposes a novel crowdsourcing methodology for reducing bias in LSRs. Crowd workers compare lexemes from two languages, focusing on domains rich in lexical diversity, such as kinship or food. Our LingoGap crowdsourcing platform facilitates comparisons through microtasks identifying equivalent terms, language-specific terms, and lexical gaps across languages. We validated our method by applying it to two case studies focused on food-related terminology: (1) English and Arabic, and (2) Standard Indonesian and Banjarese. These experiments identified 2,140 lexical gaps in the first case study and 951 in the second. The success of these experiments confirmed the usability of our method and tool for future large-scale lexicon enrichment tasks.
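As an illustration of how microtask judgments could be aggregated into per-concept decisions (equivalent term, language-specific term, or lexical gap), the sketch below applies a simple majority vote; the label set, vote threshold, and data structures are assumptions, not LingoGap's actual schema.

```python
# Hypothetical aggregation of crowd judgments from lexeme-comparison microtasks
# into a per-concept decision. Labels and the majority-vote rule are assumed.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Judgment:
    concept: str          # e.g. a food concept being compared across two languages
    worker_id: str
    label: str            # "equivalent" | "language_specific" | "lexical_gap"

def decide(judgments: list[Judgment], min_votes: int = 3) -> str:
    """Majority vote over worker labels; undecided if too few votes or a tie."""
    votes = Counter(j.label for j in judgments)
    if sum(votes.values()) < min_votes:
        return "undecided"
    (top, n), *rest = votes.most_common()
    if rest and rest[0][1] == n:
        return "undecided"
    return top

example = [Judgment("knafeh", f"w{i}", "lexical_gap") for i in range(2)] + \
          [Judgment("knafeh", "w9", "equivalent")]
print(decide(example))    # -> "lexical_gap"
```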
{"title":"Crowdsourcing lexical diversity.","authors":"Hadi Khalilia, Jahna Otterbacher, Gábor Bella, Shandy Darma, Fausto Giunchiglia","doi":"10.3389/frai.2025.1648073","DOIUrl":"10.3389/frai.2025.1648073","url":null,"abstract":"<p><p>Lexical-semantic resources (LSRs), such as online lexicons and wordnets, are fundamental to natural language processing applications as well as to fields such as linguistic anthropology and language preservation. In many languages, however, such resources suffer from quality issues: incorrect entries, incompleteness, but also the rarely addressed issue of bias toward the English language and Anglo-Saxon culture. Such bias manifests itself in the absence of concepts specific to the language or culture at hand, the presence of foreign (Anglo-Saxon) concepts, as well as in the lack of an explicit indication of untranslatability, also known as cross-lingual <i>lexical gaps</i>, when a term has no equivalent in another language. This paper proposes a novel crowdsourcing methodology for reducing bias in LSRs. Crowd workers compare lexemes from two languages, focusing on domains rich in lexical diversity, such as kinship or food. Our LingoGap crowdsourcing platform facilitates comparisons through microtasks identifying equivalent terms, language-specific terms, and lexical gaps across languages. We validated our method by applying it to two case studies focused on food-related terminology: (1) English and Arabic, and (2) Standard Indonesian and Banjarese. These experiments identified 2,140 lexical gaps in the first case study and 951 in the second. The success of these experiments confirmed the usability of our method and tool for future large-scale lexicon enrichment tasks.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"8 ","pages":"1648073"},"PeriodicalIF":4.7,"publicationDate":"2025-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12714898/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145805547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploring trust factors in AI-healthcare integration: a rapid review
Pub Date: 2025-12-04 · DOI: 10.3389/frai.2025.1658510
Megan Mertz, Kelvi Toskovich, Gavin Shields, Ghislaine Attema, Jennifer Dumond, Erin Cameron
This rapid review explores how artificial intelligence (AI) is integrated into healthcare and examines the factors influencing trust between users and AI systems. By systematically identifying trust-related determinants, this review provides actionable insights to support effective AI adoption in clinical settings. A comprehensive search of MEDLINE (Ovid), Embase (Ovid), and CINAHL (Ebsco) using keywords related to AI, healthcare, and trust yielded 872 unique citations, of which 40 studies met the inclusion criteria after screening. Three core themes were identified. AI literacy highlights the importance of user understanding of AI inputs, processes, and outputs in fostering trust among patients and clinicians. AI psychology reflects demographic and experiential influences on trust, such as age, gender, and prior AI exposure. AI utility emphasizes perceived usefulness, system efficiency, and integration within clinical workflows. Additional considerations include anthropomorphism, privacy and security concerns, and trust-repair mechanisms following system errors, particularly in high-risk clinical contexts. Overall, this review advances the understanding of trustworthy AI in healthcare and offers guidance for future implementation strategies and policy development.
{"title":"Exploring trust factors in AI-healthcare integration: a rapid review.","authors":"Megan Mertz, Kelvi Toskovich, Gavin Shields, Ghislaine Attema, Jennifer Dumond, Erin Cameron","doi":"10.3389/frai.2025.1658510","DOIUrl":"10.3389/frai.2025.1658510","url":null,"abstract":"<p><p>This rapid review explores how artificial intelligence (AI) is integrated into healthcare and examines the factors influencing trust between users and AI systems. By systematically identifying trust-related determinants, this review provides actionable insights to support effective AI adoption in clinical settings. A comprehensive search of MEDLINE (Ovid), Embase (Ovid), and CINAHL (Ebsco) using keywords related to AI, healthcare, and trust yielded 872 unique citations, of which 40 studies met the inclusion criteria after screening. Three core themes were identified. AI literacy highlights the importance of user understanding of AI inputs, processes, and outputs in fostering trust among patients and clinicians. AI psychology reflects demographic and experiential influences on trust, such as age, gender, and prior AI exposure. AI utility emphasizes perceived usefulness, system efficiency, and integration within clinical workflows. Additional considerations include anthropomorphism, privacy and security concerns, and trust-repair mechanisms following system errors, particularly in high-risk clinical contexts. Overall, this review advances the understanding of trustworthy AI in healthcare and offers guidance for future implementation strategies and policy development.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"8 ","pages":"1658510"},"PeriodicalIF":4.7,"publicationDate":"2025-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12712919/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145805551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}