
Frontiers in Artificial Intelligence: Latest Articles

Modelling societal preferences for automated vehicle behaviour with ethical goal functions.
IF 4.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-10 eCollection Date: 2025-01-01 DOI: 10.3389/frai.2025.1676225
Chloe Gros, Leon Kester, Marieke Martens, Peter Werkhoven

Introduction: As automated vehicles (AVs) assume increasing decision-making responsibilities, ensuring their alignment with societal values becomes essential. Existing ethical frameworks for AVs have primarily remained conceptual, lacking empirical operationalization. To address this gap, this study develops an Ethical Goal Function (EGF), a quantitative model that encodes societal moral preferences for AV decision-making, within the theoretical framework of Augmented Utilitarianism (AU). AU integrates consequentialist, deontological, and virtue-ethical principles while remaining adaptable to evolving societal values. This work also proposes embedding the EGF into a Socio-Technological Feedback (SOTEF) Loop, enabling continuous refinement of AV decision systems through stakeholder input.

Methods: The EGF was constructed using discrete choice experiments (DCEs) conducted with Dutch university students (N = 89). Participants evaluated AV-relevant moral scenarios characterized by six ethically salient attributes: physical harm, psychological harm, moral responsibility, fair innings, legality, and environmental harm. These attributes were derived from biomedical ethics and moral psychology and validated in prior AV ethics research. Using participants' choices, a multinomial logit (MNL) model was estimated to derive attribute weights representing aggregate societal moral preferences. Model performance was evaluated using 5-fold cross-validation.

Results: The MNL model produced stable attribute weights across folds, achieving an average predictive accuracy of 63.8% (SD = 3.3%). These results demonstrate that the selected attributes and underlying AU-based framework can meaningfully predict participants' ethical preferences in AV decision scenarios. The EGF thus represents a data-driven, empirically grounded method for translating societal moral judgments into computationally usable parameters for AV decision-making systems.

Discussion: This study contributes the first empirical operationalization of ethical frameworks for AVs through the development of an Ethical Goal Function and demonstrates how it can be embedded in a Socio-Technological Feedback (SOTEF) Loop for continuous societal alignment. The dual contribution advances both the theoretical grounding and practical implementation of human-centered ethics in automated decision-making. However, several limitations remain. The reliance on a Dutch university sample restricts cultural generalizability, and textual presentation may limit ecological validity. Future work should expand the cultural diversity of participants and compare alternative presentation modalities (e.g., visual, immersive) to better capture real-world decision contexts.
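
A minimal sketch of the multinomial logit step described in the Methods above, assuming each scenario is a choice set whose alternatives are scored on the six attributes. The function names, the gradient-ascent fit, the learning rate, the two-alternative choice sets, and the synthetic data are all illustrative assumptions; the abstract does not describe the authors' estimation code or attribute coding.

```python
import numpy as np

def choice_probabilities(weights, alternatives):
    """MNL choice probabilities for one choice set.

    alternatives: (n_alts, n_attrs) attribute levels; weights: (n_attrs,).
    """
    utilities = alternatives @ weights
    exp_u = np.exp(utilities - utilities.max())  # shift for numerical stability
    return exp_u / exp_u.sum()

def fit_mnl(choice_sets, chosen, lr=0.5, steps=300):
    """Estimate attribute weights by gradient ascent on the mean log-likelihood.

    choice_sets: (n_sets, n_alts, n_attrs); chosen: (n_sets,) chosen indices.
    """
    n_sets, _, n_attrs = choice_sets.shape
    weights = np.zeros(n_attrs)
    chosen_x = choice_sets[np.arange(n_sets), chosen]  # attributes of chosen options
    for _ in range(steps):
        u = choice_sets @ weights
        u = u - u.max(axis=1, keepdims=True)
        p = np.exp(u)
        p = p / p.sum(axis=1, keepdims=True)
        # Gradient: chosen attributes minus probability-weighted attributes.
        expected_x = np.einsum("sa,saf->sf", p, choice_sets)
        weights = weights + lr * (chosen_x - expected_x).mean(axis=0)
    return weights

def predict(weights, choice_sets):
    """Index of the highest-utility alternative in each choice set."""
    return (choice_sets @ weights).argmax(axis=1)

def cross_validate(choice_sets, chosen, k=5, seed=0):
    """k-fold cross-validated accuracy; returns one accuracy per fold."""
    order = np.random.default_rng(seed).permutation(len(chosen))
    accuracies = []
    for test_idx in np.array_split(order, k):
        train_idx = np.setdiff1d(order, test_idx)
        w = fit_mnl(choice_sets[train_idx], chosen[train_idx])
        hits = predict(w, choice_sets[test_idx]) == chosen[test_idx]
        accuracies.append(float(hits.mean()))
    return accuracies

# Synthetic demo data (NOT from the study): 200 two-option choice sets,
# six attributes, choices generated from hypothetical "true" weights.
rng = np.random.default_rng(0)
true_weights = np.array([-3.0, -1.5, -1.0, -0.5, -2.0, -0.5])
demo_sets = rng.uniform(0.0, 1.0, size=(200, 2, 6))
demo_chosen = (demo_sets @ true_weights).argmax(axis=1)
fold_acc = cross_validate(demo_sets, demo_chosen)
print(f"mean accuracy {np.mean(fold_acc):.3f}, SD {np.std(fold_acc):.3f}")
```

The demo choices are deterministic, so its accuracy says nothing about the 63.8% reported for real participants; it only illustrates how attribute weights and a fold-wise mean and SD can be obtained.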

Citations: 0
Exploring the use and perceived impact of artificial intelligence in medical internship: a cross-sectional study of Palestinian doctors.
IF 4.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-10 eCollection Date: 2025-01-01 DOI: 10.3389/frai.2025.1738782
Abdallah Qawasmeh, Salahaldeen Deeb, Alhareth M Amro, Khaled Alhashlamon, Ibrahim Althaher, Nour Yaser Mohammad Shadeed, Khadija Mohammad, Farid K Abu Shama

Background: Artificial intelligence (AI) is increasingly used in medical education to support academic learning, clinical competence, and efficiency. However, the extent and impact of AI usage among medical interns, particularly in Palestine, remain underexplored.

Objective: This study aimed to assess the prevalence of AI usage among internship doctors in Palestine and evaluate its perceived impact on their academic performance, clinical competence, time management, and research skills.

Methods: A cross-sectional survey was conducted with 307 internship doctors in Palestine. The survey collected data on the frequency and types of AI tools used, including ChatGPT, and interns' perceptions of AI's impact on their training. Demographic information, such as age, gender, and university affiliation, was also gathered to explore potential associations with AI usage patterns.

Results: The study found that 76.9% of interns used AI regularly, with ChatGPT being the most popular tool (76.2%). Despite frequent use, only 3.3% reported formal AI training. The majority of interns perceived AI as beneficial in improving academic performance (61%), clinical competence (67%), and time management (74%). Notably, time management showed the highest perceived improvement. However, 75.9% expressed concerns about becoming overly reliant on AI, fearing it could diminish critical thinking and clinical judgment. Age and university affiliation were associated with differences in AI usage patterns and perceived benefits, with older interns and those from international universities reporting greater perceived improvements.

Conclusion: This cross-sectional study highlights the widespread use of AI among internship doctors in Palestine and generally positive perceptions of its educational value, particularly for academic performance and clinical competence. However, it also reveals a substantial gap in formal AI training, suggesting a need for structured, ethically grounded AI education in medical curricula. Because the study is exploratory and cross-sectional, these findings should be interpreted as perceived associations rather than evidence that AI use or training causes improved outcomes; future longitudinal and interventional studies are needed to clarify long-term effects.

Citations: 0
Machine learning techniques for improved prediction of cardiovascular diseases using integrated healthcare data.
IF 4.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-09 eCollection Date: 2025-01-01 DOI: 10.3389/frai.2025.1694450
Abdulgani Kahraman

Cardiovascular disease continues to pose a major global health challenge, which makes early detection critical for mitigating cardiac-related issues. There is a significant demand for reliable diagnostic alternatives. Applying diverse machine learning algorithms to health data may offer a more precise diagnostic approach. Machine learning-based decision support systems that utilize patients' clinical parameters present a promising solution for diagnosing cardiovascular disease. In this research, we collected extensive publicly available healthcare records. We integrated medical datasets based on common features and implemented several machine learning models to explore the potential for more robust predictions of cardiovascular disease (CVD). The merged dataset initially contained 323,680 samples sourced from multiple databases. After preprocessing (cleaning, alignment of features, and removal of missing values), the final dataset consisted of 311,710 samples used for model training and evaluation. In our experiments, the CatBoost model achieved the highest area under the curve (AUC), up to 94.1%.
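
The integration step (merging datasets on common features, then removing missing values) can be sketched with pandas as follows. This is an illustration under assumptions: the function name, the column names, and the toy rows are hypothetical, and the abstract does not say which tools or features the author used.

```python
import pandas as pd

def integrate_on_common_features(frames):
    """Concatenate record tables on their shared columns and drop incomplete rows."""
    # join="inner" keeps only the columns present in every input table.
    merged = pd.concat(frames, join="inner", ignore_index=True)
    return merged.dropna().reset_index(drop=True)

# Hypothetical toy sources (not the datasets used in the paper).
source_a = pd.DataFrame({
    "age": [54, 61],
    "cholesterol": [210.0, None],  # one missing value to be removed
    "cvd": [0, 1],
    "smoker": [1, 0],              # feature absent from source_b
})
source_b = pd.DataFrame({"age": [47], "cholesterol": [185.0], "cvd": [0]})

combined = integrate_on_common_features([source_a, source_b])
print(combined)
```

Restricting to shared columns trades feature richness for sample size, which matches the abstract's emphasis on a large merged dataset.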

Citations: 0
AI for evidence-based treatment recommendation in oncology: a blinded evaluation of large language models and agentic workflows.
IF 4.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-09 eCollection Date: 2025-01-01 DOI: 10.3389/frai.2025.1683322
Guannan Zhai, Merav Bar, Andrew J Cowan, Samuel Rubinstein, Qian Shi, Ningjie Zhang, En Xie, Will Ma

Background: Evidence-based medicine is crucial for clinical decision-making, yet studies suggest that a significant proportion of treatment decisions do not fully incorporate the latest evidence. Large Language Models (LLMs) show promise in bridging this gap, but their reliability for medical recommendations remains uncertain.

Methods: We conducted an evaluation study comparing five LLMs' recommendations across 50 clinical scenarios related to multiple myeloma diagnosis, staging, treatment, and management, using a unified evidence cutoff of June 2024. The evaluation included three general-purpose LLMs (OpenAI o1-preview, Claude 3.5 Sonnet, Gemini 1.5 Pro), one retrieval-augmented generation (RAG) system (Myelo), and one agentic workflow-based system (HopeAI). General-purpose LLMs generated responses based solely on their internal knowledge, while the RAG system enhanced these capabilities by incorporating external knowledge retrieval. The agentic workflow system extended the RAG approach by implementing multi-step reasoning and coordinating with multiple tools and external systems for complex task execution. Three independent hematologist-oncologists evaluated the LLM-generated responses using standardized scoring criteria developed specifically for this study. Performance assessment encompassed five dimensions: accuracy, relevance, comprehensiveness, hallucination rate, and clinical use readiness.

Results: HopeAI outperformed the other systems on accuracy (82.0%), relevance (85.3%), and comprehensiveness (74.0%). The corresponding scores were 64.7%, 57.3%, and 36.0% for OpenAI o1-preview; 50.0%, 51.3%, and 29.3% for Claude 3.5 Sonnet; 48.0%, 46.0%, and 30.0% for Gemini 1.5 Pro; and 58.7%, 56%, and 32.7% for Myelo. Hallucination rates were low across all systems: HopeAI (5.3%), OpenAI o1-preview (3.3%), Claude 3.5 Sonnet (10.0%), Gemini 1.5 Pro (8.0%), and Myelo (5.3%). Clinical use readiness scores were relatively low for all systems: HopeAI (25.3%), OpenAI o1-preview (6.0%), Claude 3.5 Sonnet (2.7%), Gemini 1.5 Pro (4.0%), and Myelo (4.0%).
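
As a reading aid, the reported percentages can be checked against a pooled denominator. The sketch below assumes each percentage is a share of 150 ratings (3 evaluators x 50 scenarios); that denominator is an inference from the Methods, not something the abstract states, and the function name is hypothetical.

```python
def matching_count(percent, denominator=150):
    """Return the integer count k with k/denominator rounding to `percent`, else None."""
    k = round(percent * denominator / 100)
    recovered = round(100 * k / denominator, 1)
    return k if abs(recovered - percent) < 1e-9 else None

# Accuracy, relevance, comprehensiveness as reported in the abstract.
reported = {
    "HopeAI": [82.0, 85.3, 74.0],
    "OpenAI o1-preview": [64.7, 57.3, 36.0],
    "Claude 3.5 Sonnet": [50.0, 51.3, 29.3],
    "Gemini 1.5 Pro": [48.0, 46.0, 30.0],
    "Myelo": [58.7, 56.0, 32.7],
}

counts = {name: [matching_count(p) for p in values]
          for name, values in reported.items()}
for name, ks in counts.items():
    print(name, ks)
```

Every reported value maps to a whole number of ratings under this assumption, which is consistent with, but does not prove, pooled scoring across the three evaluators.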

Conclusion: This study demonstrates that while current LLMs show promise in medical decision support, their recommendations require careful clinical supervision to ensure patient safety and optimal care. Further research is needed to improve their clinical use readiness before integration into oncology workflows. These findings provide valuable insights into the capabilities and limitations of LLMs in oncology, guiding future research and development efforts toward integrating AI into clinical workflows.

Citations: 0
Multimodal observable cues in mood, anxiety, and borderline personality disorders: a review of reviews to inform explainable AI in mental health.
IF 4.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-09 eCollection Date: 2025-01-01 DOI: 10.3389/frai.2025.1696448
Grega Močnik, Ana Rehberger, Žan Smogavc, Izidor Mlakar, Urška Smrke, Sara Močnik

Mental health disorders, such as depression, anxiety, and borderline personality disorder (BPD), are common, often begin early, and can cause profound impairment. Traditional assessments rely heavily on subjective reports and clinical observation, which can be inconsistent and biased. Recent advances in AI offer a promising complement by analyzing objective, observable cues from speech, language, facial expressions, physiological signals, and digital behavior. Explainable AI ensures these patterns remain interpretable and clinically meaningful. A synthesis of 24 recent systematic and scoping reviews shows that depression is linked to self-focused negative language, slowed and monotonous speech, reduced facial expressivity, disrupted sleep and activity, and altered phone or online behavior. Anxiety disorders present with negative language bias, monotone speech with pauses, physiological hyperarousal, and avoidance-related behaviors. BPD exhibits more complex patterns, including impersonal or externally focused language, speech dysregulation, paradoxical facial expressions, autonomic dysregulation, and socially ambivalent behaviors. Some cues, like reduced heart rate variability and flattened speech, appear across conditions, suggesting shared transdiagnostic mechanisms, while BPD's interpersonal and emotional ambivalence stands out. These findings highlight the potential of observable, digitally measurable cues to complement traditional assessments, enabling earlier detection, ongoing monitoring, and more personalized interventions in psychiatry.

Citations: 0
Adaptation of convolutional neural networks for real-time abdominal ultrasound interpretation.
IF 4.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-09 eCollection Date: 2025-01-01 DOI: 10.3389/frai.2025.1718503
Austin J Ruiz, Sofía I Hernández Torres, Eric J Snider

Point-of-care ultrasound (POCUS) is commonly used for diagnostic triage of internal injuries in both civilian and military trauma. In resource-constrained environments, such as mass-casualty situations on the battlefield, POCUS allows medical providers to rapidly and noninvasively assess for free fluid or hemorrhage induced by trauma. A major disadvantage of POCUS diagnostics is the skill threshold needed to acquire and interpret ultrasound scans. AI has been shown to be an effective tool for helping the caregiver interpret medical imaging. Here, we focus on sophisticated AI training methodologies to improve the blind, real-time diagnostic accuracy of AI models for detection of hemorrhage in two major abdominal scan sites. In this work, we used a retrospective dataset of over 60,000 swine ultrasound images to train binary classification models, exploring frame-pooling methods that use the backbone of a pre-existing model architecture to handle multi-channel inputs for detecting free fluid in the pelvic and right-upper-quadrant regions. Earlier classification models had achieved accuracies of 0.59 and 0.70, respectively, in blind predictions. After implementing this novel training technique, accuracy improved to over 0.90 for both scan sites. These promising results demonstrate a significant diagnostic improvement, which encourages further optimization to achieve similar results using clinical data. Furthermore, these results show how AI-informed diagnostics can offload cognitive burden in situations where casualties may benefit from rapid triage decision making.
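
One plausible reading of frame pooling for multi-channel input is to stack consecutive grayscale frames as the channels of a single sample. The sketch below shows only that idea; the function name, window size, and stride are hypothetical, and the abstract does not specify the authors' pooling scheme or backbone.

```python
import numpy as np

def pool_frames(clip, k=3, stride=1):
    """Stack each window of k consecutive frames as channels.

    clip: (T, H, W) grayscale frames -> (n_windows, k, H, W) multi-channel samples.
    """
    n_frames = clip.shape[0]
    if n_frames < k:
        raise ValueError("clip has fewer frames than the pooling window")
    starts = range(0, n_frames - k + 1, stride)
    return np.stack([clip[s:s + k] for s in starts])

# Toy clip: 5 frames of 2x2 pixels (placeholder values, not ultrasound data).
clip = np.arange(5 * 2 * 2, dtype=np.float32).reshape(5, 2, 2)
samples = pool_frames(clip, k=3)
print(samples.shape)
```

A three-frame window matches the three input channels that many pretrained image backbones expect, which is one reason such stacking is convenient; whether the authors used three frames is not stated.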

Citations: 0
Gender-based Alzheimer's detection using ResNet-50 and binary dragonfly algorithm on neuroimaging.
IF 4.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-12-08 eCollection Date: 2025-01-01 DOI: 10.3389/frai.2025.1717913
Muhammad Ikram Ul Haq, Waqas Haider Bangyal, Arfan Jaffar, Asma Abdullah Alfayez, Adnan Ashraf, Meshari Alazmi, Mubbashar Hussain

Alzheimer's disease (AD) is an incurable, progressive neurodegenerative disorder. It is characterized by a gradual decline in memory, cognition, and behavior, which ultimately results in severe dementia and functional dependence. AD begins to develop in the brain at an early stage, while its symptoms appear gradually over time. Because of this silent progression, early diagnosis and classification of Alzheimer's is a critical research focus. The current literature reveals that the risk of AD varies by gender, age, race, and ethnicity, yet it also highlights a gap in gender-based studies. The nature of the association between AD and these factors requires further exploration to better understand their impact on disease risk and progression. Effectively employing multiple algorithms is essential for accurate diagnosis of Alzheimer's development. This study proposes the GRDN model, which explores a critical aspect of gender-based Alzheimer's detection. To detect subtle changes in the brain, functional magnetic resonance imaging (fMRI) scans were acquired from the ADNI dataset. A generative adversarial network (GAN) was applied to balance the class distribution and enhance classifier performance on underrepresented groups. The balanced dataset was provided to the ResNet-50 architecture for feature extraction, resulting in feature sets of size 100, 250, and 450. These feature sets were then fed to a swarm intelligence-based approach, the binary dragonfly algorithm (BDA), for feature selection, which identified the most informative features. After feature engineering, the resulting feature-selection matrices were provided to five machine learning (ML) classification algorithms. The results show that classification accuracy improves as the size of the feature set increases. The simulation results demonstrated that the fineKNN achieved strong performance, with an accuracy of 94.8% on the male group with a feature set of 450, and consistently outperformed the other models across all study groups.

Citations: 0
LC-YOLOmatch: a novel scene segmentation approach based on YOLO for laparoscopic cholecystectomy.
IF 4.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-12-08 eCollection Date: 2025-01-01 DOI: 10.3389/frai.2025.1706021
Hong Long, Yuancheng Shao, Mini Han Wang, Fengshi Jing, Yuqiao Chen, Shuai Xiao, Jia Gu

Introduction: Laparoscopy is a visual biosensor that can obtain real-time images of the body cavity, assisting in minimally invasive surgery. Laparoscopic cholecystectomy is one of the most frequently performed endoscopic surgeries and the most fundamental modular surgery. However, many iatrogenic complications still occur each year, mainly due to the anatomical recognition errors of surgeons. Therefore, the development of artificial intelligence (AI)-assisted recognition is of great significance.

Methods: This study proposes a method based on the lightweight YOLOv11n model. By introducing the efficient multi-scale feature extraction module, DWR, the real-time performance of the model is enhanced. Additionally, the bidirectional feature pyramid network (BiFPN) is incorporated to strengthen the capability of multi-scale feature fusion. Finally, we developed the LC-YOLOmatch semi-supervised learning framework, which effectively addresses the issue of scarce labeled data in the medical field.

Results: Experimental results on the publicly available Cholec80 dataset show that this method achieves 70% mAP50 and 40.8% mAP50-95, reaching a new technical level and reducing the reliance on manual annotations.

Discussion: These improvements not only highlight its potential in automated surgeries but also significantly enhance assistance in laparoscopic procedures while effectively reducing the incidence of complications.

Citations: 0
Crowdsourcing lexical diversity.
IF 4.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-12-05 eCollection Date: 2025-01-01 DOI: 10.3389/frai.2025.1648073
Hadi Khalilia, Jahna Otterbacher, Gábor Bella, Shandy Darma, Fausto Giunchiglia

Lexical-semantic resources (LSRs), such as online lexicons and wordnets, are fundamental to natural language processing applications as well as to fields such as linguistic anthropology and language preservation. In many languages, however, such resources suffer from quality issues: incorrect entries, incompleteness, but also the rarely addressed issue of bias toward the English language and Anglo-Saxon culture. Such bias manifests itself in the absence of concepts specific to the language or culture at hand, the presence of foreign (Anglo-Saxon) concepts, as well as in the lack of an explicit indication of untranslatability, also known as cross-lingual lexical gaps, when a term has no equivalent in another language. This paper proposes a novel crowdsourcing methodology for reducing bias in LSRs. Crowd workers compare lexemes from two languages, focusing on domains rich in lexical diversity, such as kinship or food. Our LingoGap crowdsourcing platform facilitates comparisons through microtasks identifying equivalent terms, language-specific terms, and lexical gaps across languages. We validated our method by applying it to two case studies focused on food-related terminology: (1) English and Arabic, and (2) Standard Indonesian and Banjarese. These experiments identified 2,140 lexical gaps in the first case study and 951 in the second. The success of these experiments confirmed the usability of our method and tool for future large-scale lexicon enrichment tasks.

Citations: 0
Exploring trust factors in AI-healthcare integration: a rapid review.
IF 4.7 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2025-12-04 eCollection Date: 2025-01-01 DOI: 10.3389/frai.2025.1658510
Megan Mertz, Kelvi Toskovich, Gavin Shields, Ghislaine Attema, Jennifer Dumond, Erin Cameron

This rapid review explores how artificial intelligence (AI) is integrated into healthcare and examines the factors influencing trust between users and AI systems. By systematically identifying trust-related determinants, this review provides actionable insights to support effective AI adoption in clinical settings. A comprehensive search of MEDLINE (Ovid), Embase (Ovid), and CINAHL (Ebsco) using keywords related to AI, healthcare, and trust yielded 872 unique citations, of which 40 studies met the inclusion criteria after screening. Three core themes were identified. AI literacy highlights the importance of user understanding of AI inputs, processes, and outputs in fostering trust among patients and clinicians. AI psychology reflects demographic and experiential influences on trust, such as age, gender, and prior AI exposure. AI utility emphasizes perceived usefulness, system efficiency, and integration within clinical workflows. Additional considerations include anthropomorphism, privacy and security concerns, and trust-repair mechanisms following system errors, particularly in high-risk clinical contexts. Overall, this review advances the understanding of trustworthy AI in healthcare and offers guidance for future implementation strategies and policy development.

Citations: 0