
Latest publications in Intelligent Systems with Applications

Interpretable event diagnosis in water distribution networks
IF 4.3 | Pub Date: 2025-12-18 | DOI: 10.1016/j.iswa.2025.200621
André Artelt , Stelios G. Vrachimis , Demetrios G. Eliades , Ulrike Kuhl , Barbara Hammer , Marios M. Polycarpou
The increasing penetration of information and communication technologies in the design, monitoring, and control of water systems enables the use of algorithms for detecting and identifying unanticipated events (such as leakages or water contamination) using sensor measurements. However, data-driven methodologies do not always give accurate results and are often not trusted by operators, who may prefer to use their engineering judgment and experience to deal with such events.
In this work, we propose a framework for interpretable event diagnosis — an approach that assists operators in associating the results of algorithmic event diagnosis methodologies with their own intuition and experience. This is achieved by providing contrasting (i.e., counterfactual) explanations of the results produced by fault diagnosis algorithms; these aim to improve operators' understanding of the algorithm’s inner workings, enabling them to make more informed decisions by combining the results with their personal experience. Specifically, we propose counterfactual event fingerprints, a representation of the difference between the current event diagnosis and the closest alternative explanation, which can be presented graphically. The proposed methodology is applied and evaluated on a realistic use case using the L-Town benchmark.
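The idea of a counterfactual event fingerprint can be illustrated with a minimal sketch: given a diagnosis function, search for the smallest sensor perturbation that flips its output. The `diagnose` rule and the random-search procedure below are illustrative assumptions, not the authors' method, which operates on real fault-diagnosis models:

```python
import numpy as np

def diagnose(x):
    # Toy stand-in for a fault-diagnosis classifier over sensor readings:
    # flags a "leak" when a weighted combination of pressure deviations
    # exceeds a threshold. (Hypothetical rule, for illustration only.)
    return "leak" if x @ np.array([1.0, 2.0, 0.5]) > 3.0 else "no-event"

def counterfactual_fingerprint(x, predict, n_samples=2000, max_radius=3.0, seed=0):
    """Find the closest sampled input whose diagnosis differs from x's.

    The returned delta acts like a counterfactual fingerprint: it shows
    which sensors would have to change, and by how much, for the
    diagnosis to flip to the nearest alternative explanation.
    """
    rng = np.random.default_rng(seed)
    original = predict(x)
    best = None
    for _ in range(n_samples):
        delta = rng.uniform(-max_radius, max_radius, size=x.shape)
        if predict(x + delta) != original and (
            best is None or np.linalg.norm(delta) < np.linalg.norm(best)
        ):
            best = delta
    return best  # None if no counterfactual was found in the sampled ball
```

The per-sensor entries of the returned delta are what a graphical fingerprint would display to the operator.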
Citations: 0
FireBoost: A new bio-inspired approach for feature selection based on firefly algorithm and optimized XGBoost
IF 4.3 | Pub Date: 2025-12-17 | DOI: 10.1016/j.iswa.2025.200613
Nafaa Jabeur
High-dimensional data often reduce model efficiency and interpretability by introducing redundant or irrelevant features. This challenge is especially critical in domains like healthcare and cybersecurity, where both accuracy and explainability are essential. To address this, we introduce FireBoost, a novel hybrid framework that enhances classification performance through effective feature selection and optimized model training. FireBoost integrates the Firefly Algorithm (FFA) for selecting the most informative features with a customized version of XGBoost. The customized learner includes dynamic learning-rate decay, feature-specific binning, and mini-batch gradient updates. Unlike existing hybrid models, FireBoost tightly couples the selection and learning phases, enabling informed, performance-driven feature prioritization. Experiments on the METABRIC and KDD datasets demonstrate that FireBoost consistently reduces feature dimensionality while maintaining or improving classification accuracy and training speed. It outperforms standard ensemble models and shows robustness across different parameter settings. FireBoost thus provides a scalable and interpretable solution for real-world binary classification tasks involving high-dimensional data.
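The firefly-based selection step can be sketched as follows: feature subsets are encoded as binarised firefly positions, and dimmer fireflies move toward brighter (fitter) ones. The `fitness` function below is a hypothetical correlation-based stand-in, not FireBoost's actual objective, which scores subsets with the customized XGBoost learner:

```python
import numpy as np

rng = np.random.default_rng(42)

def fitness(mask, X, y):
    # Hypothetical stand-in for FireBoost's objective: reward features
    # correlated with the target, penalise large feature subsets.
    if mask.sum() == 0:
        return -np.inf
    corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in np.flatnonzero(mask)])
    return corr.mean() - 0.01 * mask.sum()

def firefly_select(X, y, n_flies=10, n_iter=30, beta0=1.0, gamma=1.0, alpha=0.1):
    d = X.shape[1]
    pos = rng.random((n_flies, d))            # continuous positions in [0, 1]
    masks = (pos > 0.5).astype(int)           # binarised feature subsets
    bright = np.array([fitness(m, X, y) for m in masks])
    for _ in range(n_iter):
        for i in range(n_flies):
            for j in range(n_flies):
                if bright[j] > bright[i]:     # move firefly i toward brighter j
                    r2 = np.sum((pos[i] - pos[j]) ** 2)
                    beta = beta0 * np.exp(-gamma * r2)   # attractiveness decays with distance
                    pos[i] += beta * (pos[j] - pos[i]) + alpha * (rng.random(d) - 0.5)
                    pos[i] = np.clip(pos[i], 0, 1)
                    masks[i] = (pos[i] > 0.5).astype(int)
                    bright[i] = fitness(masks[i], X, y)
    return masks[np.argmax(bright)]
```

In the actual framework the selected mask would feed the customized XGBoost learner; here the two phases are only loosely coupled for brevity.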
Citations: 0
UAV exploration for indoor navigation based on deep reinforcement learning and intrinsic curiosity
IF 4.3 | Pub Date: 2025-12-16 | DOI: 10.1016/j.iswa.2025.200618
Huei-Yung Lin , Xi-Sheng Zhang , Syahrul Munir
The operational versatility of Unmanned Aerial Vehicles (UAVs) continues to drive rapid development in the UAV field. However, a critical challenge for diverse applications — such as search and rescue or warehouse inspection — is exploring the environment autonomously. Traditional exploration approaches are often hindered in practical deployments because they require precise navigation path planning and pre-defined obstacle avoidance rules for each testing environment. This paper presents a UAV indoor exploration technique based on deep reinforcement learning (DRL) and intrinsic curiosity. By integrating a reward function based on the extrinsic DRL reward and the intrinsic reward, the UAV is able to autonomously establish exploration strategies and actively encourage the exploration of unknown areas. In addition, NoisyNet is introduced to assess the value of different actions during the early stages of exploration. The proposed method significantly improves exploration coverage while relying solely on visual input. Its effectiveness is validated through experimental comparisons with several state-of-the-art algorithms: it achieves at least 15% more exploration coverage than competing methods at the same flight time, and at least 20% less exploration distance at the same coverage.
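The extrinsic/intrinsic reward integration described above is commonly written as r = r_ext + β·r_int, with the intrinsic term given by a forward model's prediction error (as in the ICM-style curiosity formulation). A minimal sketch, where the forward model and the coefficients η and β are illustrative assumptions rather than the paper's values:

```python
import numpy as np

def intrinsic_reward(forward_model, state, action, next_state, eta=0.5):
    """Curiosity bonus: prediction error of a learned forward model.

    `forward_model(state, action)` is assumed to predict the next-state
    embedding; poorly predicted (novel) transitions get a large bonus,
    which pushes the agent toward unexplored areas.
    """
    pred = forward_model(state, action)
    return eta * 0.5 * np.sum((pred - next_state) ** 2)

def total_reward(r_extrinsic, r_intrinsic, beta=0.2):
    # Combined reward driving the DRL agent: the extrinsic term rewards
    # task progress, the intrinsic term encourages exploring unknown areas.
    return r_extrinsic + beta * r_intrinsic
```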
Citations: 0
Vision transformers in precision agriculture: A comprehensive survey
IF 4.3 | Pub Date: 2025-12-13 | DOI: 10.1016/j.iswa.2025.200617
Saber Mehdipour , Seyed Abolghasem Mirroshandel , Seyed Amirhossein Tabatabaei
Detecting plant diseases is a crucial aspect of modern agriculture, playing a key role in maintaining crop health and ensuring sustainable yields. Traditional approaches, though still valuable, often rely on manual inspection or conventional machine learning (ML) techniques, both of which face limitations in scalability and accuracy. The emergence of Vision Transformers (ViTs) marks a significant shift in this landscape by enabling superior modeling of long-range dependencies and offering improved scalability for complex visual tasks. This survey provides a rigorous and structured analysis of impactful studies that employ ViT-based models, along with a comprehensive categorization of existing research. It also offers a quantitative synthesis of reported performance — with accuracies ranging from 75.00% to 100.00% — highlighting clear trends in model effectiveness and identifying consistently high-performing architectures. In addition, this study examines the inductive biases of CNNs and ViTs, which is the first analysis of these architectural priors within an agricultural context. Further contributions include a comparative taxonomy of prior studies, an evaluation of dataset limitations and metric inconsistencies, and a statistical assessment of model efficiency across diverse crop-image sources. Collectively, these efforts clarify the current state of the field, identify critical research gaps, and outline key challenges — such as data diversity, interpretability, computational cost, and field adaptability — that must be addressed to advance the practical deployment of ViT technologies in precision agriculture.
Citations: 0
Enhancing token boundary detection in disfluent speech
IF 4.3 | Pub Date: 2025-12-06 | DOI: 10.1016/j.iswa.2025.200614
Manu Srivastava , Marcello Ferro , Vito Pirrelli , Gianpaolo Coro
This paper presents an open-source Automatic Speech Recognition (ASR) pipeline optimised for disfluent Italian read speech, designed to enhance both transcription accuracy and token boundary precision in low-resource settings. The study aims to address the difficulty that conventional ASR systems face in capturing the temporal irregularities of disfluent reading, which are crucial for psycholinguistic and clinical analyses of fluency. Building upon the WhisperX framework, the proposed system replaces the neural Voice Activity Detection module with an energy-based segmentation algorithm designed to preserve prosodic cues such as pauses and hesitations. A dual-alignment strategy integrates two complementary phoneme-level ASR models to correct onset–offset asymmetries, while a bias-compensation post-processing step mitigates systematic timing errors. Evaluation on the READLET (child read speech) and CLIPS (adult read speech) corpora shows consistent improvements over baseline systems, confirming enhanced robustness in boundary detection and transcription under disfluent conditions. The results demonstrate that the proposed architecture provides a general, language-independent framework for accurate alignment and disfluency-aware ASR. The approach can support downstream analyses of reading fluency and speech planning, contributing to both computational linguistics and clinical speech research.
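An energy-based segmentation of the kind described (replacing a neural VAD so that pauses and hesitations survive as boundaries) can be sketched as follows. Frame sizes and the threshold are illustrative defaults, not the paper's settings:

```python
import numpy as np

def energy_segments(signal, sr, frame_ms=25, hop_ms=10, threshold_db=-35):
    """Short-time-energy speech/pause segmentation (simplified sketch).

    Returns (start_s, end_s) pairs for runs of frames whose energy
    exceeds `threshold_db` relative to the loudest frame; everything
    below the threshold is treated as a pause.
    """
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n = 1 + max(0, (len(signal) - frame) // hop)
    energy = np.array([np.mean(signal[i * hop : i * hop + frame] ** 2) for i in range(n)])
    db = 10 * np.log10(energy / (energy.max() + 1e-12) + 1e-12)
    voiced = db > threshold_db
    segments, start = [], None
    for i, v in enumerate(voiced):
        if v and start is None:
            start = i                      # segment opens on first voiced frame
        elif not v and start is not None:  # segment closes on first pause frame
            segments.append((start * hop / sr, (i * hop + frame) / sr))
            start = None
    if start is not None:                  # close a segment running to the end
        segments.append((start * hop / sr, (n * hop + frame) / sr))
    return segments
```

Keeping the segmentation purely energy-based means hesitation pauses are preserved as explicit gaps between segments, which is what the token-boundary analysis relies on.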
Citations: 0
A systematic review of vision transformer and explainable AI advances in multimodal facial expression recognition
IF 4.3 | Pub Date: 2025-12-06 | DOI: 10.1016/j.iswa.2025.200615
Ilya Kus , Cemal Kocak , Ayse Keles
Facial expression is one of the most important indicators used to convey human emotions. Facial expression recognition is the process of automatically detecting and classifying these expressions by computer systems. Multimodal facial expression recognition aims to perform a more accurate and comprehensive emotion analysis by combining facial expressions with different modalities such as image, speech, Electroencephalogram (EEG), or text. This study systematically reviews research conducted between 2021 and 2025 on the Vision Transformer (ViT) based approaches and Explainable Artificial Intelligence (XAI) techniques in multimodal facial expression recognition, as well as the datasets employed in these studies. The findings indicate that ViT-based models outperform conventional Convolutional Neural Networks (CNNs) by effectively capturing long-range dependencies between spatially distant facial regions, thereby enhancing emotion classification accuracy. However, significant challenges remain, including data privacy risks arising from the collection of multimodal biometric information, data imbalance and inter-modality incompatibility, high computational costs hindering real-time applications, and limited progress in model explainability. Overall, this study highlights that integrating advanced ViT architectures with robust XAI and privacy-preserving techniques can enhance the reliability, transparency, and ethical deployment of multimodal facial expression recognition systems.
Citations: 0
AL-ViT: Label-efficient Robusta coffee-bean defect detection in Thailand using active learning vision transformers
IF 4.3 | Pub Date: 2025-11-28 | DOI: 10.1016/j.iswa.2025.200612
Sirawich Vachmanus , Wimolsiri Pridasawas , Worapan Kusakunniran , Kitti Thamrongaphichartkul , Noppanan Phinklao
In major trading and export markets, the coffee bean grading process still relies heavily on manual labor to sort individual beans from large harvest volumes. This labor-intensive task is time-consuming, costly, and prone to human error, especially within Thailand’s rapidly expanding Robusta coffee sector. This study introduces AL–ViT, an end-to-end Active-Learning Vision Transformer framework that operationalizes active learning and transformer-based feature extraction within a single, production-oriented pipeline. The framework integrates a ViT-Base/16 backbone with seven active learning (AL) query strategies: random sampling, entropy-based selection, Bayesian Active Learning by Disagreement (BALD), Batch Active Learning by Diverse Gradient Embeddings (BADGE), Core-Set diversity sampling, ensemble disagreement, and a novel hybrid uncertainty–diversity strategy designed to balance informativeness and representativeness during sample acquisition. A high-resolution dataset of 2098 Robusta coffee bean images was collected under controlled-lighting conditions aligned with grading-machine setups, with only 5% initially labeled and the remainder forming the AL pool. Across five random seeds, the hybrid strategy without MixUp augmentation achieved 97.1% accuracy and an F1 of 0.956 on the defective ("bad") class using just 850 labels (41% of the dataset), within 0.3 percentage points of full supervision. Operational reliability, defined as 95% accuracy, consistent with prior inspection benchmarks, was reached with only 407 labels, reflecting a 75% reduction in annotation. Entropy sampling showed the fastest early-stage gains, whereas BADGE lagged by >1 pp; Core-Set and Ensemble provided moderate but stable results. Augmentation and calibration analyses indicated that explicit methods (MixUp, CutMix, RandAugment) offered no further benefit, with the hybrid pipeline already achieving well-calibrated probabilities.
Statistical validation via paired t-tests, effect sizes, and bootstrap CIs confirmed consistent improvements of uncertainty-driven strategies over random sampling. Overall, the proposed AL–ViT framework establishes a label-efficient and practically deployable approach for agricultural quality control, achieving near-supervised accuracy at a fraction of the labeling cost.
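Of the query strategies listed, entropy-based selection is the simplest to sketch: score each unlabeled sample by predictive entropy and send the top-k to annotators. The function below is a generic illustration, not the paper's implementation; the hybrid strategy would additionally mix in a diversity term:

```python
import numpy as np

def entropy_query(probs, k):
    """Select the k most uncertain unlabeled samples by predictive entropy.

    `probs` is an (n_samples, n_classes) array of softmax outputs from the
    current model; the highest-entropy rows are the ones the model is
    least sure about, and are queued for labeling.
    """
    eps = 1e-12                                  # avoid log(0)
    H = -np.sum(probs * np.log(probs + eps), axis=1)
    return np.argsort(H)[::-1][:k]               # indices, most uncertain first
```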
Citations: 0
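Among the query strategies listed for AL-ViT, entropy-based selection is the simplest to state: label the pool samples whose predicted class distribution has the highest Shannon entropy. A minimal sketch of that acquisition step (not the authors' code; the pool values and batch size here are invented):

```python
import numpy as np

def entropy_select(probs: np.ndarray, k: int) -> np.ndarray:
    """Pick the k unlabeled samples with the highest predictive entropy.

    probs: (n_samples, n_classes) softmax outputs over the unlabeled pool.
    Returns indices of the k most uncertain samples to send for labeling.
    """
    eps = 1e-12  # avoid log(0) for confident predictions
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    return np.argsort(entropy)[-k:][::-1]  # most uncertain first

# Toy pool: one confident prediction, one near-uniform (uncertain) one.
pool = np.array([[0.98, 0.01, 0.01],
                 [0.34, 0.33, 0.33]])
print(entropy_select(pool, k=1))  # the near-uniform row is chosen
```

In a full loop, the selected indices would be labeled, moved into the training set, and the model retrained before the next acquisition round.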
Multi-modal document classification in AEC asset management
IF 4.3 Pub Date : 2025-11-19 DOI: 10.1016/j.iswa.2025.200609
Floor Rademaker , Faizan Ahmed , Marcos R. Machado
The digitalization of asset management within the architecture, engineering, and construction (AEC) sector needs effective methods for the automatic classification of documents. This study focuses on the development and evaluation of multimodal document classification models that utilize visual, textual, and layout-related document information. We examine various state-of-the-art machine learning models and combine them through an iterative development process. The performance of these models is evaluated on two different AEC-document datasets. The results demonstrate that each modality, as well as the integration of the different information types, is useful in classifying the documents. This study contributes by applying AI techniques, specifically document classification, in the AEC sector; by taking an initial step toward automating information extraction and processing for Intelligent Asset Management; and by combining and comparing multimodal state-of-the-art classification models on real-life datasets.
Citations: 0
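The abstract does not detail how the visual, textual, and layout modalities are combined; one common baseline is late fusion, concatenating per-modality embeddings into a single representation for a classifier head. A sketch under that assumption (all feature names and dimensions are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-extracted features for one document:
visual  = rng.normal(size=128)   # e.g. an image-encoder embedding
textual = rng.normal(size=256)   # e.g. a text-encoder embedding
layout  = rng.normal(size=64)    # e.g. bounding-box / layout features

def late_fusion(*features: np.ndarray) -> np.ndarray:
    """Concatenate per-modality embeddings into one joint vector."""
    return np.concatenate(features)

fused = late_fusion(visual, textual, layout)
print(fused.shape)  # (448,) -- would be fed to a downstream classifier
```

Late fusion keeps each encoder independent, which makes it easy to ablate a modality; intermediate-fusion designs instead let modalities attend to each other during encoding.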
AIOps for log anomaly detection in the era of LLMs: A systematic literature review
IF 4.3 Pub Date : 2025-11-19 DOI: 10.1016/j.iswa.2025.200608
Miguel De la Cruz Cabello , Tiago Prince Sales , Marcos R. Machado
Modern IT systems generate large volumes of log data that challenge timely and effective anomaly detection. Traditional methods often require intensive feature engineering and struggle to adapt to dynamic operational environments. This Systematic Literature Review (SLR) analyzes how Artificial Intelligence for IT Operations (AIOps) benefits from advanced language models, emphasizing Large Language Models (LLMs) for more effective log anomaly detection. By comparing state-of-the-art frameworks with LLM-driven methods, this study reveals that prompt engineering (the practice of designing and refining inputs to AI models to produce accurate and useful outputs) and Retrieval Augmented Generation (RAG) boost accuracy and interpretability without extensive fine-tuning. Experimental findings demonstrate that LLM-based approaches significantly outperform traditional methods across evaluation metrics including F1-score, precision, and recall. Furthermore, the integration of LLMs with RAG techniques has shown strong adaptability to changing environments. The applicability of these methods also extends to the military industry. Consequently, the development of specialized LLM systems with RAG tailored for the military industry represents a promising research direction to improve the operational effectiveness and responsiveness of defense systems.
Citations: 0
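The retrieval half of a RAG pipeline for log triage can be sketched compactly: embed the incoming log line, fetch the most similar historical incidents, and prepend them to the LLM prompt. A toy illustration (the hashing embedder is a stand-in for a real embedding model, and every log line is invented):

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy hashing embedder standing in for a real embedding model."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus entries most similar to the query log line."""
    q = embed(query)
    scores = [float(q @ embed(doc)) for doc in corpus]
    top = np.argsort(scores)[-k:][::-1]
    return [corpus[i] for i in top]

history = [
    "disk usage at 45 percent on node-3",
    "ERROR out of memory killing process 4242",
    "user login succeeded from 10.0.0.7",
]
context = retrieve("ERROR out of memory on worker", history, k=1)
prompt = ("Known incidents:\n" + "\n".join(context) +
          "\nNew log line: ERROR out of memory on worker\nIs this anomalous?")
```

The assembled `prompt` would then be sent to the LLM; grounding the query in retrieved incidents is what lets the model adapt to a changing environment without fine-tuning.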
Musipainter: A music-conditioned generative architecture for artistic image synthesis
IF 4.3 Pub Date : 2025-11-19 DOI: 10.1016/j.iswa.2025.200611
Alfredo Baione , Giuseppe Rizzo , Luca Barco , Angelica Urbanelli , Luigi Di Biasi , Genoveffa Tortora
Generative art is a challenging area of research in deep generative modeling. Exploring AI’s role in human–machine co-creative processes requires understanding machine learning’s potential in the arts. Building on this premise, this paper presents Musipainter, a cross-modal generative framework adapted to create artistic images that are historically and stylistically aligned with 30-second musical inputs, with a focus on creative and semantic coherence. To support this goal, we introduce Museart, a dataset designed explicitly for this research, and GIILS, a creativity-oriented metric that enables us to assess both artistic-semantic consistency and diversity in the generated outputs. The results indicate that Musipainter, supported by the Museart dataset and the exploratory GIILS metric, can offer a foundation for further research on AI’s role in artistic generation, while also highlighting the need for systematic validation and future refinements.
Citations: 0