Interpretable event diagnosis in water distribution networks
Pub Date : 2025-12-18 | DOI: 10.1016/j.iswa.2025.200621
André Artelt , Stelios G. Vrachimis , Demetrios G. Eliades , Ulrike Kuhl , Barbara Hammer , Marios M. Polycarpou
The increasing penetration of information and communication technologies in the design, monitoring, and control of water systems enables the use of algorithms for detecting and identifying unanticipated events (such as leakages or water contamination) using sensor measurements. However, data-driven methodologies do not always give accurate results and are often not trusted by operators, who may prefer to use their engineering judgment and experience to deal with such events.
In this work, we propose a framework for interpretable event diagnosis — an approach that assists operators in reconciling the results of algorithmic event diagnosis methodologies with their own intuition and experience. This is achieved by providing contrasting (i.e., counterfactual) explanations of the results produced by fault diagnosis algorithms; these explanations aim to improve operators' understanding of the algorithm's inner workings, enabling them to make more informed decisions by combining the results with their personal experience. Specifically, we propose counterfactual event fingerprints, a representation of the difference between the current event diagnosis and the closest alternative explanation, which can be presented graphically. The proposed methodology is applied and evaluated on a realistic use case based on the L-Town benchmark.
{"title":"Interpretable event diagnosis in water distribution networks","authors":"André Artelt , Stelios G. Vrachimis , Demetrios G. Eliades , Ulrike Kuhl , Barbara Hammer , Marios M. Polycarpou","doi":"10.1016/j.iswa.2025.200621","DOIUrl":"10.1016/j.iswa.2025.200621","url":null,"abstract":"<div><div>The increasing penetration of information and communication technologies in the design, monitoring, and control of water systems enables the use of algorithms for detecting and identifying unanticipated events (such as leakages or water contamination) using sensor measurements. However, data-driven methodologies do not always give accurate results and are often not trusted by operators, who may prefer to use their engineering judgment and experience to deal with such events.</div><div>In this work, we propose a framework for interpretable event diagnosis — an approach that assists the operators in associating the results of algorithmic event diagnosis methodologies with their own intuition and experience. This is achieved by providing contrasting (i.e., counterfactual) explanations of the results provided by fault diagnosis algorithms; their aim is to improve the understanding of the algorithm’s inner workings by the operators, thus enabling them to take a more informed decision by combining the results with their personal experiences. Specifically, we propose <em>counterfactual event fingerprints</em>, a representation of the difference between the current event diagnosis and the closest alternative explanation, which can be presented in a graphical way. The proposed methodology is applied and evaluated on a realistic use case using the L-Town benchmark.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200621"},"PeriodicalIF":4.3,"publicationDate":"2025-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145924620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FireBoost: A new bio-inspired approach for feature selection based on firefly algorithm and optimized XGBoost
Pub Date : 2025-12-17 | DOI: 10.1016/j.iswa.2025.200613
Nafaa Jabeur
High-dimensional data often reduce model efficiency and interpretability by introducing redundant or irrelevant features. This challenge is especially critical in domains like healthcare and cybersecurity, where both accuracy and explainability are essential. To address this, we introduce FireBoost, a novel hybrid framework that enhances classification performance through effective feature selection and optimized model training. FireBoost integrates the Firefly Algorithm (FFA) for selecting the most informative features with a customized version of XGBoost. The customized learner includes dynamic learning-rate decay, feature-specific binning, and mini-batch gradient updates. Unlike existing hybrid models, FireBoost tightly couples the selection and learning phases, enabling informed, performance-driven feature prioritization. Experiments on the METABRIC and KDD datasets demonstrate that FireBoost consistently reduces feature dimensionality while maintaining or improving classification accuracy and training speed. It outperforms standard ensemble models and shows robustness across different parameter settings. FireBoost thus provides a scalable and interpretable solution for real-world binary classification tasks involving high-dimensional data.
{"title":"FireBoost: A new bio-inspired approach for feature selection based on firefly algorithm and optimized XGBoost","authors":"Nafaa Jabeur","doi":"10.1016/j.iswa.2025.200613","DOIUrl":"10.1016/j.iswa.2025.200613","url":null,"abstract":"<div><div>High-dimensional data often reduce model efficiency and interpretability by introducing redundant or irrelevant features. This challenge is especially critical in domains like healthcare and cybersecurity, where both accuracy and explainability are essential. To address this, we introduce FireBoost, a novel hybrid framework that enhances classification performance through effective feature selection and optimized model training. FireBoost integrates the Firefly Algorithm (FFA) for selecting the most informative features with a customized version of XGBoost. The customized learner includes dynamic learning-rate decay, feature-specific binning, and mini-batch gradient updates. Unlike existing hybrid models, FireBoost tightly couples the selection and learning phases, enabling informed, performance-driven feature prioritization. Experiments on the METABRIC and KDD datasets demonstrate that FireBoost consistently reduces feature dimensionality while maintaining or improving classification accuracy and training speed. It outperforms standard ensemble models and shows robustness across different parameter settings. FireBoost thus provides a scalable and interpretable solution for real-world binary classification tasks involving high-dimensional data.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200613"},"PeriodicalIF":4.3,"publicationDate":"2025-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145924621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
UAV exploration for indoor navigation based on deep reinforcement learning and intrinsic curiosity
Pub Date : 2025-12-16 | DOI: 10.1016/j.iswa.2025.200618
Huei-Yung Lin , Xi-Sheng Zhang , Syahrul Munir
The operational versatility of Unmanned Aerial Vehicles (UAVs) continues to drive rapid development in the field. However, a critical challenge for diverse applications — such as search and rescue or warehouse inspection — is exploring the environment autonomously. Traditional exploration approaches are often hindered in practical deployments because they require precise navigation path planning and pre-defined obstacle avoidance rules for each testing environment. This paper presents a UAV indoor exploration technique based on deep reinforcement learning (DRL) and intrinsic curiosity. By combining the extrinsic DRL reward with an intrinsic curiosity reward, the UAV autonomously establishes exploration strategies and is actively encouraged to explore unknown areas. In addition, NoisyNet is introduced to assess the value of different actions during the early stages of exploration. The proposed method significantly improves exploration coverage while relying solely on visual input. Its effectiveness is validated through experimental comparisons with several state-of-the-art algorithms: it achieves at least 15% more exploration coverage for the same flight time, and at least 20% less exploration distance at the same coverage.
{"title":"UAV exploration for indoor navigation based on deep reinforcement learning and intrinsic curiosity","authors":"Huei-Yung Lin , Xi-Sheng Zhang , Syahrul Munir","doi":"10.1016/j.iswa.2025.200618","DOIUrl":"10.1016/j.iswa.2025.200618","url":null,"abstract":"<div><div>The operational versatility of Unmanned Aerial Vehicles (UAVs) continues to drive rapid development in the field of UAV. However, a critical challenge for diverse applications — such as search and rescue or warehouse inspection — is exploring the environment autonomously. Traditional exploration approaches are often hindered in practical deployments because they require precise navigation path planning and pre-defined obstacle avoidance rules for each of the testing environments. This paper presents a UAV indoor exploration technique based on deep reinforcement learning (DRL) and intrinsic curiosity. By integrating the reward function based on the extrinsic DRL reward and the intrinsic reward, the UAV is able to autonomously establish exploration strategies and actively encourage the exploration of unknown areas. In addition, NoisyNet is introduced to assess the value of different actions during the early stages of exploration. This proposed method will significantly improve the coverage of the exploration while relying solely on visual input. The effectiveness of our proposed technique is validated through experimental comparisons with several state-of-the-art algorithms. It achieves around at least 15% more exploration coverage at the same flight time compared to others, while achieving at least 20% less exploration distance at the same exploration coverage.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200618"},"PeriodicalIF":4.3,"publicationDate":"2025-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145790058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vision transformers in precision agriculture: A comprehensive survey
Pub Date : 2025-12-13 | DOI: 10.1016/j.iswa.2025.200617
Saber Mehdipour , Seyed Abolghasem Mirroshandel , Seyed Amirhossein Tabatabaei
Detecting plant diseases is a crucial aspect of modern agriculture, playing a key role in maintaining crop health and ensuring sustainable yields. Traditional approaches, though still valuable, often rely on manual inspection or conventional machine learning (ML) techniques, both of which face limitations in scalability and accuracy. The emergence of Vision Transformers (ViTs) marks a significant shift in this landscape by enabling superior modeling of long-range dependencies and offering improved scalability for complex visual tasks. This survey provides a rigorous and structured analysis of impactful studies that employ ViT-based models, along with a comprehensive categorization of existing research. It also offers a quantitative synthesis of reported performance — with accuracies ranging from 75.00% to 100.00% — highlighting clear trends in model effectiveness and identifying consistently high-performing architectures. In addition, this study examines the inductive biases of CNNs and ViTs, providing the first analysis of these architectural priors within an agricultural context. Further contributions include a comparative taxonomy of prior studies, an evaluation of dataset limitations and metric inconsistencies, and a statistical assessment of model efficiency across diverse crop-image sources. Collectively, these efforts clarify the current state of the field, identify critical research gaps, and outline key challenges — such as data diversity, interpretability, computational cost, and field adaptability — that must be addressed to advance the practical deployment of ViT technologies in precision agriculture.
{"title":"Vision transformers in precision agriculture: A comprehensive survey","authors":"Saber Mehdipour , Seyed Abolghasem Mirroshandel , Seyed Amirhossein Tabatabaei","doi":"10.1016/j.iswa.2025.200617","DOIUrl":"10.1016/j.iswa.2025.200617","url":null,"abstract":"<div><div>Detecting plant diseases is a crucial aspect of modern agriculture, playing a key role in maintaining crop health and ensuring sustainable yields. Traditional approaches, though still valuable, often rely on manual inspection or conventional machine learning (ML) techniques, both of which face limitations in scalability and accuracy. The emergence of Vision Transformers (ViTs) marks a significant shift in this landscape by enabling superior modeling of long-range dependencies and offering improved scalability for complex visual tasks. This survey provides a rigorous and structured analysis of impactful studies that employ ViT-based models, along with a comprehensive categorization of existing research. It also offers a quantitative synthesis of reported performance — with accuracies ranging from 75.00% to 100.00% — highlighting clear trends in model effectiveness and identifying consistently high-performing architectures. In addition, this study examines the inductive biases of CNNs and ViTs, which is the first analysis of these architectural priors within an agricultural context. Further contributions include a comparative taxonomy of prior studies, an evaluation of dataset limitations and metric inconsistencies, and a statistical assessment of model efficiency across diverse crop-image sources. Collectively, these efforts clarify the current state of the field, identify critical research gaps, and outline key challenges — such as data diversity, interpretability, computational cost, and field adaptability — that must be addressed to advance the practical deployment of ViT technologies in precision agriculture.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200617"},"PeriodicalIF":4.3,"publicationDate":"2025-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145790057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing token boundary detection in disfluent speech
Pub Date : 2025-12-06 | DOI: 10.1016/j.iswa.2025.200614
Manu Srivastava , Marcello Ferro , Vito Pirrelli , Gianpaolo Coro
This paper presents an open-source Automatic Speech Recognition (ASR) pipeline optimised for disfluent Italian read speech, designed to enhance both transcription accuracy and token boundary precision in low-resource settings. The study aims to address the difficulty that conventional ASR systems face in capturing the temporal irregularities of disfluent reading, which are crucial for psycholinguistic and clinical analyses of fluency. Building upon the WhisperX framework, the proposed system replaces the neural Voice Activity Detection module with an energy-based segmentation algorithm designed to preserve prosodic cues such as pauses and hesitations. A dual-alignment strategy integrates two complementary phoneme-level ASR models to correct onset–offset asymmetries, while a bias-compensation post-processing step mitigates systematic timing errors. Evaluation on the READLET (child read speech) and CLIPS (adult read speech) corpora shows consistent improvements over baseline systems, confirming enhanced robustness in boundary detection and transcription under disfluent conditions. The results demonstrate that the proposed architecture provides a general, language-independent framework for accurate alignment and disfluency-aware ASR. The approach can support downstream analyses of reading fluency and speech planning, contributing to both computational linguistics and clinical speech research.
{"title":"Enhancing token boundary detection in disfluent speech","authors":"Manu Srivastava , Marcello Ferro , Vito Pirrelli , Gianpaolo Coro","doi":"10.1016/j.iswa.2025.200614","DOIUrl":"10.1016/j.iswa.2025.200614","url":null,"abstract":"<div><div>This paper presents an open-source Automatic Speech Recognition (ASR) pipeline optimised for disfluent Italian read speech, designed to enhance both transcription accuracy and token boundary precision in low-resource settings. The study aims to address the difficulty that conventional ASR systems face in capturing the temporal irregularities of disfluent reading, which are crucial for psycholinguistic and clinical analyses of fluency. Building upon the WhisperX framework, the proposed system replaces the neural Voice Activity Detection module with an energy-based segmentation algorithm designed to preserve prosodic cues such as pauses and hesitations. A dual-alignment strategy integrates two complementary phoneme-level ASR models to correct onset–offset asymmetries, while a bias-compensation post-processing step mitigates systematic timing errors. Evaluation on the READLET (child read speech) and CLIPS (adult read speech) corpora shows consistent improvements over baseline systems, confirming enhanced robustness in boundary detection and transcription under disfluent conditions. The results demonstrate that the proposed architecture provides a general, language-independent framework for accurate alignment and disfluency-aware ASR. The approach can support downstream analyses of reading fluency and speech planning, contributing to both computational linguistics and clinical speech research.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200614"},"PeriodicalIF":4.3,"publicationDate":"2025-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145790690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A systematic review of vision transformer and explainable AI advances in multimodal facial expression recognition
Pub Date : 2025-12-06 | DOI: 10.1016/j.iswa.2025.200615
Ilya Kus , Cemal Kocak , Ayse Keles
Facial expression is one of the most important indicators used to convey human emotions. Facial expression recognition is the process of automatically detecting and classifying these expressions by computer systems. Multimodal facial expression recognition aims to perform a more accurate and comprehensive emotion analysis by combining facial expressions with different modalities such as image, speech, Electroencephalogram (EEG), or text. This study systematically reviews research conducted between 2021 and 2025 on Vision Transformer (ViT)-based approaches and Explainable Artificial Intelligence (XAI) techniques in multimodal facial expression recognition, as well as the datasets employed in these studies. The findings indicate that ViT-based models outperform conventional Convolutional Neural Networks (CNNs) by effectively capturing long-range dependencies between spatially distant facial regions, thereby enhancing emotion classification accuracy. However, significant challenges remain, including data privacy risks arising from the collection of multimodal biometric information, data imbalance and inter-modality incompatibility, high computational costs hindering real-time applications, and limited progress in model explainability. Overall, this study highlights that integrating advanced ViT architectures with robust XAI and privacy-preserving techniques can enhance the reliability, transparency, and ethical deployment of multimodal facial expression recognition systems.
{"title":"A systematic review of vision transformer and explainable AI advances in multimodal facial expression recognition","authors":"Ilya Kus , Cemal Kocak , Ayse Keles","doi":"10.1016/j.iswa.2025.200615","DOIUrl":"10.1016/j.iswa.2025.200615","url":null,"abstract":"<div><div>Facial expression is one of the most important indicators used to convey human emotions. Facial expression recognition is the process of automatically detecting and classifying these expressions by computer systems. Multimodal facial expression recognition aims to perform a more accurate and comprehensive emotion analysis by combining facial expressions with different modalities such as image, speech, Electroencephalogram (EEG), or text. This study systematically reviews research conducted between 2021 and 2025 on the Vision Transformer (ViT) based approaches and Explainable Artificial Intelligence (XAI) techniques in multimodal facial expression recognition, as well as the datasets employed in these studies. The findings indicate that ViT-based models outperform conventional Convolutional Neural Networks (CNNs) by effectively capturing long-range dependencies between spatially distant facial regions, thereby enhancing emotion classification accuracy. However, significant challenges remain, including data privacy risks arising from the collection of multimodal biometric information, data imbalance and inter-modality incompatibility, high computational costs hindering real-time applications, and limited progress in model explainability. Overall, this study highlights that integrating advanced ViT architectures with robust XAI and privacy-preserving techniques can enhance the reliability, transparency, and ethical deployment of multimodal facial expression recognition systems.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200615"},"PeriodicalIF":4.3,"publicationDate":"2025-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145737495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AL-ViT: Label-efficient Robusta coffee-bean defect detection in Thailand using active learning vision transformers
Pub Date : 2025-11-28 | DOI: 10.1016/j.iswa.2025.200612
Sirawich Vachmanus , Wimolsiri Pridasawas , Worapan Kusakunniran , Kitti Thamrongaphichartkul , Noppanan Phinklao
In major trading and export markets, the coffee bean grading process still relies heavily on manual labor to sort individual beans from large harvest volumes. This labor-intensive task is time-consuming, costly, and prone to human error, especially within Thailand’s rapidly expanding Robusta coffee sector. This study introduces AL–ViT, an end-to-end Active-Learning Vision Transformer framework that operationalizes active learning and transformer-based feature extraction within a single, production-oriented pipeline. The framework integrates a ViT-Base/16 backbone with seven active learning (AL) query strategies: random sampling, entropy-based selection, Bayesian Active Learning by Disagreement (BALD), Batch Active Learning by Diverse Gradient Embeddings (BADGE), Core-Set diversity sampling, ensemble disagreement, and a novel hybrid uncertainty–diversity strategy designed to balance informativeness and representativeness during sample acquisition. A high-resolution dataset of 2098 Robusta coffee bean images was collected under controlled-lighting conditions aligned with grading-machine setups, with only 5 % initially labeled and the remainder forming the AL pool. Across five random seeds, the hybrid strategy without MixUp augmentation achieved 97.1 % accuracy and an F1 score of 0.956 on the defective (bad) class using just 850 labels (41 % of the dataset), within 0.3 percentage points of full supervision. Operational reliability, defined as 95 % accuracy in line with prior inspection benchmarks, was reached with only 407 labels, a 75 % reduction in annotation effort. Entropy sampling showed the fastest early-stage gains, whereas BADGE lagged by more than 1 percentage point; Core-Set and Ensemble provided moderate but stable results. Augmentation and calibration analyses indicated that explicit methods (MixUp, CutMix, RandAugment) offered no further benefit, with the hybrid pipeline already achieving well-calibrated probabilities. Statistical validation via paired t-tests, effect sizes, and bootstrap confidence intervals confirmed consistent improvements of uncertainty-driven strategies over random sampling. Overall, the proposed AL–ViT framework establishes a label-efficient and practically deployable approach for agricultural quality control, achieving near-supervised accuracy at a fraction of the labeling cost.
{"title":"AL-ViT: Label-efficient Robusta coffee-bean defect detection in Thailand using active learning vision transformers","authors":"Sirawich Vachmanus , Wimolsiri Pridasawas , Worapan Kusakunniran , Kitti Thamrongaphichartkul , Noppanan Phinklao","doi":"10.1016/j.iswa.2025.200612","DOIUrl":"10.1016/j.iswa.2025.200612","url":null,"abstract":"<div><div>In major training and export markets, the coffee bean grading process still relies heavily on manual labor to sort individual beans from large harvest volumes. This labor-intensive task is time-consuming, costly, and prone to human error, especially within Thailand’s rapidly expanding Robusta coffee sector. This study introduces AL–ViT, an end-to-end Active-Learning Vision Transformer framework that operationalizes active learning and transformer-based feature extraction within a single, production-oriented pipeline. The framework integrates a ViT-Base/16 backbone with seven active learning (AL) query strategies, random sampling, entropy-based selection, Bayesian Active Learning by Disagreement (BALD), Batch Active Learning by Diverse Gradient Embeddings (BADGE), Core-Set diversity sampling, ensemble disagreement, and a novel hybrid uncertainty–diversity strategy designed to balance informativeness and representativeness during sample acquisition. A high-resolution dataset of 2098 Robusta coffee bean images was collected under controlled-lighting conditions aligned with grading-machine setups, with only 5 % initially labeled and the remainder forming the AL pool. Across five random seeds, the hybrid strategy without MixUp augmentation achieved 97.1 % accuracy and an F1bad of 0.956 using just 850 labels (41 % of the dataset), within 0.3 percentage points of full supervision. Operational reliability, defined as 95 % accuracy, consistent with prior inspection benchmarks, was reached with only 407 labels, reflecting a 75 % reduction in annotation. Entropy sampling showed the fastest early-stage gains, whereas BADGE lagged by >1 pp; Core-Set and Ensemble provided moderate but stable results. Augmentation and calibration analyses indicated that explicit methods (MixUp, CutMix, RandAugment) offered no further benefit, with the hybrid pipeline already achieving well-calibrated probabilities. Statistical validation via paired <em>t</em>-tests, effect sizes, and bootstrap CIs confirmed consistent improvements of uncertainty-driven strategies over random sampling. Overall, the proposed AL–ViT framework establishes a label-efficient and practically deployable approach for agricultural quality control, achieving near-supervised accuracy at a fraction of the labeling cost.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200612"},"PeriodicalIF":4.3,"publicationDate":"2025-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145684888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-modal document classification in AEC asset management
Pub Date : 2025-11-19 | DOI: 10.1016/j.iswa.2025.200609
Floor Rademaker , Faizan Ahmed , Marcos R. Machado
The digitalization of asset management within the architecture, engineering and construction (AEC) sector requires effective methods for the automatic classification of documents. This study focuses on the development and evaluation of multimodal document classification models that utilize visual, textual, and layout-related document information. We examine various state-of-the-art machine learning models and combine them through an iterative development process. The performance of these models is evaluated on two different AEC-document datasets. The results demonstrate that each modality is useful for classifying the documents, and that integrating the different information types is beneficial as well. This study contributes by applying AI techniques, specifically document classification, to the AEC sector; by taking an initial step toward automating information extraction and processing for Intelligent Asset Management; and by combining and comparing state-of-the-art multimodal classification models on real-life datasets.
{"title":"Multi-modal document classification in AEC asset management","authors":"Floor Rademaker , Faizan Ahmed , Marcos R. Machado","doi":"10.1016/j.iswa.2025.200609","DOIUrl":"10.1016/j.iswa.2025.200609","url":null,"abstract":"<div><div>The digitalization of asset management within the architecture, engineering and construction (AEC) sector is in need of effective methods for the automatic classification of documents. This study focuses on the development and evaluation of multimodal document classification models, utilizing visual, textual, and layout-related document information. We examine various state-of-the-art machine learning models and combine them through an iterative development process. The performance of these models is evaluated on two different AEC-document datasets. The results demonstrate that each of the modalities is useful in classifying the documents, as well as the integration of the different information types. This study contributes by applying AI techniques, specifically document classification in the AEC sector, setting the initial step to automating information extraction and processing for Intelligent Asset Management, and lastly, by combining and comparing multimodal state-of-the-art classification models on real-life datasets.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200609"},"PeriodicalIF":4.3,"publicationDate":"2025-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145555492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AIOps for log anomaly detection in the era of LLMs: A systematic literature review
Pub Date : 2025-11-19 | DOI: 10.1016/j.iswa.2025.200608
Miguel De la Cruz Cabello , Tiago Prince Sales , Marcos R. Machado
Modern IT systems generate large volumes of log data that challenge timely and effective anomaly detection. Traditional methods often require intensive feature engineering and struggle to adapt to dynamic operational environments. This Systematic Literature Review (SLR) analyzes how Artificial Intelligence for IT Operations (AIOps) benefits from advanced language models, emphasizing Large Language Models (LLMs) for more effective log anomaly detection. By comparing state-of-the-art frameworks with LLM-driven methods, this study reveals that prompt engineering – the practice of designing and refining inputs to AI models to produce accurate and useful outputs – and Retrieval Augmented Generation (RAG) boost accuracy and interpretability without extensive fine-tuning. Experimental findings demonstrate that LLM-based approaches significantly outperform traditional methods across evaluation metrics including F1-score, precision, and recall. Furthermore, the integration of LLMs with RAG techniques has shown strong adaptability to changing environments. These methods are also applicable in the military domain. Consequently, the development of specialized LLM systems with RAG tailored for defense applications represents a promising research direction to improve the operational effectiveness and responsiveness of defense systems.
{"title":"AIOps for log anomaly detection in the era of LLMs: A systematic literature review","authors":"Miguel De la Cruz Cabello , Tiago Prince Sales , Marcos R. Machado","doi":"10.1016/j.iswa.2025.200608","DOIUrl":"10.1016/j.iswa.2025.200608","url":null,"abstract":"<div><div>Modern IT systems generate large volumes of log data that challenge timely and effective anomaly detection. Traditional methods often require intensive feature engineering and struggle to adapt to dynamic operational environments. This Systematic Literature Review (SLR) analyzes how Artificial Intelligence for IT Operations (AIOps) benefits from advanced language models, emphasizing Large Language Models (LLMs) for more effective log anomaly detection. By comparing state-of-art frameworks with LLM-driven methods, this study reveals that prompt engineering – the practice of designing and refining inputs to AI models to produce accurate and useful outputs – and Retrieval Augmented Generation (RAG) boost accuracy and interpretability without extensive fine-tuning. Experimental findings demonstrate that LLM-based approaches significantly outperform traditional methods across evaluation metrics that include F1-score, precision, and recall. Furthermore, the integration of LLMs with RAG techniques has shown a strong adaptability to changing environments. The applicability of these methods also extends to the military industry. Consequently, the development of specialized LLM systems with RAG tailored for the military industry represents a promising research direction to improve operational effectiveness and responsiveness of defense systems.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"28 ","pages":"Article 200608"},"PeriodicalIF":4.3,"publicationDate":"2025-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145571740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Musipainter: A music-conditioned generative architecture for artistic image synthesis
Pub Date : 2025-11-19 | DOI: 10.1016/j.iswa.2025.200611
Alfredo Baione , Giuseppe Rizzo , Luca Barco , Angelica Urbanelli , Luigi Di Biasi , Genoveffa Tortora
Generative art is a challenging area of research in deep generative modeling. Exploring AI’s role in human–machine co-creative processes requires understanding machine learning’s potential in the arts. Building on this premise, this paper presents Musipainter, a cross-modal generative framework adapted to create artistic images that are historically and stylistically aligned with 30-second musical inputs, with a focus on creative and semantic coherence. To support this goal, we introduce Museart, a dataset designed explicitly for this research, and GIILS, a creativity-oriented metric that enables us to assess both artistic-semantic consistency and diversity in the generated outputs. The results indicate that Musipainter, supported by the Museart dataset and the exploratory GIILS metric, can offer a foundation for further research on AI’s role in artistic generation, while also highlighting the need for systematic validation and future refinements.
{"title":"Musipainter: A music-conditioned generative architecture for artistic image synthesis","authors":"Alfredo Baione , Giuseppe Rizzo , Luca Barco , Angelica Urbanelli , Luigi Di Biasi , Genoveffa Tortora","doi":"10.1016/j.iswa.2025.200611","DOIUrl":"10.1016/j.iswa.2025.200611","url":null,"abstract":"<div><div>Generative art is a challenging area of research in deep generative modeling. Exploring AI’s role in human–machine co-creative processes requires understanding machine learning’s potential in the arts. Building on this premise, this paper presents Musipainter, a cross-modal generative framework adapted to create artistic images that are historically and stylistically aligned with 30-second musical inputs, with a focus on creative and semantic coherence. To support this goal, we introduce Museart, a dataset designed explicitly for this research, and GIILS, a creativity-oriented metric that enables us to assess both artistic-semantic consistency and diversity in the generated outputs. The results indicate that Musipainter, supported by the Museart dataset and the exploratory GIILS metric, can offer a foundation for further research on AI’s role in artistic generation, while also highlighting the need for systematic validation and future refinements.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200611"},"PeriodicalIF":4.3,"publicationDate":"2025-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145618668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}