Recent advancements in Large Language Models (LLMs) and Generative Artificial Intelligence (GenAI) have revolutionised software engineering (SE), augmenting practitioners across the SE lifecycle. In this paper, we focus on the application of GenAI within data analytics—considered a subdomain of SE—to address the growing need for reliable, user-friendly tools that bridge the gap between human expertise and automated analytical processes. In our work, we transform a conventional API-based analytics platform into a set of tools that can be used by AI agents and formulate a process to facilitate the communication between the data analyst, the agents and the platform. The result is a chat-based interface that allows analysts to query and execute analytical workflows using natural language, thereby reducing cognitive overhead and technical barriers. To validate our approach, we instantiated the proposed framework with open-source models and achieved a mean overall score increase of 7.2 % compared to other baselines. Complementary user-study data demonstrate that the chat-based analytics interface yielded superior task efficiency and higher user preference scores compared to the traditional form-based baseline.
{"title":"Generative AI for autonomous data analytics","authors":"Mattheos Fikardos , Katerina Lepenioti , Alexandros Bousdekis , Dimitris Apostolou , Gregoris Mentzas","doi":"10.1016/j.iswa.2026.200626","DOIUrl":"10.1016/j.iswa.2026.200626","url":null,"abstract":"<div><div>Recent advancements in Large Language Models (LLMs) and Generative Artificial Intelligence (GenAI) have revolutionised software engineering (SE), augmenting practitioners across the SE lifecycle. In this paper, we focus on the application of GenAI within data analytics—considered a subdomain of SE—to address the growing need for reliable, user-friendly tools that bridge the gap between human expertise and automated analytical processes. In our work, we transform a conventional API-based analytics platform into a set of tools that can be used by AI agents and formulate a process to facilitate the communication between the data analyst, the agents and the platform. The result is a chat-based interface that allows analysts to query and execute analytical workflows using natural language, thereby reducing cognitive overhead and technical barriers. To validate our approach, we instantiated the proposed framework with open-source models and achieved a mean overall score increase of 7.2 % compared to other baselines. Complementary user-study data demonstrate that the chat-based analytics interface yielded superior task efficiency and higher user preference scores compared to the traditional form-based baseline.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200626"},"PeriodicalIF":4.3,"publicationDate":"2026-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145924622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Skeleton-based gait recognition has improved significantly with the advent of graph convolutional networks (GCNs). Nevertheless, the classical ST-GCN has a key drawback: its limited receptive fields cannot learn the global correlations of joints, restricting its ability to extract global dependencies. To address this, we present GSCTN, a GCN and self-attention contemporary network with temporal convolution. The method combines a GCN with a self-attention mechanism through a learnable weighted fusion: by merging local joint details from the GCN with the broader context from self-attention, GSCTN builds a strong representation of skeleton movements. Our approach uses decoupled self-attention (DSA), which splits the tightly coupled (TiC) SA module into two learnable components, unary and pairwise SA, to model joint relationships separately. The unary SA captures the relationship between a single key joint and all query joints, while the pairwise SA captures local gait features from each pair of body joints. We also present a Depthwise Multi-scale Temporal Convolutional Network (DMS-TCN) that captures the temporal dynamics of joint movements, handling both short-term and long-term motion patterns efficiently. To let the model integrate spatial and temporal joint information dynamically, we apply Global Aware Attention (GAA) to the GSCTN module. We evaluated our method on the OUMVLP-Pose, CASIA-B, and GREW datasets. On the widely used CASIA-B dataset, it achieves 97.9% accuracy for normal walking, 94.8% for carrying a bag, and 91.91% for clothing conditions; on OUMVLP-Pose and GREW, it achieves rank-1 accuracies of 93.5% and 75.7%, respectively. Our experimental results demonstrate that the proposed model offers a holistic approach to gait recognition, utilizing GCN, DSA, and GAA with DMS-TCN to capture both inter-domain and spatial aspects of human locomotion.
{"title":"A GCN and Graph Self-Attention Contemporary Network with Temporal Depthwise Convolutions for Gait Recognition","authors":"Md. Khaliluzzaman , Kaushik Deb , Pranab Kumar Dhar , Tetsuya Shimamura","doi":"10.1016/j.iswa.2025.200625","DOIUrl":"10.1016/j.iswa.2025.200625","url":null,"abstract":"<div><div>Skeleton-based gait recognition has significantly improved due to the advent of graph convolutional networks (GCNs). Nevertheless, the classical ST-GCN has a key drawback: limited receptive fields fail to learn the global correlations of joints, restricting its ability to extract global dependencies effectively. To address this, we present the GSCTN method, a GCN and self-attention contemporary network with temporal convolution. This method combines GCN with a self-attention mechanism using a learnable weighted fusion. By combining local joint details from GCN with the larger context from self-attention, GSCTN creates a strong representation of skeleton movements. Our approach uses decoupled self-attention (DSA) techniques that fragment the tightly coupled (TiC) SA module into two learnable components, unary and pairwise SA, to model joint relationships separately. The unary SA shows an extensive relationship between the single key joint and all additional query joints. The paired SA captures the local gait features from each pair of body joints. We also present a Depthwise Multi-scale Temporal Convolutional Network (DMS-TCN) that smoothly captures the temporal nature of joint movements. DMS-TCN efficiently handles both short-term and long-term motion patterns. To boost the model’s ability to converge spatial and temporal joints dynamically, we applied Global Aware Attention (GAA) to the GSCTN module. We tested our method on the OUMVLP-Pose, CASIA-B, and GREW datasets. The suggested method exhibits remarkable accuracy on widely used CASIA-B datasets, with 97.9% for normal walking, 94.8% for carrying a bag, and 91.91% for clothing conditions. Meanwhile, the OUMVLP-Pose and GREW datasets exhibit a rank-1 accuracy of 93.5% and 75.7%, respectively. Our experimental results demonstrate that the proposed model is a holistic approach for gait recognition by utilizing GCN, DSA, and GAA with DMS-TCN to capture both inter-domain and spatial aspects of human locomotion.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200625"},"PeriodicalIF":4.3,"publicationDate":"2025-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145924624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Internet of Things is an enormous network of interrelated devices that enables intelligent interaction and high-level control in environments such as smart homes, smart cities, and industry by collecting, processing, and transferring data. Most low-power devices within such networks run on limited energy sources, such as batteries, so energy management is a critical factor in system design and operation. Existing methods, such as reinforcement learning and evolutionary approaches, have at times provided improvements, but their computational complexity and inability to adapt to changing environmental conditions have prevented wide deployment across large systems. The growing number of IoT devices therefore makes accurate energy-prediction models crucial. This research addresses the challenge with an energy usage management model based on Long Short-Term Memory (LSTM) networks for energy consumption forecasting. The model collects historical energy usage, activity schedules, and environmental factors such as temperature and humidity; after preprocessing, which includes noise removal and normalisation, it predicts future energy consumption. The short-term memory handles scheduling data and the analysis of current environmental conditions, while the long-term memory lets the model identify more complex consumption patterns over time, yielding more accurate predictions. Based on these predictions, smart sleep-wake policies put unnecessary devices into sleep mode and wake them only when needed, and adaptive learning algorithms help the system adjust to environmental conditions. Experimental results show that the proposed method can reduce energy consumption by up to 58% and increase device lifetime by 30%, while predicting energy consumption with 95% accuracy.
{"title":"Optimisation of energy management in IoT devices using LSTM models: Energy consumption prediction with sleep-wake scheduling control","authors":"Nahideh DerakhshanFard, Asra Rajabi Bavil Olyaei, Fahimeh RashidJafari","doi":"10.1016/j.iswa.2025.200624","DOIUrl":"10.1016/j.iswa.2025.200624","url":null,"abstract":"<div><div>The Internet of Things is an enormous network of interrelated devices that makes intelligent interaction and high-level control possible in various environments, such as smart homes, smart cities, and industry, by collecting, processing, and transferring data. The majority of the low-power devices within the network utilize limited sources of energy, such as batteries, and hence energy management is a critical factor in the design and operation of the systems. Current methods, such as reinforcement and evolutionary approaches, have at times been found to provide some enhancements but lacked extensive implementation over broad systems due to computational complexity as well as their inability to adapt to changing environmental settings. The growing number of IoT devices presents challenges in energy management, making it crucial to develop accurate prediction models. This research aims to address this challenge by proposing a novel solution using Long Short-Term Memory (LSTM) networks for energy consumption forecasting. This work suggests an optimal energy usage management model based on Long Short-Term Memory networks. The model collects historical energy usage, activity scheduling, and environmental factors such as temperature and humidity. Following the preprocessing, which includes noise removal and normalisation, it predicts future energy consumption. Scheduling data and the analysis and processing of environmental conditions are done using the short-term memory, while the long-term memory helps the model identify more complex patterns in the energy consumption over time to make more accurate predictions. Based on this prediction, smart policies are made for going to sleep and waking up the devices, so that unnecessary devices are put into sleep mode and only woken up when needed. Adaptive learning algorithms also assist in adjusting to environmental conditions. Results of experiments show that the proposed method can save energy up to 58% and increase device lifetime by 30%, while the prediction of energy consumption has an accuracy of 95%.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200624"},"PeriodicalIF":4.3,"publicationDate":"2025-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146026200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-22 | DOI: 10.1016/j.iswa.2025.200623
Deepa D. Shankar, Adresya Suresh Azhakath
Information technology and digital media have improved significantly in recent years, establishing the internet as an effective channel for communication and data transmission. Nevertheless, this rapid advancement has left data vulnerable to mismanagement and exploitation, and data-concealment technologies were devised in response. Steganalysis is the technique of detecting such concealed data, and it can help mitigate various threats, including breaches of information security. This work encapsulates the notion of blind statistical steganalysis within image processing methodologies and ascertains the accuracy of secure-transmission validation. We discuss the extraction of features that indicate a change introduced during embedding. A specific percentage of text is embedded into a JPEG image of a predetermined size using various steganographic techniques in both the spatial and transform domains: LSB Matching, LSB Replacement, Pixel Value Differencing, and F5. Because the steganalysis is blind, no cover images are available for comparative analysis; an estimate of the cover image is instead obtained through a calibration step. After embedding, the images are partitioned into 8 × 8 blocks, from which features are extracted for classification. This paper uses both interblock and intrablock dependent features, each regarded as a means of mitigating the shortcomings of the other. A machine learning classifier distinguishes the stego image from the cover image, and we present a comparative investigation of the SVM and SVM-PSO classifiers, conducted both with and without cross-validation. Six distinct kernel functions and four sampling methods are used for grouping. The embedding ratio employed in this investigation is 50%.
{"title":"Blind steganalysis-driven secure transmission validation using feature-based classification in JPEG images","authors":"Deepa D. Shankar , Adresya Suresh Azhakath","doi":"10.1016/j.iswa.2025.200623","DOIUrl":"10.1016/j.iswa.2025.200623","url":null,"abstract":"<div><div>Information technology and digital media have significantly improved in recent years, facilitating the internet as an effective channel for communication and data transmission. Nevertheless, the rapid advancement of technology has rendered data a source of mismanagement and vulnerable to exploitation. Consequently, technologies such as data concealment were devised to mitigate exploitation. Steganalysis is a technique for data concealment. Various processes, including breaches of information security, can be mitigated by steganalysis. This work aims to encapsulate the notion of blind statistical steganalysis within image processing methodologies and ascertain the accuracy percentage of secure transmission. This work discusses the extraction of features that indicate a change during embedding. A specific percentage of text is integrated into a JPEG image of a predetermined size. The text embedding utilizes various steganographic techniques in both the spatial and transform domains. The steganographic techniques include LSB Matching, LSB Replacement, Pixel Value Differencing, and F5. Due to the blind nature of steganalysis, there are no cover images available for comparative analysis. An estimation of the cover image is obtained by a calibration concept. After embedding, the images are partitioned into 8 × 8 blocks, from which certain features are extraction for classification. This paper utilizes interblock dependent features and intrablock dependent features. Both dependencies are regarded as means to mitigate the shortcomings of each individually. The approach of machine learning is employed using a classifier to distinguish between the stego image and the cover image. This research does a comparative investigation of the classifiers SVM and SVM-PSO. Comparative research is frequently performed both with and without use cross-validation methodology. The study incorporates the concept of cross-validation for comparative analysis. There are six unique kernel functions and four sample methods for grouping. The embedding ratio employed in this investigation is 50%.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200623"},"PeriodicalIF":4.3,"publicationDate":"2025-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-18 | DOI: 10.1016/j.iswa.2025.200620
Elaheh Golzardi, Alireza Abdollahpouri
As social networks are constantly changing, decision-making in large groups becomes much more challenging. People form new connections, lose old ones, shift their preferences, and change how much they trust others (Qin, Li, Liang & Pedrycz, 2026). Methods that work well in stable settings often fail to keep pace here, especially when both quick adaptation and the ability to handle scale are essential (Ding et al., 2025). Our approach, called GCD-GNN (Group Consensus Decision using Graph Neural Networks), builds on graph neural networks to track these ongoing changes in structure and preferences. It processes live updates on trust levels, social ties, and preference similarities, then adjusts influence weights in real time to keep the consensus process stable. In experiments using both synthetic and real-world datasets, GCD-GNN delivered higher agreement levels, improved accuracy and precision, and faster execution compared with leading alternatives. These results point to a framework that is not only scalable but also able to adapt effectively in complex, large-scale decision-making environments.
{"title":"Scalable and Adaptive Large-Scale Group Decision Making in Dynamic Social Networks via Graph Convolutional Neural Networks#","authors":"Elaheh Golzardi , Alireza Abdollahpouri","doi":"10.1016/j.iswa.2025.200620","DOIUrl":"10.1016/j.iswa.2025.200620","url":null,"abstract":"<div><div>As social networks are constantly changing, decision-making in large groups becomes much more challenging. People form new connections, lose old ones, shift their preferences, and change how much they trust others (<span><span>Qin, Li, Liang & Pedrycz, 2026</span></span>). Methods that work well in stable settings often fail to keep pace here, especially when both quick adaptation and the ability to handle scale are essential (<span><span>Ding et al., 2025</span></span>). Our approach, called GCD-GNN (Group Consensus Decision using Graph Neural Networks), builds on graph neural networks to track these ongoing changes in structure and preferences. It processes live updates on trust levels, social ties, and preference similarities, then adjusts influence weights in real time to keep the consensus process stable. In experiments using both synthetic and real-world datasets, GCD-GNN delivered higher agreement levels, improved accuracy and precision, and faster execution compared with leading alternatives. These results point to a framework that is not only scalable, but also able to adapt effectiveness in complex, large-scale decision-making environments.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200620"},"PeriodicalIF":4.3,"publicationDate":"2025-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145924623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-18 | DOI: 10.1016/j.iswa.2025.200621
André Artelt, Stelios G. Vrachimis, Demetrios G. Eliades, Ulrike Kuhl, Barbara Hammer, Marios M. Polycarpou
The increasing penetration of information and communication technologies in the design, monitoring, and control of water systems enables the use of algorithms for detecting and identifying unanticipated events (such as leakages or water contamination) using sensor measurements. However, data-driven methodologies do not always give accurate results and are often not trusted by operators, who may prefer to use their engineering judgment and experience to deal with such events.
In this work, we propose a framework for interpretable event diagnosis: an approach that helps operators relate the results of algorithmic event diagnosis methodologies to their own intuition and experience. This is achieved by providing contrasting (i.e., counterfactual) explanations of the results produced by fault diagnosis algorithms; these explanations aim to improve operators' understanding of the algorithm's inner workings, enabling them to make more informed decisions by combining the results with their personal experience. Specifically, we propose counterfactual event fingerprints, a representation of the difference between the current event diagnosis and the closest alternative explanation, which can be presented graphically. The proposed methodology is applied and evaluated on a realistic use case using the L-Town benchmark.
{"title":"Interpretable event diagnosis in water distribution networks","authors":"André Artelt , Stelios G. Vrachimis , Demetrios G. Eliades , Ulrike Kuhl , Barbara Hammer , Marios M. Polycarpou","doi":"10.1016/j.iswa.2025.200621","DOIUrl":"10.1016/j.iswa.2025.200621","url":null,"abstract":"<div><div>The increasing penetration of information and communication technologies in the design, monitoring, and control of water systems enables the use of algorithms for detecting and identifying unanticipated events (such as leakages or water contamination) using sensor measurements. However, data-driven methodologies do not always give accurate results and are often not trusted by operators, who may prefer to use their engineering judgment and experience to deal with such events.</div><div>In this work, we propose a framework for interpretable event diagnosis — an approach that assists the operators in associating the results of algorithmic event diagnosis methodologies with their own intuition and experience. This is achieved by providing contrasting (i.e., counterfactual) explanations of the results provided by fault diagnosis algorithms; their aim is to improve the understanding of the algorithm’s inner workings by the operators, thus enabling them to take a more informed decision by combining the results with their personal experiences. Specifically, we propose <em>counterfactual event fingerprints</em>, a representation of the difference between the current event diagnosis and the closest alternative explanation, which can be presented in a graphical way. The proposed methodology is applied and evaluated on a realistic use case using the L-Town benchmark.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200621"},"PeriodicalIF":4.3,"publicationDate":"2025-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145924620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-17 | DOI: 10.1016/j.iswa.2025.200613
Nafaa Jabeur
High-dimensional data often reduce model efficiency and interpretability by introducing redundant or irrelevant features. This challenge is especially critical in domains like healthcare and cybersecurity, where both accuracy and explainability are essential. To address this, we introduce FireBoost, a novel hybrid framework that enhances classification performance through effective feature selection and optimized model training. FireBoost integrates the Firefly Algorithm (FFA) for selecting the most informative features with a customized version of XGBoost. The customized learner includes dynamic learning-rate decay, feature-specific binning, and mini-batch gradient updates. Unlike existing hybrid models, FireBoost tightly couples the selection and learning phases, enabling informed, performance-driven feature prioritization. Experiments on the METABRIC and KDD datasets demonstrate that FireBoost consistently reduces feature dimensionality while maintaining or improving classification accuracy and training speed. It outperforms standard ensemble models and shows robustness across different parameter settings. FireBoost thus provides a scalable and interpretable solution for real-world binary classification tasks involving high-dimensional data.
{"title":"FireBoost: A new bio-inspired approach for feature selection based on firefly algorithm and optimized XGBoost","authors":"Nafaa Jabeur","doi":"10.1016/j.iswa.2025.200613","DOIUrl":"10.1016/j.iswa.2025.200613","url":null,"abstract":"<div><div>High-dimensional data often reduce model efficiency and interpretability by introducing redundant or irrelevant features. This challenge is especially critical in domains like healthcare and cybersecurity, where both accuracy and explainability are essential. To address this, we introduce FireBoost, a novel hybrid framework that enhances classification performance through effective feature selection and optimized model training. FireBoost integrates the Firefly Algorithm (FFA) for selecting the most informative features with a customized version of XGBoost. The customized learner includes dynamic learning-rate decay, feature-specific binning, and mini-batch gradient updates. Unlike existing hybrid models, FireBoost tightly couples the selection and learning phases, enabling informed, performance-driven feature prioritization. Experiments on the METABRIC and KDD datasets demonstrate that FireBoost consistently reduces feature dimensionality while maintaining or improving classification accuracy and training speed. It outperforms standard ensemble models and shows robustness across different parameter settings. FireBoost thus provides a scalable and interpretable solution for real-world binary classification tasks involving high-dimensional data.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200613"},"PeriodicalIF":4.3,"publicationDate":"2025-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145924621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-16 | DOI: 10.1016/j.iswa.2025.200618
Huei-Yung Lin, Xi-Sheng Zhang, Syahrul Munir
The operational versatility of Unmanned Aerial Vehicles (UAVs) continues to drive rapid development in the field. However, a critical challenge for diverse applications, such as search and rescue or warehouse inspection, is exploring the environment autonomously. Traditional exploration approaches are often hindered in practical deployments because they require precise navigation path planning and pre-defined obstacle avoidance rules for each testing environment. This paper presents a UAV indoor exploration technique based on deep reinforcement learning (DRL) and intrinsic curiosity. By integrating a reward function that combines the extrinsic DRL reward with an intrinsic reward, the UAV autonomously establishes exploration strategies and is actively encouraged to explore unknown areas. In addition, NoisyNet is introduced to assess the value of different actions during the early stages of exploration. The proposed method significantly improves exploration coverage while relying solely on visual input. Its effectiveness is validated through experimental comparisons with several state-of-the-art algorithms: it achieves at least 15% more exploration coverage at the same flight time, and at least 20% less exploration distance at the same exploration coverage.
{"title":"UAV exploration for indoor navigation based on deep reinforcement learning and intrinsic curiosity","authors":"Huei-Yung Lin , Xi-Sheng Zhang , Syahrul Munir","doi":"10.1016/j.iswa.2025.200618","DOIUrl":"10.1016/j.iswa.2025.200618","url":null,"abstract":"<div><div>The operational versatility of Unmanned Aerial Vehicles (UAVs) continues to drive rapid development in the field of UAV. However, a critical challenge for diverse applications — such as search and rescue or warehouse inspection — is exploring the environment autonomously. Traditional exploration approaches are often hindered in practical deployments because they require precise navigation path planning and pre-defined obstacle avoidance rules for each of the testing environments. This paper presents a UAV indoor exploration technique based on deep reinforcement learning (DRL) and intrinsic curiosity. By integrating the reward function based on the extrinsic DRL reward and the intrinsic reward, the UAV is able to autonomously establish exploration strategies and actively encourage the exploration of unknown areas. In addition, NoisyNet is introduced to assess the value of different actions during the early stages of exploration. This proposed method will significantly improve the coverage of the exploration while relying solely on visual input. The effectiveness of our proposed technique is validated through experimental comparisons with several state-of-the-art algorithms. It achieves around at least 15% more exploration coverage at the same flight time compared to others, while achieving at least 20% less exploration distance at the same exploration coverage.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200618"},"PeriodicalIF":4.3,"publicationDate":"2025-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145790058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detecting plant diseases is a crucial aspect of modern agriculture, playing a key role in maintaining crop health and ensuring sustainable yields. Traditional approaches, though still valuable, often rely on manual inspection or conventional machine learning (ML) techniques, both of which face limitations in scalability and accuracy. The emergence of Vision Transformers (ViTs) marks a significant shift in this landscape by enabling superior modeling of long-range dependencies and offering improved scalability for complex visual tasks. This survey provides a rigorous and structured analysis of impactful studies that employ ViT-based models, along with a comprehensive categorization of existing research. It also offers a quantitative synthesis of reported performance — with accuracies ranging from 75.00% to 100.00% — highlighting clear trends in model effectiveness and identifying consistently high-performing architectures. In addition, this study examines the inductive biases of CNNs and ViTs, presenting the first analysis of these architectural priors within an agricultural context. Further contributions include a comparative taxonomy of prior studies, an evaluation of dataset limitations and metric inconsistencies, and a statistical assessment of model efficiency across diverse crop-image sources. Collectively, these efforts clarify the current state of the field, identify critical research gaps, and outline key challenges — such as data diversity, interpretability, computational cost, and field adaptability — that must be addressed to advance the practical deployment of ViT technologies in precision agriculture.
{"title":"Vision transformers in precision agriculture: A comprehensive survey","authors":"Saber Mehdipour , Seyed Abolghasem Mirroshandel , Seyed Amirhossein Tabatabaei","doi":"10.1016/j.iswa.2025.200617","DOIUrl":"10.1016/j.iswa.2025.200617","url":null,"abstract":"<div><div>Detecting plant diseases is a crucial aspect of modern agriculture, playing a key role in maintaining crop health and ensuring sustainable yields. Traditional approaches, though still valuable, often rely on manual inspection or conventional machine learning (ML) techniques, both of which face limitations in scalability and accuracy. The emergence of Vision Transformers (ViTs) marks a significant shift in this landscape by enabling superior modeling of long-range dependencies and offering improved scalability for complex visual tasks. This survey provides a rigorous and structured analysis of impactful studies that employ ViT-based models, along with a comprehensive categorization of existing research. It also offers a quantitative synthesis of reported performance — with accuracies ranging from 75.00% to 100.00% — highlighting clear trends in model effectiveness and identifying consistently high-performing architectures. In addition, this study examines the inductive biases of CNNs and ViTs, which is the first analysis of these architectural priors within an agricultural context. Further contributions include a comparative taxonomy of prior studies, an evaluation of dataset limitations and metric inconsistencies, and a statistical assessment of model efficiency across diverse crop-image sources. Collectively, these efforts clarify the current state of the field, identify critical research gaps, and outline key challenges — such as data diversity, interpretability, computational cost, and field adaptability — that must be addressed to advance the practical deployment of ViT technologies in precision agriculture.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200617"},"PeriodicalIF":4.3,"publicationDate":"2025-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145790057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents an open-source Automatic Speech Recognition (ASR) pipeline optimised for disfluent Italian read speech, designed to enhance both transcription accuracy and token boundary precision in low-resource settings. The study aims to address the difficulty that conventional ASR systems face in capturing the temporal irregularities of disfluent reading, which are crucial for psycholinguistic and clinical analyses of fluency. Building upon the WhisperX framework, the proposed system replaces the neural Voice Activity Detection module with an energy-based segmentation algorithm designed to preserve prosodic cues such as pauses and hesitations. A dual-alignment strategy integrates two complementary phoneme-level ASR models to correct onset–offset asymmetries, while a bias-compensation post-processing step mitigates systematic timing errors. Evaluation on the READLET (child read speech) and CLIPS (adult read speech) corpora shows consistent improvements over baseline systems, confirming enhanced robustness in boundary detection and transcription under disfluent conditions. The results demonstrate that the proposed architecture provides a general, language-independent framework for accurate alignment and disfluency-aware ASR. The approach can support downstream analyses of reading fluency and speech planning, contributing to both computational linguistics and clinical speech research.
{"title":"Enhancing token boundary detection in disfluent speech","authors":"Manu Srivastava , Marcello Ferro , Vito Pirrelli , Gianpaolo Coro","doi":"10.1016/j.iswa.2025.200614","DOIUrl":"10.1016/j.iswa.2025.200614","url":null,"abstract":"<div><div>This paper presents an open-source Automatic Speech Recognition (ASR) pipeline optimised for disfluent Italian read speech, designed to enhance both transcription accuracy and token boundary precision in low-resource settings. The study aims to address the difficulty that conventional ASR systems face in capturing the temporal irregularities of disfluent reading, which are crucial for psycholinguistic and clinical analyses of fluency. Building upon the WhisperX framework, the proposed system replaces the neural Voice Activity Detection module with an energy-based segmentation algorithm designed to preserve prosodic cues such as pauses and hesitations. A dual-alignment strategy integrates two complementary phoneme-level ASR models to correct onset–offset asymmetries, while a bias-compensation post-processing step mitigates systematic timing errors. Evaluation on the READLET (child read speech) and CLIPS (adult read speech) corpora shows consistent improvements over baseline systems, confirming enhanced robustness in boundary detection and transcription under disfluent conditions. The results demonstrate that the proposed architecture provides a general, language-independent framework for accurate alignment and disfluency-aware ASR. The approach can support downstream analyses of reading fluency and speech planning, contributing to both computational linguistics and clinical speech research.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200614"},"PeriodicalIF":4.3,"publicationDate":"2025-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145790690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}