Buyer–Seller-Deception-Game Dataset: A new comprehensive dataset for facial expression based deception detection in economic contexts
Pub Date: 2026-02-09 | DOI: 10.1016/j.iswa.2026.200636
Laslo Dinges, Marc-André Fiedler, Ayoub Al-Hamadi, Dmitri Bershadskyy, Joachim Weimann
Automated deception detection using only video and audio modalities remains a challenging problem in computer vision. Deceptive behavior is highly context-dependent, shaped by factors such as the specific scenario, cultural background, and the associated stakes. To explore the potential of future deception detection tools in online interactions, such as virtual sales meetings, we present a new high-quality, low-stakes dataset specifically tailored for this task. It currently includes 500 annotated video samples, with a planned extension to around 1000 before public release. Participants engage in incentivized online interactions in which sellers attempt to persuade buyers to choose a specific card, creating naturally motivated deceptive and truthful behavior. We evaluate a variety of visual and audio-based feature sets, such as gaze, head pose, facial Action Units (AUs), and prosodic features, on our dataset as well as on a high-stakes in-the-wild deception dataset. Our results show that OpenFace-based AU features perform best on our clean and controlled recordings, while CNN-based AU predictors outperform others on the more challenging dataset with lower video quality and unstable head pose. Multimodal approaches slightly outperform the best unimodal features in both cases. We will make the dataset freely available to support future research in automated deception detection.
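To make the evaluation protocol concrete, here is a minimal sketch of how per-video OpenFace AU features could be aggregated and fed to a standard classifier. The AU column convention (AU01_r, ..., AU45_r) follows OpenFace's CSV output; the directory layout, labeling rule, and SVM choice are illustrative assumptions, not the paper's exact setup.

```python
# Hypothetical sketch: aggregate OpenFace AU intensities per video and
# train a truth/deception classifier. Paths and labels are invented.
import glob
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def video_features(csv_path):
    """Mean and std of each AU intensity over all frames of one video."""
    df = pd.read_csv(csv_path)
    df.columns = df.columns.str.strip()          # OpenFace pads headers with spaces
    aus = df.filter(regex=r"^AU\d+_r$")          # intensity (regression) AUs only
    return np.concatenate([aus.mean().values, aus.std().values])

paths = sorted(glob.glob("openface_out/*.csv"))  # one CSV per annotated clip
X = np.stack([video_features(p) for p in paths])
y = np.array([1 if "deceptive" in p else 0 for p in paths])  # toy labeling rule

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
print(cross_val_score(clf, X, y, cv=5).mean())   # rough accuracy estimate
```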
{"title":"Buyer–Seller-Deception-Game Dataset: A new comprehensive dataset for facial expression based deception detection in economic contexts","authors":"Laslo Dinges , Marc-André Fiedler , Ayoub Al-Hamadi , Dmitri Bershadskyy , Joachim Weimann","doi":"10.1016/j.iswa.2026.200636","DOIUrl":"10.1016/j.iswa.2026.200636","url":null,"abstract":"<div><div>Automated deception detection using only video and audio modalities remains a challenging problem in computer vision. Deceptive behavior is highly context dependent, shaped by factors such as the specific scenario, cultural background, and the associated stakes. To explore the potential of future deception detection tools in online interactions, such as virtual sales meetings, we present a new high-quality low-stakes dataset specifically tailored for this task. It currently includes 500 annotated video samples, with a planned extension to around 1000 before public release. Participants engage in incentivized online interactions, in which sellers attempt to persuade buyers to choose a specific card, creating naturally motivated deceptive and truthful behavior. We evaluate a variety of visual and audio based feature sets, such as gaze, head pose, facial Action-Units (AUs), and prosodic features, on our dataset as well as on a high-stakes in-the-wild deception dataset. Our results show that OpenFace based AU features perform best on our clean and controlled recordings, while CNN based AU predictors outperform others in the more challenging dataset with lower video quality and unstable head pose. Multimodal approaches slightly outperform the best unimodal features in both cases. We will make the dataset freely available to support future research in automated deception detection.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200636"},"PeriodicalIF":4.3,"publicationDate":"2026-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146173300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Video anomaly detection for edge-based IoT systems: A survey of input modalities and real-time applications
Pub Date: 2026-02-05 | DOI: 10.1016/j.iswa.2026.200635
Hoangcong Le, Cheng-Kai Lu, Chen-Chien Hsu
With the vast amount of video data generated daily, researchers have become increasingly interested in extracting meaningful information from it, particularly for analyzing abnormal events. This growing interest has accelerated progress in video anomaly detection (VAD) as a specialized subfield of computer vision, attracting considerable attention due to its potential applications in real-time scenarios such as elderly care, smart homes, and intelligent surveillance. To provide a comprehensive understanding of this rapidly evolving field, several systematic reviews have been conducted to help new researchers enter the field and assist experienced groups in keeping pace with recent advancements. However, existing surveys lack a focused analysis of how different input data modalities affect the performance of VAD systems, particularly from a privacy-preserving perspective. Understanding the effectiveness of various data modalities and data collection strategies is essential for protecting personal information in computer vision applications. Furthermore, the feasibility of deploying VAD models in real-time Internet of Things (IoT) environments, where low latency, limited resources, and strict privacy requirements are critical considerations, remains underexplored. Although edge computing has been increasingly adopted to address these challenges, most studies overlook the deployment of VAD frameworks on resource-constrained devices. Integrating edge-based VAD systems with federated learning algorithms represents a promising direction for enabling privacy-aware and scalable real-world systems. Rather than providing a method-centric summary, this survey reorganizes the VAD literature from a deployment-oriented viewpoint, highlighting how input modality choices fundamentally affect privacy preservation and real-time feasibility on edge-based IoT systems. This work specifically reviews studies published between 2020 and 2025.
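As a concrete illustration of the edge-plus-federated-learning direction the survey highlights, the following is a minimal federated-averaging (FedAvg) sketch. The per-client update is a placeholder and the array-based model is an assumption made for brevity; no specific VAD architecture is implied.

```python
# Minimal FedAvg sketch: each edge camera trains locally on its own video,
# and only parameters (never raw footage) are aggregated centrally.
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Weighted average of per-client parameter lists (one list per client)."""
    total = sum(client_sizes)
    avg = [np.zeros_like(w) for w in client_weights[0]]
    for weights, n in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            avg[i] += (n / total) * w
    return avg

# Toy round: three edge cameras with different amounts of local video data.
global_model = [np.random.randn(8, 4), np.random.randn(4)]
clients = [[w + 0.01 * np.random.randn(*w.shape) for w in global_model]  # stand-in
           for _ in range(3)]                                            # for local training
global_model = fed_avg(clients, client_sizes=[120, 300, 80])
```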
{"title":"Video anomaly detection for edge-based IoT systems: A survey of input modalities and real-time applications","authors":"Hoangcong Le, Cheng-Kai Lu, Chen-Chien Hsu","doi":"10.1016/j.iswa.2026.200635","DOIUrl":"10.1016/j.iswa.2026.200635","url":null,"abstract":"<div><div>With the vast amount of video data generated daily, researchers have become increasingly interested in extracting meaningful information, particularly for analyzing abnormal events. This growing interest has accelerated progress in video anomaly detection (VAD) as a specialized subfield of computer vision, attracting considerable attention due to its potential applications in real-time scenarios such as elderly care, smart homes, and intelligent surveillance. To provide a comprehensive understanding of this rapidly evolving field, several systematic reviews have been conducted to help new researchers enter the field and assist experienced groups in keeping pace with recent advancements. However, existing surveys lack a focused analysis of how different input data modalities impact the performance of VAD systems, particularly from a privacy-preserving perspective. Understanding the effectiveness of various data modalities and data collection strategies is essential for protecting personal information in computer vision applications. Furthermore, the feasibility of deploying VAD models in real-time Internet of Things (IoT) environments remains underexplored, where low latency, limited resources, and strict privacy requirements are critical considerations. Although edge computing has been increasingly adopted to address these challenges, most studies overlook the deployment of VAD frameworks on resource-constrained devices. Integrating edge-based VAD systems with federated learning algorithms represents a promising direction for enabling privacy-aware and scalable real-world systems. Rather than providing a method-centric summary, this survey reorganizes the VAD literature from a deployment-oriented viewpoint, highlighting how input modality choices fundamentally affect privacy preservation and real-time feasibility on edge-based IoT systems. This work specifically reviews studies published between 2020 and 2025.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200635"},"PeriodicalIF":4.3,"publicationDate":"2026-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146173397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A study on the generalization of DINOv2 features for food recognition tasks: A unified evaluation framework
Pub Date: 2026-02-04 | DOI: 10.1016/j.iswa.2026.200632
Simone Bianco, Marco Buzzelli, Gianluigi Ciocca, Flavio Piccoli, Raimondo Schettini
Self-supervised learning has recently gained increasing attention in computer vision, enabling the extraction of rich and general-purpose feature representations without requiring large annotated datasets. In this paper we aim to build a unified approach for deploying robust and effective analysis systems, replacing the need for multiple task-specific models trained end-to-end. Rather than introducing new architectures or training strategies, our goal is to systematically assess whether a single frozen self-supervised representation can support heterogeneous food-related tasks under realistic operating conditions. To this end, we performed an extensive analysis of DINOv2 features across multiple benchmark datasets and tasks, including food classification, segmentation, aesthetic assessment, and robustness to image distortions. In addition, we explore its capacity for continual learning by applying it to incremental food classification scenarios. Our findings reveal that DINOv2 features excel in many food-related applications. Their shared representations across tasks reduce the need for training separate models, while their strong generalization, high accuracy, and ability to handle complex multi-task scenarios make them a strong candidate for a unified food recognition approach. Specifically, DINOv2 features match or surpass state-of-the-art supervised methods in several food recognition tasks, while offering a simpler and more unified deployment strategy. Furthermore, they outperform end-to-end models in cross-dataset scenarios by up to +19.4% Top-1 accuracy and exhibit strong resilience to common image distortions, with up to +48.0% improvement in Top-1 accuracy percentage difference under distortion, ensuring reliable performance in real-world applications. On average across all considered tasks, the DINOv2-based unified evaluation outperforms the state of the art by approximately 2.8% and 5.4%, depending on the chosen model size, while using only 6.2% and 23.9% of the total number of model parameters, respectively.
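The frozen-representation protocol the paper assesses can be sketched as follows: DINOv2 embeddings are extracted once with a frozen backbone, and only a lightweight linear probe is trained. The torch.hub entry point is DINOv2's published one; the batch, class count, and probe are illustrative stand-ins.

```python
# Sketch of the frozen-backbone protocol: extract DINOv2 embeddings, then
# fit a linear probe. The food-image tensors here are random stand-ins.
import torch
import torch.nn as nn

backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
backbone.eval()                      # frozen: no fine-tuning of the encoder
for p in backbone.parameters():
    p.requires_grad = False

images = torch.randn(16, 3, 224, 224)            # placeholder batch (224 = 16 * 14)
with torch.no_grad():
    feats = backbone(images)                     # (16, 384) embeddings for ViT-S/14

probe = nn.Linear(feats.shape[1], 101)           # e.g. 101 food classes
logits = probe(feats)
loss = nn.functional.cross_entropy(logits, torch.randint(0, 101, (16,)))
loss.backward()                                  # gradients flow only into the probe
```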
{"title":"A study on the generalization of DINOv2 features for food recognition tasks: A unified evaluation framework","authors":"Simone Bianco, Marco Buzzelli, Gianluigi Ciocca, Flavio Piccoli, Raimondo Schettini","doi":"10.1016/j.iswa.2026.200632","DOIUrl":"10.1016/j.iswa.2026.200632","url":null,"abstract":"<div><div>Self-supervised learning has recently gained increasing attention in computer vision, enabling the extraction of rich and general-purpose feature representations without requiring large annotated datasets. In this paper we aim to build a unified approach capable of deploying robust and effective analysis systems, replacing the need for multiple task-specific models trained end-to-end. Rather than introducing new architectures or training strategies, our goal is to systematically assess whether a single frozen self-supervised representation can support heterogeneous food-related tasks under realistic operating conditions. To this end, we performed an extensive analysis of DINOv2 features across multiple benchmark datasets and tasks, including food classification, segmentation, aesthetic assessment, and robustness to image distortions. In addition, we explore its capacity for continual learning by applying it to incremental food classification scenarios. Our findings reveal that DINOv2 features excel in many food-related applications. Their shared representations across tasks reduce the need for training separate models, while their strong generalization, high accuracy, and ability to handle complex multi-task scenarios make them a strong candidate for a unified food recognition approach. Specifically, DINOv2 features match or surpass state-of-the-art supervised methods in several food recognition tasks, while offering a simpler and more unified deployment strategy. Furthermore, they outperform end-to-end models in cross-dataset scenarios by up to +19.4% Top-1 accuracy and exhibits strong resilience to common image distortions by up to +48.0% robustness in Top-1 accuracy percentual difference, ensuring reliable performance in real-world applications. On average across all considered tasks, the DINOv2-based unified evaluation outperforms the state of the art by approximately 2.8% and 5.4%, depending on the chosen model size, while using only 6.2% and 23.9% of the total number of model parameters, respectively.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200632"},"PeriodicalIF":4.3,"publicationDate":"2026-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146173298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Advancing decision-making: A comprehensive review of intelligent systems, applications, and challenges
Pub Date: 2026-01-31 | DOI: 10.1016/j.iswa.2026.200631
Hussein A.A. Al-Khamees, Ahmad AL Smadi, Mutasem K. Alsmadi, Abdulrahman A. Alkannad, Ahed Abugabah, Latifa Abdullah Almusfar, Bashair Althani
The rapid evolution of intelligent systems, powered by artificial intelligence and machine learning, has created a fragmented research landscape. While numerous studies exist on specific applications, a holistic synthesis of their architectures, taxonomies, applications, and challenges is absent. This paper bridges that gap with a comprehensive systematic review that integrates these disparate elements, covering over 100 peer-reviewed scientific publications and following a structured process to identify, analyze, and synthesize the current state of intelligent systems research. The review encompasses a wide range of domains, including healthcare, cybersecurity, data mining, and industrial automation. Our analysis yields a unified taxonomy and clarifies the core architectural components of intelligent systems. We identify and categorize key application domains and demonstrate their transformative impact. The review also synthesizes prevailing challenges, such as data quality, scalability, and ethical concerns, and pinpoints emerging trends, including the rise of multimodal AI and hybrid intelligent systems. To the best of our knowledge, this is the first review to offer a consolidated framework that integrates the architecture, taxonomy, applications, and cross-domain challenges of intelligent systems into a single reference. This work serves as a foundational guide for researchers and practitioners, facilitating future advancements in the development of efficient, scalable, and context-aware intelligent systems.
{"title":"Advancing decision-making: A comprehensive review of intelligent systems, applications, and challenges","authors":"Hussein A.A. Al-Khamees , Ahmad AL Smadi , Mutasem K. Alsmadi , Abdulrahman A. Alkannad , Ahed Abugabah , Latifa Abdullah Almusfar , Bashair Althani","doi":"10.1016/j.iswa.2026.200631","DOIUrl":"10.1016/j.iswa.2026.200631","url":null,"abstract":"<div><div>The rapid evolution of intelligent systems, powered by artificial intelligence and machine learning, has created a fragmented research landscape. While numerous studies exist on specific applications, a holistic synthesis of their architectures, taxonomies, applications, and challenges is absent. This paper will bridge this gap by providing a comprehensive systematic review that integrates these disparate elements. This paper conducts a systematic review of over 100 peer-reviewed scientific publications, following a structured process to identify, analyze, and synthesize the current state of intelligent systems research. The review encompasses a wide range of domains, including healthcare, cybersecurity, data mining, and industrial automation. Our analysis yields a unified taxonomy and clarifies the core architectural components of intelligent systems. We identify and categorize key application domains and demonstrate their transformative impact. The review also synthesizes prevailing challenges, such as data quality, scalability, and ethical concerns, and pinpoints emerging trends, including the rise of multimodal AI and hybrid intelligent systems. To the best of our knowledge, this is the first review to offer a consolidated framework that integrates the architecture, taxonomy, applications, and cross-domain challenges of intelligent systems into a single reference. This work serves as a foundational guide for researchers and practitioners, facilitating future advancements in the development of efficient, scalable, and context-aware intelligent systems.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200631"},"PeriodicalIF":4.3,"publicationDate":"2026-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146173299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
WOA-FCM-CNN-WNN-Informer: An advanced hybrid deep learning model for ultra-accurate PV power forecasting in electric mobility
Pub Date: 2026-01-29 | DOI: 10.1016/j.iswa.2026.200630
Lazhar Manai, Walid Mchara, Mohamed Abdellatif Khalfa, Monia Raissi, Wissem Dimassi
Effective prediction of photovoltaic (PV) power generation is essential for enhancing energy management in solar-powered electric vehicles. This study introduces an innovative hybrid forecasting framework that combines Fuzzy C-Means (FCM) clustering, Convolutional Neural Networks (CNN), Wavelet Neural Networks (WNN), the Informer architecture, and the Whale Optimization Algorithm (WOA) to improve prediction accuracy. The result is a condition-aware, end-to-end FCM-CNN-WNN-Informer pipeline tailored for PV dynamics, in which: (i) similar-day fuzzy clustering normalizes weather heterogeneity before learning; (ii) wavelet-based multi-scale features are injected into a long-horizon Informer; (iii) a global, cross-module hyperparameter search via WOA jointly tunes all stages; (iv) a Generalization Index (GI) is proposed for robust model selection; and (v) Monte-Carlo dropout quantifies predictive uncertainty for practical deployment.
The proposed WOA-FCM-CNN-WNN-Informer model is evaluated on a comprehensive dataset of 70,080 hourly PV power recordings gathered over eight years in Tunisia. Results show superior performance compared to standard deep learning models like LSTM and BiLSTM. The framework reduces Mean Absolute Percentage Error (MAPE) by as much as 98.52% and Root Mean Squared Error (RMSE) by 93.84%, while maintaining a high coefficient of determination (R² = 0.98) across varying meteorological conditions. These outcomes underscore the model's robustness and its promise for advancing energy utilization, refining charging strategies, and supporting intelligent route planning in solar-electric transportation systems.
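Of the five components, the Monte-Carlo dropout step is the most self-contained to illustrate. Below is a minimal sketch of MC-dropout uncertainty estimation: dropout stays active at inference, and the spread over stochastic forward passes serves as predictive uncertainty. The tiny forecaster and input window length are invented stand-ins, not the paper's pipeline.

```python
# Monte-Carlo dropout sketch for the uncertainty-quantification step.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(24, 64), nn.ReLU(), nn.Dropout(0.2), nn.Linear(64, 1))

def mc_dropout_predict(model, x, passes=100):
    model.train()                        # keep dropout sampling ON at inference
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(passes)])
    return preds.mean(0), preds.std(0)   # predictive mean and uncertainty

x = torch.randn(32, 24)                  # e.g. last 24 hourly PV readings per sample
mean, std = mc_dropout_predict(net, x)
```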
{"title":"WOA-FCM-CNN-WNN-informer: An advanced hybrid deep learning model for ultra-accurate PV power forecasting in electric mobility","authors":"Lazhar Manai , Walid Mchara , Mohamed Abdellatif Khalfa , Monia Raissi , Wissem Dimassi","doi":"10.1016/j.iswa.2026.200630","DOIUrl":"10.1016/j.iswa.2026.200630","url":null,"abstract":"<div><div>Effective prediction of photovoltaic (PV) power generation is essential for enhancing energy management in solar-powered electric vehicles. This study introduces an innovative hybrid forecasting framework that combines Fuzzy C-Means (FCM) clustering, Convolutional Neural Networks (CNN), Wavelet Neural Networks (WNN), the Informer architecture, and the Whale Optimization Algorithm (WOA) to improve prediction accuracy. This approach introduces a condition-aware, end-to-end FCM-CNN-WNN-Informer pipeline tailored for PV dynamics, where: (i) <em>similar-day</em> fuzzy clustering normalizes weather heterogeneity <em>before</em> learning; (ii) wavelet-based multi-scale features are injected into a long-horizon Informer; (iii) a global, cross-module hyperparameter search via Whale Optimization Algorithm (WOA) jointly tunes all stages; (iv) a <em>Generalization Index (GI)</em> is proposed for robust model selection; and (v) Monte-Carlo dropout quantifies predictive uncertainty for practical deployment.</div><div>The proposed WOA-FCM-CNN-WNN-Informer model is evaluated on a comprehensive dataset of 70,080 hourly PV power recordings gathered over eight years in Tunisia. Results show superior performance compared to standard deep learning models like LSTM and BiLSTM. The framework reduces Mean Absolute Percentage Error (MAPE) by as much as 98.52% and Root Mean Squared Error (RMSE) by 93.84%, while maintaining a high coefficient of determination (<span><math><mrow><msup><mrow><mi>R</mi></mrow><mrow><mn>2</mn></mrow></msup><mo>=</mo><mn>0</mn><mo>.</mo><mn>98</mn></mrow></math></span>) across varying meteorological conditions. These outcomes underscore the model’s robustness and its promise for advancing energy utilization, refining charging strategies, and supporting intelligent route planning in solar-electric transportation systems.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200630"},"PeriodicalIF":4.3,"publicationDate":"2026-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146173395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
IndiSegNet: Real-time semantic segmentation for unstructured road scenes in intelligent transportation systems
Pub Date: 2026-01-17 | DOI: 10.1016/j.iswa.2026.200629
Pritam Chakraborty, Anjan Bandyopadhyay, Kushagra Agrawal, Jin Zhang, Man-Fai Leung
Autonomous driving in developing regions demands perception systems that can operate reliably in unstructured road environments marked by heterogeneous traffic, weak or missing lane geometry, frequent occlusions, and strong appearance variability. Existing semantic segmentation models, although successful on structured Western datasets, generalize poorly to such chaotic conditions and are often too computationally heavy for real-time deployment on low-power edge hardware. To address these gaps, this paper focuses on the challenge of achieving fast, accurate, and resource-efficient segmentation tailored to complex Indian road scenes. We propose IndiSegNet, a lightweight architecture designed explicitly for this setting. The model introduces two novel components: Multi-Scale Contextual Features (MSCF) for capturing irregular object scales, and Encoded Features Refining (EFR) for enhancing thin-structure and boundary detail, resulting in a more stable representation for unstructured environments. IndiSegNet achieves 67.2% mIoU on IDD, 78.9% on Cityscapes, and 74.6% on CamVid, while sustaining 112 FPS on Jetson Nano and outperforming standard baselines by 12%–18% IoU on safety-critical classes such as pedestrians, riders, and vehicles. Real-world evaluation across urban, monsoonal, rural, and mountainous regions shows less than 2.5% variance in mIoU with consistent inference speeds above 108 FPS. These results demonstrate that IndiSegNet offers a practical and hardware-efficient solution for high-speed autonomous navigation in the challenging traffic conditions of developing regions.
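The abstract does not specify MSCF's internals; a common way to capture irregular object scales, and one plausible reading of a multi-scale contextual module, is a set of parallel dilated convolutions fused by a pointwise projection. The sketch below is purely illustrative of that pattern, not the paper's verified design.

```python
# Hypothetical multi-scale context block in the spirit of MSCF: parallel
# dilated 3x3 branches over the same features, fused by a 1x1 projection.
import torch
import torch.nn as nn

class MultiScaleContext(nn.Module):
    def __init__(self, ch, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=d, dilation=d, bias=False)
            for d in dilations
        )
        self.fuse = nn.Conv2d(ch * len(dilations), ch, 1)

    def forward(self, x):
        # Each branch sees a different receptive field; concat then project.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

feats = torch.randn(1, 64, 128, 256)     # backbone features for one road image
out = MultiScaleContext(64)(feats)       # same spatial size, multi-scale context
```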
{"title":"IndiSegNet: Real-time semantic segmentation for unstructured road scenes in intelligent transportation systems","authors":"Pritam Chakraborty , Anjan Bandyopadhyay , Kushagra Agrawal , Jin Zhang , Man-Fai Leung","doi":"10.1016/j.iswa.2026.200629","DOIUrl":"10.1016/j.iswa.2026.200629","url":null,"abstract":"<div><div>Autonomous driving in developing regions demands perception systems that can operate reliably in unstructured road environments marked by heterogeneous traffic, weak or missing lane geometry, frequent occlusions, and strong appearance variability. Existing semantic segmentation models, although successful in structured Western datasets, exhibit poor generalization to such chaotic conditions and are often too computationally heavy for real-time deployment on low-power edge hardware. To address these gaps, this paper focuses on the challenge of achieving fast, accurate, and resource-efficient segmentation tailored to complex Indian road scenes. We propose IndiSegNet, a lightweight architecture designed explicitly for this setting. The model introduces two novel components—Multi-Scale Contextual Features (MSCF) for capturing irregular object scales and Encoded Features Refining (EFR) for enhancing thin-structure and boundary detail, resulting in a more stable representation for unstructured environments. IndiSegNet achieves 67.2% mIoU on IDD, 78.9% on Cityscapes, and 74.6% on CamVid, while sustaining 112 FPS on Jetson Nano, outperforming standard baselines by 12%–18% IoU on safety-critical classes such as pedestrians, riders, and vehicles. Real-world evaluation across urban, monsoonal, rural, and mountainous regions shows less than 2.5% variance in mIoU with consistent inference speeds above 108 FPS. These results demonstrate that IndiSegNet offers a practical and hardware-efficient solution for high-speed autonomous navigation in the challenging traffic conditions of developing regions.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200629"},"PeriodicalIF":4.3,"publicationDate":"2026-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146026198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Attention-enhanced reinforcement learning for dynamic portfolio optimization
Pub Date: 2026-01-15 | DOI: 10.1016/j.iswa.2025.200622
Pei Xue, Yuanchun Ye
We propose a deep reinforcement learning framework for dynamic portfolio optimization that combines a Dirichlet policy with cross-sectional attention mechanisms. The Dirichlet distribution enforces feasibility by construction, accommodates tradability masks, and provides a coherent geometry for exploration. Our architecture integrates per-asset temporal encoders with a global attention layer, allowing the policy to adaptively weight sectoral co-movements, factor spillovers, and other cross-asset dependencies. We evaluate the framework on a comprehensive S&P 500 panel from 2000 to 2025 using purged walk-forward backtesting to prevent look-ahead bias. Empirical results show that attention-enhanced Dirichlet policies deliver higher terminal wealth, Sharpe and Sortino ratios than equal-weight and reinforcement learning baselines, while maintaining realistic turnover and drawdown profiles. Our findings highlight that principled action parameterization and attention-based representation learning materially improve both the stability and interpretability of reinforcement learning methods for portfolio allocation.
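A Dirichlet policy head can be sketched as follows: the network emits strictly positive concentration parameters, so sampled portfolio weights lie on the simplex by construction. The encoder dimensions are placeholders, and the tradability-mask handling shown (shrinking concentrations of untradable assets) is one plausible reading of the abstract, not a verified detail of the paper.

```python
# Sketch of a Dirichlet policy head for portfolio weights.
import torch
import torch.nn as nn

class DirichletPolicy(nn.Module):
    def __init__(self, n_assets, state_dim):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_assets)
        )

    def forward(self, state, mask):
        # Strictly positive concentrations -> valid Dirichlet parameters.
        alpha = nn.functional.softplus(self.head(state)) + 1e-3
        # Assumed mask handling: push untradable assets toward near-zero weight.
        alpha = alpha * mask + 1e-2 * (1.0 - mask)
        dist = torch.distributions.Dirichlet(alpha)
        weights = dist.rsample()              # on the simplex by construction
        return weights, dist.log_prob(weights)

policy = DirichletPolicy(n_assets=5, state_dim=10)
state = torch.randn(10)                        # placeholder market features
mask = torch.tensor([1.0, 1.0, 1.0, 0.0, 1.0]) # asset 4 untradable today
w, logp = policy(state, mask)
print(w, w.sum())                              # non-negative weights summing to 1
```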
{"title":"Attention-enhanced reinforcement learning for dynamic portfolio optimization","authors":"Pei Xue, Yuanchun Ye","doi":"10.1016/j.iswa.2025.200622","DOIUrl":"10.1016/j.iswa.2025.200622","url":null,"abstract":"<div><div>We propose a deep reinforcement learning framework for dynamic portfolio optimization that combines a Dirichlet policy with cross-sectional attention mechanisms. The Dirichlet distribution enforces feasibility by construction, accommodates tradability masks, and provides a coherent geometry for exploration. Our architecture integrates per-asset temporal encoders with a global attention layer, allowing the policy to adaptively weight sectoral co-movements, factor spillovers, and other cross-asset dependencies. We evaluate the framework on a comprehensive S&P 500 panel from 2000 to 2025 using purged walk-forward backtesting to prevent look-ahead bias. Empirical results show that attention-enhanced Dirichlet policies deliver higher terminal wealth, Sharpe and Sortino ratios than equal-weight and reinforcement learning baselines, while maintaining realistic turnover and drawdown profiles. Our findings highlight that principled action parameterization and attention-based representation learning materially improve both the stability and interpretability of reinforcement learning methods for portfolio allocation.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200622"},"PeriodicalIF":4.3,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Novel quantum tunneling and fractional calculus-based metaheuristic for robust global data optimization and its applications in engineering design
Pub Date: 2026-01-09 | DOI: 10.1016/j.iswa.2025.200616
Hussam Fakhouri, Riyad Alrousan, Niveen Halalsheh, Najem Sirhan, Jamal Zraqou, Khalil Omar
Background:
Bound-constrained single-objective optimization and constrained engineering design often feature heterogeneous landscapes and barrier-like structures, motivating search procedures that are scale-aware, robust near constraints, and economical in tuning.
Contributions:
We introduce Quantum Tunneling and Fractional Calculus-Based Metaheuristic (QTFM), a physics-inspired metaheuristic that is parameter-lean and employs bounded, range-aware operators to reduce sensitivity to tuning and to prevent erratic steps close to constraints.
Methodology:
QTFM couples fractional-step dynamics for scale-aware exploitation with a quantum-tunneling jump for barrier crossing, and augments these with a wavefunction-collapse local search that averages a small neighborhood and applies minimal perturbations to accelerate refinement without sacrificing diversity.
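The following loose sketch illustrates the three ingredients just described: a history-weighted fractional-style step, an occasional long-range tunneling jump, and a collapse step that averages a small perturbed neighborhood. Every coefficient, distribution, and weighting here is invented for illustration; the authors' actual operators are not reproduced.

```python
# Illustrative (not the paper's) sketch of the three QTFM ingredients.
import numpy as np

rng = np.random.default_rng(0)

def qtfm_move(x, best, history, lb, ub, p_tunnel=0.1):
    # Fractional-style exploitation: decaying weights over recent displacements.
    weights = np.array([0.5, 0.25, 0.125])[: len(history)]
    frac = sum(w * h for w, h in zip(weights, history)) if history else 0.0
    step = 0.5 * (best - x) + frac
    if rng.random() < p_tunnel:
        # Quantum-tunneling jump: relocate anywhere in the bounded domain.
        step = rng.uniform(lb, ub, size=x.shape) - x
    return np.clip(x + step, lb, ub)       # bounded, range-aware update

def collapse(x, lb, ub, k=5, radius=0.05):
    # Wavefunction-collapse local search: average a small perturbed neighborhood.
    cloud = x + radius * (ub - lb) * rng.standard_normal((k, x.size))
    return np.clip(cloud.mean(axis=0), lb, ub)

x = rng.uniform(-5.0, 5.0, size=2)         # candidate solution in [-5, 5]^2
x = qtfm_move(x, best=np.zeros(2), history=[], lb=-5.0, ub=5.0)
x = collapse(x, lb=-5.0, ub=5.0)
```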
Results:
On the IEEE Congress on Evolutionary Computation CEC 2022 single-objective bound-constrained suite, QTFM ranked first on ten of twelve functions; it reached the best optimum on F1 and achieved the best mean values on F2–F8 and F10–F11 with stable standard deviations. In three constrained engineering problems, QTFM produced the lowest mean and the best-found solution for the robotic gripper design, and the lowest mean for the planetary gear train and three-bar truss design.
Findings:
The proposed fractional–quantum approach delivers fast, accurate, and robust search across heterogeneous landscapes and real-world design problems.
{"title":"Novel quantum tunneling and fractional calculus-based metaheuristic for robust global data optimization and its applications in engineering design","authors":"Hussam Fakhouri , Riyad Alrousan , Niveen Halalsheh , Najem Sirhan , Jamal Zraqou , Khalil Omar","doi":"10.1016/j.iswa.2025.200616","DOIUrl":"10.1016/j.iswa.2025.200616","url":null,"abstract":"<div><h3>Background:</h3><div>Bound-constrained single-objective optimization and constrained engineering design often feature heterogeneous landscapes and barrier-like structures, motivating search procedures that are scale-aware, robust near constraints, and economical in tuning.</div></div><div><h3>Contributions:</h3><div>We introduce Quantum Tunneling and Fractional Calculus-Based Metaheuristic (QTFM), a physics-inspired metaheuristic that is parameter-lean and employs bounded, range-aware operators to reduce sensitivity to tuning and to prevent erratic steps close to constraints.</div></div><div><h3>Methodology:</h3><div>QTFM couples fractional-step dynamics for scale-aware exploitation with a quantum-tunneling jump for barrier crossing, and augments these with a wavefunction-collapse local search that averages a small neighborhood and applies minimal perturbations to accelerate refinement without sacrificing diversity.</div></div><div><h3>Results:</h3><div>On the IEEE Congress on Evolutionary Computation CEC 2022 single-objective bound-constrained suite, QTFM ranked first on ten of twelve functions; it reached the best optimum on F1 and achieved the best mean values on F2–F8 and F10–F11 with stable standard deviations. In three constrained engineering problems, QTFM produced the lowest mean and the best-found solution for the robotic gripper design, and the lowest mean for the planetary gear train and three-bar truss design.</div></div><div><h3>Findings:</h3><div>The proposed fractional–quantum approach delivers fast, accurate, and robust search across heterogeneous landscapes and real-world design problems.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200616"},"PeriodicalIF":4.3,"publicationDate":"2026-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An efficient lightweight multi-scale CNN framework with CBAM and SPP for bearing fault diagnosis
Pub Date: 2026-01-08 | DOI: 10.1016/j.iswa.2026.200628
Thanh Tung Luu, Duy An Huynh
Rolling bearing degradation produces vibration signatures that vary across operating conditions, posing challenges for reliable fault diagnosis. This study proposes an adaptive and lightweight diagnostic framework combining a Depthwise Separable Multi-Scale CNN (DSMSCNN) with a Convolutional Block Attention Module (CBAM) and Spatial Pyramid Pooling (SPP) to extract fault-frequency-invariant features across different mechanical domains. Wavelet-based time–frequency maps are used to suppress noise and preserve multi-resolution spectral characteristics. The multi-scale separable convolutions adaptively capture discriminative frequency patterns, while CBAM highlights informative spectral regions and SPP enhances scale robustness without requiring fixed input sizes. Experiments on the CWRU and HUST bearing datasets demonstrate over 99% accuracy with significantly fewer parameters than conventional CNNs. The results confirm that the proposed DSMSCNN-CBAM-SPP framework effectively captures invariant fault-frequency features, offering a compact and adaptive solution for intelligent bearing fault diagnosis and real-time predictive maintenance in noisy environments.
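The depthwise separable multi-scale building block implied by the DSMSCNN name can be sketched as parallel depthwise k×k convolutions, each followed by a 1×1 pointwise projection, with the branches concatenated across channels. Channel counts, kernel sizes, and the input shape below are illustrative assumptions; CBAM and SPP are omitted for brevity.

```python
# Sketch of a depthwise separable multi-scale block on time-frequency maps.
import torch
import torch.nn as nn

def ds_conv(cin, cout, k):
    return nn.Sequential(
        nn.Conv2d(cin, cin, k, padding=k // 2, groups=cin, bias=False),  # depthwise
        nn.Conv2d(cin, cout, 1, bias=False),                             # pointwise
        nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
    )

class MultiScaleDS(nn.Module):
    def __init__(self, cin, cout, kernels=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(ds_conv(cin, cout, k) for k in kernels)

    def forward(self, x):
        # Concatenate the per-scale responses along the channel axis.
        return torch.cat([b(x) for b in self.branches], dim=1)

tfmap = torch.randn(8, 1, 64, 64)          # batch of wavelet time-frequency maps
out = MultiScaleDS(1, 16)(tfmap)           # -> (8, 48, 64, 64)
```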
{"title":"An efficient lightweight multi-scale CNN framework with CBAM and SPP for bearing fault diagnosis","authors":"Thanh Tung Luu , Duy An Huynh","doi":"10.1016/j.iswa.2026.200628","DOIUrl":"10.1016/j.iswa.2026.200628","url":null,"abstract":"<div><div>Rolling bearing degradation produces vibration signatures that vary across operating conditions, posing challenges for reliable fault diagnosis. This study proposes an adaptive and lightweight diagnostic framework combining a Depthwise Separable Multi-Scale CNN (DSMSCNN) with Convolutional Block Attention Module (CBAM) and Spatial Pyramid Pooling (SPP) to extract fault-frequency invariant features across different mechanical domains. Wavelet-based time–frequency maps are utilized to suppress noise and preserve multi-resolution spectral characteristics. The multi-scale separable convolutions adaptively capture discriminative frequency patterns, while CBAM highlights informative spectral regions and SPP enhances scale robustness without fixed input sizes. Experiments on the CWRU and HUST bearing datasets demonstrate over 99 % accuracy with significantly fewer parameters than conventional CNNs. The results confirm that the proposed DSMSCNN-CBAM-SPP framework effectively captures invariant fault-frequency features, offering a compact and adaptive solution for intelligent bearing fault diagnosis and real-time predictive maintenance in a noisy environment.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200628"},"PeriodicalIF":4.3,"publicationDate":"2026-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Personalized two-stage comparison-based framework for low-to-mid-intensity facial expression recognition in real-world scenarios
Pub Date: 2026-01-08 | DOI: 10.1016/j.iswa.2026.200627
Junyao Zhang, Kei Shimonishi, Kazuaki Kondo, Yuichi Nakamura
We evaluate a personalized, two-stage comparison-based facial expression recognition (FER) framework on two datasets of low-to-mid-intensity, near-neutral expressions. The framework consistently outperforms FaceReader and Py-Feat. On the natural-transition younger-adult dataset (Dataset A, n = 9), mean accuracy is 90.22% ± 3.53%, with within-subject median gains of +16.46 percentage points (pp) over FaceReader (95% CI [+11.33, +33.90], p = 0.00195, r = 1.00) and +8.17 pp over Py-Feat (95% CI [+3.39, +21.58], p = 0.00195, r = 1.00). On the older-adult dataset (Dataset B, n = 78), mean accuracy is 75.58% ± 9.04%, exceeding FaceReader by +15.47 pp (95% CI [+13.44, +17.21], p = 2.77 × 10⁻¹⁴, r = 0.980) and Py-Feat by +17.67 pp (95% CI [+15.13, +19.34], p = 3.02 × 10⁻¹⁴, r = 0.985). Component analyses are above chance on both datasets (B-stage medians 92.90% and 99.51%), and polarity-specific asymmetries emerge in the C-stage (A: positive > negative, Δ = +4.23 pp, two-sided p = 0.0273; B: negative > positive, Δ = −7.72 pp, p = 0.00442). On a subset of Dataset A emphasizing subtle transitions, the system maintains [78.61%, 85.38%] accuracy where human annotation accuracy ranges [50.00%, 71.47%]. Grad-CAM highlights eyebrow, forehead, and mouth regions consistent with expressive cues. Collectively, these findings demonstrate statistically significant and practically meaningful advantages for low-to-mid-intensity expression recognition and intensity ranking.
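The within-subject comparisons reported above are consistent with paired nonparametric testing. As a hedged sketch, per-subject accuracy differences against a baseline can be summarized with a Wilcoxon signed-rank test and a median gain, as below; the nine accuracy values are synthetic, not the paper's data.

```python
# Paired, within-subject comparison sketch: per-subject accuracy gains,
# Wilcoxon signed-rank test, and the median gain in percentage points.
import numpy as np
from scipy.stats import wilcoxon

ours = np.array([91.2, 88.4, 93.0, 87.5, 90.1, 94.2, 89.8, 92.3, 85.6])
facereader = np.array([74.0, 71.9, 77.2, 70.3, 73.8, 78.5, 72.4, 76.0, 69.1])

diff = ours - facereader                       # per-subject paired gains (pp)
stat, p = wilcoxon(ours, facereader)           # two-sided signed-rank test
print(f"median gain = {np.median(diff):.2f} pp, p = {p:.5f}")
```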
{"title":"Personalized two-stage comparison-based framework for low-to-mid-intensity facial expression recognition in real-world scenarios","authors":"Junyao Zhang , Kei Shimonishi , Kazuaki Kondo , Yuichi Nakamura","doi":"10.1016/j.iswa.2026.200627","DOIUrl":"10.1016/j.iswa.2026.200627","url":null,"abstract":"<div><div>We evaluate a personalized, two-stage comparison-based FER framework on two datasets of low-to-mid-intensity, near-neutral expressions. The framework consistently outperforms FaceReader and Py-Feat. On the natural-transition younger-adult dataset (Dataset A, <em>n</em> = 9), mean accuracy is 90.22% ± 3.53%, with within-subject median gains of +16.46 percentage points (pp) over FaceReader (95% CI [+11.33, +33.90], <em>p</em> = 0.00195, <em>r</em> = 1.00) and +8.17 pp over Py-Feat (95% CI [+3.39, +21.58], <em>p</em> = 0.00195, <em>r</em> = 1.00). On the older adults dataset (Dataset B, <em>n</em> = 78), mean accuracy is 75.58% ± 9.04%, exceeding FaceReader by +15.47 pp (95% CI [+13.44, +17.21], <em>p</em> = 2.77 × 10<sup>–14</sup>, <em>r</em> = 0.980) and Py-Feat by +17.67 pp (95% CI [+15.13, +19.34], <em>p</em> = 3.02 × 10<sup>–14</sup>, <em>r</em> = 0.985). Component analyses are above chance on both datasets (B-stage medians 92.90% and 99.51%), and polarity-specific asymmetries emerge in the C-stage (A: positive > negative, Δ = +4.23 pp, two-sided <em>p</em> = 0.0273; B: negative > positive, Δ = -7.72 pp, <em>p</em> = 0.00442). On a subset of Dataset A emphasizing subtle transitions, the system maintains [78.61%, 85.38%] accuracy where human annotation accuracy ranges [50.00%, 71.47%]. Grad-CAM highlights eyebrows, forehead, and mouth regions consistent with expressive cues. Collectively, these findings demonstrate statistically significant and practically meaningful advantages for low-to-mid-intensity expression recognition and intensity ranking.</div></div>","PeriodicalId":100684,"journal":{"name":"Intelligent Systems with Applications","volume":"29 ","pages":"Article 200627"},"PeriodicalIF":4.3,"publicationDate":"2026-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146026199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}