
Intelligent Systems with Applications: Latest Articles

Buyer–Seller-Deception-Game Dataset: A new comprehensive dataset for facial expression based deception detection in economic contexts
IF 4.3 Pub Date: 2026-02-09 DOI: 10.1016/j.iswa.2026.200636
Laslo Dinges, Marc-André Fiedler, Ayoub Al-Hamadi, Dmitri Bershadskyy, Joachim Weimann
Automated deception detection using only video and audio modalities remains a challenging problem in computer vision. Deceptive behavior is highly context dependent, shaped by factors such as the specific scenario, cultural background, and the associated stakes. To explore the potential of future deception detection tools in online interactions, such as virtual sales meetings, we present a new high-quality low-stakes dataset specifically tailored for this task. It currently includes 500 annotated video samples, with a planned extension to around 1000 before public release. Participants engage in incentivized online interactions, in which sellers attempt to persuade buyers to choose a specific card, creating naturally motivated deceptive and truthful behavior. We evaluate a variety of visual and audio based feature sets, such as gaze, head pose, facial Action-Units (AUs), and prosodic features, on our dataset as well as on a high-stakes in-the-wild deception dataset. Our results show that OpenFace based AU features perform best on our clean and controlled recordings, while CNN based AU predictors outperform others in the more challenging dataset with lower video quality and unstable head pose. Multimodal approaches slightly outperform the best unimodal features in both cases. We will make the dataset freely available to support future research in automated deception detection.
Volume 29, Article 200636.
Citations: 0
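The abstract above reports that multimodal approaches slightly outperform the best unimodal features but does not say how the modalities are combined. One common option is late fusion, where each modality gets its own classifier and the predicted probabilities are averaged. The sketch below illustrates that idea only; the function name `late_fusion`, the modality names, and all probability values are hypothetical and are not taken from the paper.

```python
import numpy as np

def late_fusion(prob_by_modality, weights=None):
    """Combine per-modality 'deceptive' probabilities into one score per sample.

    prob_by_modality: dict mapping a modality name to a list of probabilities.
    weights: optional dict of non-negative weights per modality (normalised here).
    """
    names = sorted(prob_by_modality)
    stacked = np.stack([np.asarray(prob_by_modality[n], dtype=float) for n in names])
    if weights is None:
        w = np.full(len(names), 1.0 / len(names))  # plain average
    else:
        w = np.array([weights[n] for n in names], dtype=float)
        w = w / w.sum()
    return w @ stacked  # weighted average over modalities

# Hypothetical outputs of three unimodal classifiers for three video samples.
probs = {
    "action_units": [0.80, 0.30, 0.55],
    "gaze": [0.60, 0.40, 0.45],
    "prosody": [0.70, 0.20, 0.65],
}
fused = late_fusion(probs)
labels = (fused >= 0.5).astype(int)  # 1 = predicted deceptive, 0 = predicted truthful
```

Averaging probabilities keeps each unimodal model independent, so a modality that fails on a given recording (for example, gaze under unstable head pose) can be down-weighted without retraining the others.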
Video anomaly detection for edge-based IoT systems: A survey of input modalities and real-time applications
IF 4.3 Pub Date: 2026-02-05 DOI: 10.1016/j.iswa.2026.200635
Hoangcong Le, Cheng-Kai Lu, Chen-Chien Hsu
With the vast amount of video data generated daily, researchers have become increasingly interested in extracting meaningful information, particularly for analyzing abnormal events. This growing interest has accelerated progress in video anomaly detection (VAD) as a specialized subfield of computer vision, attracting considerable attention due to its potential applications in real-time scenarios such as elderly care, smart homes, and intelligent surveillance. To provide a comprehensive understanding of this rapidly evolving field, several systematic reviews have been conducted to help new researchers enter the field and assist experienced groups in keeping pace with recent advancements. However, existing surveys lack a focused analysis of how different input data modalities impact the performance of VAD systems, particularly from a privacy-preserving perspective. Understanding the effectiveness of various data modalities and data collection strategies is essential for protecting personal information in computer vision applications. Furthermore, the feasibility of deploying VAD models in real-time Internet of Things (IoT) environments remains underexplored, where low latency, limited resources, and strict privacy requirements are critical considerations. Although edge computing has been increasingly adopted to address these challenges, most studies overlook the deployment of VAD frameworks on resource-constrained devices. Integrating edge-based VAD systems with federated learning algorithms represents a promising direction for enabling privacy-aware and scalable real-world systems. Rather than providing a method-centric summary, this survey reorganizes the VAD literature from a deployment-oriented viewpoint, highlighting how input modality choices fundamentally affect privacy preservation and real-time feasibility on edge-based IoT systems. This work specifically reviews studies published between 2020 and 2025.
Volume 29, Article 200635.
Citations: 0
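The survey above points to federated learning as a way to train edge-based VAD models without moving raw video off the device. A minimal sketch of the aggregation step of federated averaging (FedAvg) follows, assuming each edge device returns a dictionary of NumPy parameter arrays and its local sample count. The parameter names `w` and `b` and all numbers are hypothetical; the survey does not prescribe this specific algorithm.

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Average client parameters, weighting each client by its local sample count."""
    total = float(sum(client_sizes))
    keys = client_params[0].keys()
    return {
        k: sum((n / total) * p[k] for p, n in zip(client_params, client_sizes))
        for k in keys
    }

# Two hypothetical edge devices; only parameters leave the device, never video frames.
clients = [
    {"w": np.array([1.0, 2.0]), "b": np.array([0.0])},
    {"w": np.array([3.0, 4.0]), "b": np.array([1.0])},
]
sizes = [100, 300]
global_params = fedavg(clients, sizes)
```

Weighting by sample count means a camera that has seen more footage has more influence on the shared model, which is the usual default but not the only choice.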
A study on the generalization of DINOv2 features for food recognition tasks: A unified evaluation framework
IF 4.3 Pub Date: 2026-02-04 DOI: 10.1016/j.iswa.2026.200632
Simone Bianco, Marco Buzzelli, Gianluigi Ciocca, Flavio Piccoli, Raimondo Schettini
Self-supervised learning has recently gained increasing attention in computer vision, enabling the extraction of rich and general-purpose feature representations without requiring large annotated datasets. In this paper we aim to build a unified approach capable of deploying robust and effective analysis systems, replacing the need for multiple task-specific models trained end-to-end. Rather than introducing new architectures or training strategies, our goal is to systematically assess whether a single frozen self-supervised representation can support heterogeneous food-related tasks under realistic operating conditions. To this end, we performed an extensive analysis of DINOv2 features across multiple benchmark datasets and tasks, including food classification, segmentation, aesthetic assessment, and robustness to image distortions. In addition, we explore its capacity for continual learning by applying it to incremental food classification scenarios. Our findings reveal that DINOv2 features excel in many food-related applications. Their shared representations across tasks reduce the need for training separate models, while their strong generalization, high accuracy, and ability to handle complex multi-task scenarios make them a strong candidate for a unified food recognition approach. Specifically, DINOv2 features match or surpass state-of-the-art supervised methods in several food recognition tasks, while offering a simpler and more unified deployment strategy. Furthermore, they outperform end-to-end models in cross-dataset scenarios by up to +19.4% Top-1 accuracy and exhibit strong resilience to common image distortions, with robustness gains of up to +48.0% in Top-1 accuracy percentage difference, ensuring reliable performance in real-world applications. On average across all considered tasks, the DINOv2-based unified evaluation outperforms the state of the art by approximately 2.8% and 5.4%, depending on the chosen model size, while using only 6.2% and 23.9% of the total number of model parameters, respectively.
Volume 29, Article 200632.
Citations: 0
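The abstract above evaluates a frozen representation in incremental food classification scenarios. One simple way to add classes on top of frozen features without retraining the backbone is a nearest-class-mean classifier, sketched below. This is an illustration under stated assumptions, not the authors' protocol: the 2-D vectors stand in for real DINOv2 embeddings (feature extraction is not shown), and the class `NearestClassMean` and the food labels are hypothetical.

```python
import numpy as np

class NearestClassMean:
    """Incremental classifier over frozen features: stores one running mean per class."""

    def __init__(self):
        self.sums = {}    # class label -> sum of feature vectors seen so far
        self.counts = {}  # class label -> number of feature vectors seen so far

    def partial_fit(self, feats, labels):
        # New classes can arrive in later calls; earlier class statistics are untouched.
        for f, y in zip(np.asarray(feats, dtype=float), labels):
            self.sums[y] = self.sums.get(y, 0.0) + f
            self.counts[y] = self.counts.get(y, 0) + 1
        return self

    def predict(self, feats):
        classes = sorted(self.sums)
        means = np.stack([self.sums[c] / self.counts[c] for c in classes])
        feats = np.asarray(feats, dtype=float)
        # Squared Euclidean distance from every sample to every class mean.
        dists = ((feats[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
        return [classes[i] for i in dists.argmin(axis=1)]

clf = NearestClassMean()
clf.partial_fit([[0.0, 0.0], [0.0, 1.0]], ["pizza", "pizza"])  # first task
clf.partial_fit([[5.0, 5.0], [6.0, 5.0]], ["sushi", "sushi"])  # class added later
pred = clf.predict([[0.2, 0.4], [5.0, 4.5]])
```

Because only per-class sums and counts are stored, adding a class never changes what was learned for earlier classes, which sidesteps catastrophic forgetting at the classifier level.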
Advancing decision-making: A comprehensive review of intelligent systems, applications, and challenges
IF 4.3 Pub Date: 2026-01-31 DOI: 10.1016/j.iswa.2026.200631
Hussein A.A. Al-Khamees, Ahmad AL Smadi, Mutasem K. Alsmadi, Abdulrahman A. Alkannad, Ahed Abugabah, Latifa Abdullah Almusfar, Bashair Althani
The rapid evolution of intelligent systems, powered by artificial intelligence and machine learning, has created a fragmented research landscape. While numerous studies exist on specific applications, a holistic synthesis of their architectures, taxonomies, applications, and challenges is absent. This paper will bridge this gap by providing a comprehensive systematic review that integrates these disparate elements. This paper conducts a systematic review of over 100 peer-reviewed scientific publications, following a structured process to identify, analyze, and synthesize the current state of intelligent systems research. The review encompasses a wide range of domains, including healthcare, cybersecurity, data mining, and industrial automation. Our analysis yields a unified taxonomy and clarifies the core architectural components of intelligent systems. We identify and categorize key application domains and demonstrate their transformative impact. The review also synthesizes prevailing challenges, such as data quality, scalability, and ethical concerns, and pinpoints emerging trends, including the rise of multimodal AI and hybrid intelligent systems. To the best of our knowledge, this is the first review to offer a consolidated framework that integrates the architecture, taxonomy, applications, and cross-domain challenges of intelligent systems into a single reference. This work serves as a foundational guide for researchers and practitioners, facilitating future advancements in the development of efficient, scalable, and context-aware intelligent systems.
Volume 29, Article 200631.
Citations: 0
WOA-FCM-CNN-WNN-informer: An advanced hybrid deep learning model for ultra-accurate PV power forecasting in electric mobility
IF 4.3 Pub Date: 2026-01-29 DOI: 10.1016/j.iswa.2026.200630
Lazhar Manai, Walid Mchara, Mohamed Abdellatif Khalfa, Monia Raissi, Wissem Dimassi
Effective prediction of photovoltaic (PV) power generation is essential for enhancing energy management in solar-powered electric vehicles. This study introduces an innovative hybrid forecasting framework that combines Fuzzy C-Means (FCM) clustering, Convolutional Neural Networks (CNN), Wavelet Neural Networks (WNN), the Informer architecture, and the Whale Optimization Algorithm (WOA) to improve prediction accuracy. This approach introduces a condition-aware, end-to-end FCM-CNN-WNN-Informer pipeline tailored for PV dynamics, where: (i) similar-day fuzzy clustering normalizes weather heterogeneity before learning; (ii) wavelet-based multi-scale features are injected into a long-horizon Informer; (iii) a global, cross-module hyperparameter search via Whale Optimization Algorithm (WOA) jointly tunes all stages; (iv) a Generalization Index (GI) is proposed for robust model selection; and (v) Monte-Carlo dropout quantifies predictive uncertainty for practical deployment.
The proposed WOA-FCM-CNN-WNN-Informer model is evaluated on a comprehensive dataset of 70,080 hourly PV power recordings gathered over eight years in Tunisia. Results show superior performance compared to standard deep learning models like LSTM and BiLSTM. The framework reduces Mean Absolute Percentage Error (MAPE) by as much as 98.52% and Root Mean Squared Error (RMSE) by 93.84%, while maintaining a high coefficient of determination (R² = 0.98) across varying meteorological conditions. These outcomes underscore the model’s robustness and its promise for advancing energy utilization, refining charging strategies, and supporting intelligent route planning in solar-electric transportation systems.
Volume 29, Article 200630.
Citations: 0
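The abstract above reports results in terms of MAPE, RMSE, and the coefficient of determination. The sketch below shows the standard definitions of these three metrics, plus a helper for the relative error reduction that phrases such as "reduces MAPE by 98.52%" most likely refer to (that reading is an assumption, since the abstract does not give the formula). The sample power values are hypothetical.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent; undefined where y_true is zero."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

def rmse(y_true, y_pred):
    """Root Mean Squared Error, in the same unit as the target."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    return 100.0 * (baseline_error - model_error) / baseline_error

# Hypothetical hourly PV power values (kW) and forecasts.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.2, 3.8, 6.3, 7.6]
scores = {"MAPE": mape(y_true, y_pred), "RMSE": rmse(y_true, y_pred), "R2": r2(y_true, y_pred)}
```

Note that MAPE divides by the true value, so night-time hours with zero PV output have to be excluded or handled separately before the metric is computed.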
IndiSegNet: Real-time semantic segmentation for unstructured road scenes in intelligent transportation systems
IF 4.3 Pub Date: 2026-01-17 DOI: 10.1016/j.iswa.2026.200629
Pritam Chakraborty, Anjan Bandyopadhyay, Kushagra Agrawal, Jin Zhang, Man-Fai Leung
Autonomous driving in developing regions demands perception systems that can operate reliably in unstructured road environments marked by heterogeneous traffic, weak or missing lane geometry, frequent occlusions, and strong appearance variability. Existing semantic segmentation models, although successful in structured Western datasets, exhibit poor generalization to such chaotic conditions and are often too computationally heavy for real-time deployment on low-power edge hardware. To address these gaps, this paper focuses on the challenge of achieving fast, accurate, and resource-efficient segmentation tailored to complex Indian road scenes. We propose IndiSegNet, a lightweight architecture designed explicitly for this setting. The model introduces two novel components—Multi-Scale Contextual Features (MSCF) for capturing irregular object scales and Encoded Features Refining (EFR) for enhancing thin-structure and boundary detail, resulting in a more stable representation for unstructured environments. IndiSegNet achieves 67.2% mIoU on IDD, 78.9% on Cityscapes, and 74.6% on CamVid, while sustaining 112 FPS on Jetson Nano, outperforming standard baselines by 12%–18% IoU on safety-critical classes such as pedestrians, riders, and vehicles. Real-world evaluation across urban, monsoonal, rural, and mountainous regions shows less than 2.5% variance in mIoU with consistent inference speeds above 108 FPS. These results demonstrate that IndiSegNet offers a practical and hardware-efficient solution for high-speed autonomous navigation in the challenging traffic conditions of developing regions.
Volume 29, Article 200629.
Citations: 0
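The segmentation results above are reported as mean Intersection over Union (mIoU) and per-class IoU. A minimal sketch of how these are usually computed from a confusion matrix follows. It assumes integer class labels in the range 0 to `num_classes - 1` and no "ignore" label, which real benchmarks such as Cityscapes do use; the tiny label maps are hypothetical.

```python
import numpy as np

def confusion_matrix(gt, pred, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    gt = np.asarray(gt).ravel()
    pred = np.asarray(pred).ravel()
    idx = gt * num_classes + pred  # encode each (gt, pred) pair as one integer
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def per_class_iou(cm):
    """IoU_c = TP_c / (TP_c + FP_c + FN_c); NaN for classes absent from both maps."""
    tp = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, tp / union, np.nan)

def mean_iou(cm):
    """Average IoU over the classes that actually occur."""
    return float(np.nanmean(per_class_iou(cm)))

# Hypothetical 2x3 label maps with three classes.
gt = [[0, 0, 1], [1, 2, 2]]
pred = [[0, 1, 1], [1, 2, 0]]
cm = confusion_matrix(gt, pred, num_classes=3)
ious = per_class_iou(cm)
miou = mean_iou(cm)
```

For a dataset-level score, the confusion matrices of all images are normally summed before the IoU is taken, rather than averaging per-image mIoU values.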
Attention-enhanced reinforcement learning for dynamic portfolio optimization
IF 4.3 Pub Date: 2026-01-15 DOI: 10.1016/j.iswa.2025.200622
Pei Xue, Yuanchun Ye
We propose a deep reinforcement learning framework for dynamic portfolio optimization that combines a Dirichlet policy with cross-sectional attention mechanisms. The Dirichlet distribution enforces feasibility by construction, accommodates tradability masks, and provides a coherent geometry for exploration. Our architecture integrates per-asset temporal encoders with a global attention layer, allowing the policy to adaptively weight sectoral co-movements, factor spillovers, and other cross-asset dependencies. We evaluate the framework on a comprehensive S&P 500 panel from 2000 to 2025 using purged walk-forward backtesting to prevent look-ahead bias. Empirical results show that attention-enhanced Dirichlet policies deliver higher terminal wealth, Sharpe and Sortino ratios than equal-weight and reinforcement learning baselines, while maintaining realistic turnover and drawdown profiles. Our findings highlight that principled action parameterization and attention-based representation learning materially improve both the stability and interpretability of reinforcement learning methods for portfolio allocation.
Citations: 0
Novel quantum tunneling and fractional calculus-based metaheuristic for robust global data optimization and its applications in engineering design
IF 4.3 Pub Date : 2026-01-09 DOI: 10.1016/j.iswa.2025.200616
Hussam Fakhouri, Riyad Alrousan, Niveen Halalsheh, Najem Sirhan, Jamal Zraqou, Khalil Omar

Background:

Bound-constrained single-objective optimization and constrained engineering design often feature heterogeneous landscapes and barrier-like structures, motivating search procedures that are scale-aware, robust near constraints, and economical in tuning.

Contributions:

We introduce Quantum Tunneling and Fractional Calculus-Based Metaheuristic (QTFM), a physics-inspired metaheuristic that is parameter-lean and employs bounded, range-aware operators to reduce sensitivity to tuning and to prevent erratic steps close to constraints.

Methodology:

QTFM couples fractional-step dynamics for scale-aware exploitation with a quantum-tunneling jump for barrier crossing, and augments these with a wavefunction-collapse local search that averages a small neighborhood and applies minimal perturbations to accelerate refinement without sacrificing diversity.

Results:

On the IEEE Congress on Evolutionary Computation CEC 2022 single-objective bound-constrained suite, QTFM ranked first on ten of twelve functions; it reached the best optimum on F1 and achieved the best mean values on F2–F8 and F10–F11 with stable standard deviations. In three constrained engineering problems, QTFM produced the lowest mean and the best-found solution for the robotic gripper design, and the lowest mean for the planetary gear train and three-bar truss design.

Findings:

The proposed fractional–quantum approach delivers fast, accurate, and robust search across heterogeneous landscapes and real-world design problems.
Citations: 0
An efficient lightweight multi-scale CNN framework with CBAM and SPP for bearing fault diagnosis
IF 4.3 Pub Date : 2026-01-08 DOI: 10.1016/j.iswa.2026.200628
Thanh Tung Luu, Duy An Huynh
Rolling bearing degradation produces vibration signatures that vary across operating conditions, posing challenges for reliable fault diagnosis. This study proposes an adaptive and lightweight diagnostic framework combining a Depthwise Separable Multi-Scale CNN (DSMSCNN) with Convolutional Block Attention Module (CBAM) and Spatial Pyramid Pooling (SPP) to extract fault-frequency invariant features across different mechanical domains. Wavelet-based time–frequency maps are utilized to suppress noise and preserve multi-resolution spectral characteristics. The multi-scale separable convolutions adaptively capture discriminative frequency patterns, while CBAM highlights informative spectral regions and SPP enhances scale robustness without fixed input sizes. Experiments on the CWRU and HUST bearing datasets demonstrate over 99% accuracy with significantly fewer parameters than conventional CNNs. The results confirm that the proposed DSMSCNN-CBAM-SPP framework effectively captures invariant fault-frequency features, offering a compact and adaptive solution for intelligent bearing fault diagnosis and real-time predictive maintenance in a noisy environment.
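To make the "significantly fewer parameters" claim concrete, the sketch below compares the weight count of a standard 2-D convolution with that of a depthwise separable one (a per-channel k×k depthwise filter followed by a 1×1 pointwise convolution). The kernel sizes 3, 5, 7 and the channel widths 64 and 128 are hypothetical placeholders for a multi-scale block; the abstract does not give the actual DSMSCNN configuration, and bias terms are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise k x k conv plus a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = kernel * kernel * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out             # 1 x 1 conv mixing channels
    return depthwise + pointwise


# Hypothetical multi-scale branch sizes; not taken from the paper.
c_in, c_out = 64, 128
comparison = {
    k: (standard_conv_params(k, c_in, c_out),
        depthwise_separable_params(k, c_in, c_out))
    for k in (3, 5, 7)
}
for k, (std, sep) in comparison.items():
    print(f"kernel {k}: standard={std}, separable={sep}, ratio={std / sep:.1f}x")
```

The saving grows with kernel size, which is why separable convolutions pair naturally with multi-scale branches that use larger kernels.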
Citations: 0
Personalized two-stage comparison-based framework for low-to-mid-intensity facial expression recognition in real-world scenarios
IF 4.3 Pub Date : 2026-01-08 DOI: 10.1016/j.iswa.2026.200627
Junyao Zhang, Kei Shimonishi, Kazuaki Kondo, Yuichi Nakamura
We evaluate a personalized, two-stage comparison-based FER framework on two datasets of low-to-mid-intensity, near-neutral expressions. The framework consistently outperforms FaceReader and Py-Feat. On the natural-transition younger-adult dataset (Dataset A, n = 9), mean accuracy is 90.22% ± 3.53%, with within-subject median gains of +16.46 percentage points (pp) over FaceReader (95% CI [+11.33, +33.90], p = 0.00195, r = 1.00) and +8.17 pp over Py-Feat (95% CI [+3.39, +21.58], p = 0.00195, r = 1.00). On the older-adult dataset (Dataset B, n = 78), mean accuracy is 75.58% ± 9.04%, exceeding FaceReader by +15.47 pp (95% CI [+13.44, +17.21], p = 2.77 × 10⁻¹⁴, r = 0.980) and Py-Feat by +17.67 pp (95% CI [+15.13, +19.34], p = 3.02 × 10⁻¹⁴, r = 0.985). Component analyses are above chance on both datasets (B-stage medians 92.90% and 99.51%), and polarity-specific asymmetries emerge in the C-stage (A: positive > negative, Δ = +4.23 pp, two-sided p = 0.0273; B: negative > positive, Δ = −7.72 pp, p = 0.00442). On a subset of Dataset A emphasizing subtle transitions, the system maintains accuracy in the range [78.61%, 85.38%], whereas human annotation accuracy ranges over [50.00%, 71.47%]. Grad-CAM highlights eyebrows, forehead, and mouth regions consistent with expressive cues. Collectively, these findings demonstrate statistically significant and practically meaningful advantages for low-to-mid-intensity expression recognition and intensity ranking.
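The abstract does not name the statistical test behind the Dataset A p-values. Purely as an assumption, the sketch below checks what an exact Wilcoxon signed-rank test on n = 9 paired differences (no ties, no zero differences) would give when every participant improves: the one-sided p-value is 1/2⁹ and the rank-biserial effect size is 1. The helper `exact_upper_tail` is a hypothetical name introduced for this illustration.

```python
from itertools import product


def exact_upper_tail(n, w_plus_observed):
    """P(W+ >= observed) under H0 for the exact signed-rank test.

    Under H0 each of the ranks 1..n is positive or negative with probability
    1/2, so every one of the 2**n sign patterns is equally likely.
    """
    ranks = range(1, n + 1)
    hits = 0
    for signs in product((0, 1), repeat=n):
        w_plus = sum(r for r, s in zip(ranks, signs) if s)
        if w_plus >= w_plus_observed:
            hits += 1
    return hits / 2 ** n


n = 9
w_max = n * (n + 1) // 2                       # 45: all nine differences positive
p_all_positive = exact_upper_tail(n, w_max)    # one-sided
rank_biserial = (w_max - 0) / w_max            # (W+ - W-) / total rank sum
p_two_sided_41 = 2 * exact_upper_tail(n, 41)   # W+ = 41, W- = 4

print(p_all_positive, rank_biserial, p_two_sided_41)
```

Under this assumed test, 1/512 ≈ 0.00195 agrees with the reported p = 0.00195 and r = 1.00 for Dataset A, and a rank sum of W+ = 41 would give a two-sided value of 14/512 ≈ 0.0273, agreeing with the reported two-sided p for the Dataset A C-stage asymmetry. This is a consistency check only and does not establish which test the authors used.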
Citations: 0