
Expert Systems with Applications: Latest Articles

Progressive alternating attribute-Structure optimization for multiplex heterogeneous graphs
IF 7.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-05-25 · Epub Date: 2026-02-03 · DOI: 10.1016/j.eswa.2026.131495
Haochang Hao , Jun Huang , Shuzhen Rao
Multiplex heterogeneous graphs, characterized by various types of nodes and relations, often exhibit incomplete structures and missing attributes in real-world scenarios, posing significant challenges for effective representation learning. Although existing studies have explored either structure refinement or attribute completion independently, few have touched on their potential complementarity. In this work, we propose an alternating optimization framework for node representation learning in multiplex heterogeneous graphs, with three key innovations: (i) relation-aware dynamic structure learning guided by attribute similarity, (ii) multi-hop completion of missing attributes on the refined graphs, and (iii) a progressive alternating optimization strategy that couples the two modules so they bootstrap and denoise each other over rounds. Extensive experiments on multiple real-world heterogeneous graph datasets demonstrate that our framework achieves superior performance over state-of-the-art baselines, validating the effectiveness and robustness of progressive structure-attribute co-optimization in heterogeneous graph representation learning.
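The alternating scheme the abstract outlines can be illustrated with a minimal toy sketch: attribute-similarity-driven edge refinement alternates with neighbor-based attribute completion over several rounds. This is an illustrative reconstruction, not the authors' implementation; the top-k cosine rule and neighbor averaging are simplifying assumptions standing in for the paper's relation-aware and multi-hop modules.

```python
import numpy as np

def refine_structure(X, adj, k=2):
    """Rebuild each node's edges from attribute cosine similarity (illustrative)."""
    unit = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)          # no self-loops
    new_adj = np.zeros_like(adj)
    for i in range(len(X)):
        for j in np.argsort(sim[i])[-k:]:   # keep the k most similar neighbors
            new_adj[i, j] = new_adj[j, i] = 1
    return new_adj

def complete_attributes(X, adj, mask):
    """Fill missing rows (mask == False) by averaging observed neighbors."""
    X = X.copy()
    for i in np.where(~mask)[0]:
        obs = [j for j in np.where(adj[i] > 0)[0] if mask[j]]
        if obs:
            X[i] = X[obs].mean(axis=0)
    return X

# Toy run: 4 nodes, node 3's attributes are missing; start fully connected.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.0, 0.0]])
mask = np.array([True, True, True, False])
adj = np.ones((4, 4)) - np.eye(4)
for _ in range(3):                          # progressive alternation over rounds
    adj = refine_structure(X, adj)
    X = complete_attributes(X, adj, mask)
```

After a few rounds the refined edges and the imputed attributes stabilize together, which is the bootstrapping effect the abstract refers to.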
Citations: 0
A design method for electric vehicle front face styling: based on engineering feasibility optimization of GenAI-generated images
IF 7.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-05-25 · Epub Date: 2026-02-05 · DOI: 10.1016/j.eswa.2026.131522
Huining Pei , Mingzhe Yang , Zhonghang Bai , Man Ding , Wen Li , Yuxin Cao , Yanjun Zhang
To address the low engineering feasibility of electric vehicle (EV) front face styling images generated by generative artificial intelligence (GenAI) tools such as Midjourney, this study proposes an innovative design method that integrates curve optimization with a collaborative evaluation system combining simulated and human experts. The method aims to enhance the manufacturability of AI-generated design schemes while efficiently transferring the styling genes of conventional fuel vehicles to EV front face styling design. First, the large language model ChatGPT-5.0 is employed to construct a styling semantic database based on six categories of conventional fuel vehicle front face datasets. Second, Midjourney is used to generate an initial EV front face styling dataset, and a production-ready styling dataset is subsequently constructed to provide engineering feasibility references for EV front face styling design. Third, “AI-generated curves” and “engineering reference curves” are fused at different ratios, and an EV front face styling scheme is generated using a curve blending algorithm optimized for the figure–ground relationship. Finally, an LLM-based collaborative evaluation system integrating simulated experts (via ChatGPT-5.0) and human experts is established to conduct quantitative evaluation and optimization of the schemes in terms of engineering feasibility and styling design metrics. A case study demonstrates that the optimized scheme’s engineering feasibility score is significantly improved from 2.3 to 7.1 (out of 10), while maintaining a high level of design creativity (7.5). The established LLM-based collaborative evaluation system achieved high inter-rater consistency in both engineering feasibility evaluation (ICC ≥ 0.9) and design creativity evaluation for EV front face styling schemes (ICC ≥ 0.85), effectively balancing engineering feasibility and design creativity in generative artificial intelligence-generated EV front face styling schemes. By constructing an AI-led, human-supervised hybrid design workflow, this method significantly enhances the engineering feasibility and design efficiency of generative AI in product styling design, providing a theoretical reference for achieving a balance between design innovation and engineering feasibility.
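The fusion of "AI-generated curves" and "engineering reference curves" at different ratios can be pictured as a point-wise blend between two sampled profiles. The paper's figure–ground-optimized algorithm is more involved; the linear blend below is only a hypothetical sketch, and both toy profiles are invented for illustration.

```python
import numpy as np

def blend_curves(ai_curve, ref_curve, ratio):
    """Point-wise linear blend: ratio=1.0 keeps the AI curve, 0.0 the reference.

    Both curves are (N, 2) arrays of sampled (x, y) points; a real pipeline
    would first resample them to a common parameterization.
    """
    ai_curve = np.asarray(ai_curve, dtype=float)
    ref_curve = np.asarray(ref_curve, dtype=float)
    assert ai_curve.shape == ref_curve.shape
    return ratio * ai_curve + (1.0 - ratio) * ref_curve

t = np.linspace(0.0, 1.0, 50)
ai = np.stack([t, np.sin(np.pi * t)], axis=1)        # expressive AI-generated profile
ref = np.stack([t, 0.5 * np.ones_like(t)], axis=1)   # conservative engineering profile
mid = blend_curves(ai, ref, ratio=0.5)               # 50/50 fusion of the two
```

Sweeping `ratio` produces the family of candidate schemes that the evaluation system would then score for feasibility and creativity.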
Citations: 0
CNN-DET: A hybrid deep learning architecture for emotion recognition
IF 7.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-05-25 · Epub Date: 2026-02-06 · DOI: 10.1016/j.eswa.2026.131377
Berrouachedi Abdelkader, Jaziri Rakia, Bernard Gilles
Emotion recognition plays a crucial role in various biometric applications, including human-computer interaction, healthcare, and security. This paper presents CNN-DET, a novel hybrid approach that integrates Convolutional Neural Networks (CNNs) with Deep Extra-Trees (DETs) for robust facial emotion recognition. The proposed methodology leverages hierarchical feature extraction through pre-trained CNN models combined with ensemble-based classification using DETs to accurately detect and classify emotions from facial expressions. Comprehensive evaluation on benchmark datasets demonstrates the superior performance of our approach. On the FER-2013 dataset, CNN-DET achieves 98.16% accuracy in 10-fold cross-validation and 85.32% accuracy on the standard test set, with precision of 85.7%, recall of 85.3%, and F1-score of 85.4%. The model maintains strong performance across diverse conditions, achieving 91.2% accuracy on AffectNet and 89.7% accuracy on RAF-DB, confirming its generalization capability. Extensive experiments reveal that our method reduces misclassification between visually similar emotions by 23.4% compared to traditional CNN approaches and shows 15.8% improvement in robustness under varying lighting conditions. The proposed approach not only accurately recognizes emotions but also demonstrates consistent performance across different demographic groups, with less than 3.2% performance variance across age and ethnicity subgroups. These findings highlight the significant potential of deep learning techniques for emotion recognition in biometric applications, providing valuable insights for developing more intelligent and interactive systems. Future research will focus on multimodal data fusion and temporal modeling to further enhance recognition accuracy and real-time performance.
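The two-stage pipeline the abstract describes (CNN feature extraction followed by ensemble classification) can be sketched as follows, with scikit-learn's `ExtraTreesClassifier` standing in for the Deep Extra-Trees stage and synthetic vectors standing in for pre-trained CNN embeddings; none of this is the authors' code.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-in for CNN embeddings: two separable "emotion" classes in 128-d space.
n_per_class, dim = 100, 128
feats = np.vstack([rng.normal(0.0, 1.0, (n_per_class, dim)),
                   rng.normal(2.0, 1.0, (n_per_class, dim))])
labels = np.array([0] * n_per_class + [1] * n_per_class)

X_tr, X_te, y_tr, y_te = train_test_split(
    feats, labels, test_size=0.25, random_state=0, stratify=labels)

# Stage 2: ensemble of extremely randomized trees on the extracted features.
clf = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

In the real system the feature rows would come from a pre-trained CNN backbone applied to face crops, and the tree ensemble would be the stacked (deep) variant described in the paper.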
Citations: 0
Hybrid intelligence–driven global path planning for ships in complex maritime environments
IF 7.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-05-25 · Epub Date: 2026-02-05 · DOI: 10.1016/j.eswa.2026.131473
Jiao Liu , Kaige Zhu , Yuanqiang Zhang , Miao Gao , Pengjun Zheng
Global ship path planning in complex maritime environments is challenged by dynamic disturbances, vessel-specific constraints, and long-range trajectory dependencies. This study develops an integrated hybrid planning framework that combines deep generative modeling with rule-based optimization. Automatic identification system trajectory time series are first transformed into Gramian Angular Field images to enhance spatio-temporal feature extraction. Vessel type and length are encoded as one-hot vectors and introduced as conditional variables, enabling personalized path generation. These inputs are processed by a Multi-Head Attention–based Conditional Wasserstein Generative Adversarial Network with Gradient Penalty (MHA-cWGAN-GP), in which multi-head attention is used to model long-range dependencies, and conditional Generative Adversarial Network (cGAN) training together with a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) objective is adopted to improve conditioning behavior and training robustness. The model generates initial navigation paths, which are further refined using an A* search procedure that incorporates wind and current disturbances, as well as constraints such as static obstacles, water depth, and Traffic Separation Scheme (TSS) regulations. The final path is smoothed to ensure feasibility and compliance. In case studies for the Ningbo–Zhoushan Port and Yangtze River Estuary, the hybrid planner reduces the number of search nodes from 45–57 to 29–35 while simultaneously enforcing TSS, water-depth, wind, and current constraints, with only about a 3–4% increase in path length relative to classical A* and Dijkstra algorithms. The results indicate that the proposed framework effectively integrates learning and optimization, offering a practical and intelligent solution for real-world maritime path planning.
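The Gramian Angular Field step that turns AIS trajectory series into images follows a textbook definition: rescale the series to [-1, 1], encode each value as an angle, and take pairwise angle sums. A minimal sketch (the toy "speed profile" is invented, not AIS data):

```python
import numpy as np

def gramian_angular_field(series):
    """Gramian Angular Summation Field of a 1-D series (textbook definition)."""
    s = np.asarray(series, dtype=float)
    lo, hi = s.min(), s.max()
    x = 2.0 * (s - lo) / (hi - lo) - 1.0          # rescale to [-1, 1]
    x = np.clip(x, -1.0, 1.0)                     # guard against rounding
    phi = np.arccos(x)                            # angular encoding
    return np.cos(phi[:, None] + phi[None, :])    # GASF[i, j] = cos(phi_i + phi_j)

traj = np.array([0.0, 1.0, 2.0, 1.0, 0.0])        # toy AIS-style speed profile
gaf = gramian_angular_field(traj)                 # (5, 5) image-like matrix
```

The resulting symmetric matrix preserves temporal dependencies in its rows and columns, which is what lets image-oriented generators such as the MHA-cWGAN-GP consume trajectory data.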
Citations: 0
Compressive sensing image restoration with deep prior guided group sparse representation
IF 7.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-05-25 · Epub Date: 2026-02-04 · DOI: 10.1016/j.eswa.2026.131465
Zhulin Ji , Shenghai Liao , Ruyi Han , Shujun Fu
Compressive sensing (CS) enables accurate reconstruction of images from significantly fewer measurements than required by the Nyquist-Shannon sampling theorem, relying critically on effective image priors to regularize the ill-posed inverse problem. Conventional patch-based sparse representation utilizes fixed dictionaries that are learned off-the-shelf using the K-SVD algorithm. However, patch-based sparse representation ignores the relationships among patches, and the learned dictionaries cannot capture the global image statistics, which leads to suboptimal reconstruction performance. In this paper, we exploit group sparse representation (GSR) for image compressive sensing reconstruction. By clustering non-local image patches into groups and regarding each group as a unit, group sparse representation simultaneously finds sparse codes for all patches within a group, leading to improved reconstruction fidelity and edge preservation. However, GSR relies solely on the undersampled image itself to construct a dictionary that is not learnable, becoming increasingly unreliable at low compressive sensing rates where substantial loss of local image information occurs. To address this limitation, we propose a Deep Prior guided Group Sparse Representation (DPGSR) model for compressive image restoration, where a deep denoiser is responsible for capturing and learning both local and global image statistics by training on external data. The proposed DPGSR achieves improved global consistency, effectively reducing block artifacts while preserving sharper local details. Extensive experiments on image compressive sensing reconstruction and fast MRI demonstrate that the proposed method outperforms state-of-the-art approaches, particularly in preserving fine details and reducing over-smoothing artifacts.
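The grouping step at the heart of GSR is block matching: for a reference patch, collect its nearest non-local patches and treat the stacked group as the unit of sparse coding. A minimal sketch, assuming non-overlapping patches and Euclidean block matching (real GSR pipelines use overlapping patches and restrict the search window):

```python
import numpy as np

def extract_patches(img, size=4, stride=4):
    """Collect non-overlapping size x size patches as flattened rows."""
    h, w = img.shape
    return np.array([img[i:i + size, j:j + size].ravel()
                     for i in range(0, h - size + 1, stride)
                     for j in range(0, w - size + 1, stride)])

def group_similar_patches(patches, ref_idx, k=3):
    """Block matching: k nearest patches (Euclidean) to the reference patch.

    Each returned group is the unit on which group-sparse coding would run,
    e.g. via an SVD of the group matrix.
    """
    d = np.linalg.norm(patches - patches[ref_idx], axis=1)
    return patches[np.argsort(d)[:k]]             # includes the reference itself

rng = np.random.default_rng(1)
img = rng.normal(size=(16, 16))
img[8:, :] = img[:8, :]                           # make the two halves self-similar
patches = extract_patches(img)
group = group_similar_patches(patches, ref_idx=0, k=3)
```

Because each group gathers mutually similar patches, its matrix is approximately low-rank, which is exactly the structure group-sparse coding exploits.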
Citations: 0
Surv-RWKV: Cross-modal receptance weighted key-value interaction with optimal transport feature alignment for survival analysis
IF 7.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2026-05-25 · Epub Date: 2026-02-04 · DOI: 10.1016/j.eswa.2026.131506
Xiyang Kuang , Bin Yang , Bingo Wing-Kuen Ling , Kok Lay Teo , Xiaozhi Zhang
Multimodal learning has played a pivotal role in survival prediction, particularly in integrating pathological images and genomic data for improving predictive performance. Pathological images provide macroscopic histological information about tumor morphology, while genomic data reveal molecular-level genetic characteristics. The integration of these two modalities enables a comprehensive characterization of tumor heterogeneity and disease progression mechanisms. Despite recent advances in multimodal integration that have significantly enhanced prognostic accuracy, challenges remain in effectively analyzing high-dimensional and heterogeneous whole-slide images and omics data. Current Transformer-based sequence modeling approaches suffer from limited computational efficiency when processing long feature sequences and capturing complex cross-modal interactions. To address these challenges, we propose an innovative cross-modal receptance weighted key-value (RWKV)-based framework, termed Surv-RWKV, for survival prediction. This framework integrates RWKV-based sequence modeling with advanced multimodal fusion strategies to enhance both predictive accuracy and model efficiency. Specifically, Surv-RWKV employs parallel RWKV-based encoders to model long-range dependencies in WSI tissue cluster patterns and genomic pathway activation profiles, achieving improved prognostic performance with optimized computational efficiency. Subsequently, a transport-based optimal cross-modal alignment module is introduced to establish semantic correspondences between histopathological and genomic feature spaces. Furthermore, a progressive feature fusion strategy is implemented to enable effective cross-modal interaction. An RWKV-based shallow fusion module is first developed to explore cross-modal dependencies through spatial-channel hybrid operations, thereby enhancing the representational quality of fused features. A cross-RWKV deep interaction module is then designed to further strengthen information synthesis via iterative cross-attention mechanisms, while simultaneously reinforcing intra-modal representation learning and cross-modal knowledge transfer. Surv-RWKV is expected to effectively capture such cross-modal correlations, thereby improving the accuracy and interpretability of survival predictions. Extensive validation across five TCGA cancer cohorts demonstrates that Surv-RWKV achieves state-of-the-art predictive performance with superior computational efficiency.
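The "transport-based optimal cross-modal alignment" step can be illustrated with a generic entropic-regularized Sinkhorn iteration between toy histology and pathway embeddings. This is a standard optimal-transport routine under assumed uniform marginals and an absolute-difference cost, not the paper's exact module; the 1-D feature values are invented for illustration.

```python
import numpy as np

def sinkhorn(a, b, cost, reg=0.1, n_iter=200):
    """Entropic-regularized OT plan between histograms a and b (Sinkhorn-Knopp)."""
    K = np.exp(-cost / reg)                       # Gibbs kernel of the cost
    u = np.ones_like(a)
    for _ in range(n_iter):                       # alternating marginal scaling
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]            # transport plan P

# Toy cross-modal features: 3 WSI cluster embeddings vs. 3 pathway embeddings.
wsi = np.array([[0.0], [1.0], [2.0]])
gene = np.array([[0.1], [1.1], [2.1]])
cost = np.abs(wsi - gene.T)                       # pairwise |x - y| cost matrix
P = sinkhorn(np.full(3, 1 / 3), np.full(3, 1 / 3), cost, reg=0.05)
```

The plan `P` puts most of its mass on the cheapest pairings, giving soft correspondences between the two feature spaces that a fusion module can then exploit.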
Citations: 0
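The Surv-RWKV abstract above names a "transport-based optimal cross-modal alignment module" without giving its form. A minimal entropic optimal-transport (Sinkhorn) sketch of how histology and genomics token sets could be softly matched; the toy dimensions, cosine cost, regularization strength, and barycentric mapping are all illustrative assumptions, not details from the paper:

```python
import numpy as np

def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Entropic-regularized OT between two uniform marginals.

    cost: (n, m) pairwise cost matrix between histology and genomics
    tokens (cosine distance here is an assumption). Returns a transport
    plan whose rows/columns sum to the source/target marginals.
    """
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):         # alternating marginal scaling
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 16))   # 6 tissue-cluster tokens (toy)
G = rng.normal(size=(4, 16))   # 4 pathway tokens (toy)
H /= np.linalg.norm(H, axis=1, keepdims=True)
G /= np.linalg.norm(G, axis=1, keepdims=True)
cost = 1.0 - H @ G.T           # cosine distance
P = sinkhorn_plan(cost)
# Barycentric mapping: each histology token gets a genomics-aligned view.
aligned_G = (P / P.sum(axis=1, keepdims=True)) @ G
```

The plan `P` gives a soft correspondence between the two feature spaces; a full implementation would typically use a dedicated OT library rather than this hand-rolled loop.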
HHGDroid: Hybrid heterogeneous graph-based android malware detection via multi-evidence similarity fusion
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-05-25 Epub Date : 2026-02-04 DOI: 10.1016/j.eswa.2026.131528
Junwei Tang, Xiaomei Tian, Tao Peng, Jianfeng Lu, Haozhao Wang, Ruixuan Li
Currently, static analysis is insufficient to deal with Android malware that employs advanced evasion techniques such as code obfuscation and dynamic loading. Therefore, hybrid analysis that combines static structure and dynamic behavior has become the mainstream trend. However, existing hybrid analysis methods often adopt simple feature concatenation or shallow fusion mechanisms, which cannot effectively integrate heterogeneous static and dynamic features or capture the complex correlations between structure and behavior. To address this, we propose a hybrid heterogeneous graph-based Android malware detection method via multi-evidence similarity fusion, named HHGDroid. The function call graph generated by static analysis and the event graph obtained through dynamic analysis are connected through a comprehensive similarity of multiple evidences such as semantics, permissions, and time frequency, ultimately forming the hybrid heterogeneous graph with multiple heterogeneous nodes and edges. Our constructed hybrid heterogeneous graph is the first one that simultaneously possesses static and dynamic features. Finally, we improve Reliability-Calibrated Heterogeneous Graph Transformer (RCHGT) to learn the multiple relationships in the hybrid heterogeneous graph, which can automatically distinguish reliable and unreliable edges during the information propagation stage. We conduct experiments on real Android malware applications and achieved an F1-score of 97.87%, outperforming the state-of-the-art methods. Additionally, we verify our method on an unknown malware dataset and obtained an F1-score of 81.52%, which is superior to existing methods. HHGDroid is a novel and effective method for detecting Android malware.
Citations: 0
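HHGDroid's "comprehensive similarity of multiple evidences" linking static call-graph nodes to dynamic event nodes is described only at a high level. A hedched sketch of one plausible fusion, where the evidence encodings, weights, and edge threshold are all assumptions rather than the paper's values:

```python
import numpy as np

def jaccard(a, b):
    """Set overlap, used here for permission evidence (an assumption)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cosine(u, v):
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def fused_similarity(sem_u, sem_v, perm_u, perm_v, freq_u, freq_v,
                     weights=(0.5, 0.3, 0.2)):
    """Weighted fusion of semantic, permission, and time-frequency
    evidence between a static function node and a dynamic event node."""
    s_sem = cosine(sem_u, sem_v)        # semantic embeddings
    s_perm = jaccard(perm_u, perm_v)    # requested permissions
    s_freq = cosine(freq_u, freq_v)     # invocation-frequency profiles
    return float(np.asarray(weights) @ [s_sem, s_perm, s_freq])

# Connect the two nodes in the hybrid heterogeneous graph when the
# fused similarity clears a threshold (0.6 is arbitrary here).
sim = fused_similarity([1, 0, 1], [1, 0, 1],
                       {"INTERNET", "SEND_SMS"}, {"INTERNET"},
                       [3, 1], [3, 1])
edge = sim >= 0.6
```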
Federated self-Expanding neural network learning framework for heterogeneous devices
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-05-15 Epub Date : 2026-01-21 DOI: 10.1016/j.eswa.2026.131199
Rong Xie, Zhong Chen, Weiguo Cao, Haosen Wang
Federated learning enables collaborative training without sharing raw data, while addressing growing privacy concerns. Real deployments face wide device heterogeneity that undermines both efficiency and accuracy in multi sensor information fusion. We present FSENNL, a federated framework with a self expanding neural network that adapts model capacity to each device. It adjusts capacity dynamically while leaving communication unchanged. A natural extension score combines Fisher information with device profiles to decide when and where to expand. An adaptive regularization term stabilizes newly added units and prevents over extension. To align structurally diverse models during aggregation, an adaptive pruning compensation step uses Optimal Brain Surgeon with lightweight compensation data to recover accuracy after alignment. Knowledge distillation with an asynchronous fusion protocol mitigates straggler effects from uneven training speeds. Decoupling update frequency through teacher and student roles supports timely aggregation and cross device knowledge transfer while preserving convergence. Experiments across heterogeneous settings show consistent accuracy with improved resource use, and demonstrate that the method scales to large federations. FSENNL provides a practical solution for multi sensor information fusion in federated systems, delivering scalable and efficient models under diverse computational constraints.
Citations: 0
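FSENNL's "natural extension score combines Fisher information with device profiles", but the listing gives no formula. A toy sketch under stated assumptions — diagonal empirical Fisher from per-sample gradients, a tanh saturation signal, and a linear blend with the device's capacity headroom — none of which should be read as the paper's actual design:

```python
import numpy as np

def diag_fisher(per_sample_grads):
    """Diagonal empirical Fisher: mean of squared per-sample gradients."""
    g = np.asarray(per_sample_grads, float)   # (batch, n_params)
    return (g ** 2).mean(axis=0)

def extension_score(per_sample_grads, capacity_headroom, alpha=0.7):
    """Natural-extension score for one layer (illustrative).

    High mean Fisher curvature suggests the layer is information-dense
    and may benefit from more units; capacity_headroom in [0, 1] encodes
    the device profile. alpha and the tanh squashing are assumptions.
    """
    saturation = np.tanh(diag_fisher(per_sample_grads).mean())
    return alpha * saturation + (1 - alpha) * capacity_headroom

rng = np.random.default_rng(1)
busy_layer = rng.normal(scale=2.0, size=(32, 64))   # large gradients
idle_layer = rng.normal(scale=0.05, size=(32, 64))  # small gradients
s_busy = extension_score(busy_layer, capacity_headroom=0.8)
s_idle = extension_score(idle_layer, capacity_headroom=0.8)
expand = s_busy > s_idle   # expand where the score is higher
```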
Dual-space intervention for mitigating bias in robust visual question answering
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-05-15 Epub Date : 2026-01-27 DOI: 10.1016/j.eswa.2026.131346
Runmin Wang, Xingdong Song, Zukun Wan, Han Xu, Congzhen Yu, Tianming Ma, Yajun Ding, Shengyou Qian
Visual Question Answering (VQA) evaluates the visual-textual reasoning capabilities of intelligent agents. However, existing methods are often susceptible to various biases. In particular, language bias leads models to rely on spurious question-answer correlations as shortcut solutions, while distribution bias caused by dataset imbalance encourages models to overfit head classes and overlook tail classes. To address these long-standing challenges, we propose a Dual-Space Intervention (DSI) approach that tackles these two biases from a unified yet complementary perspective. Two key innovations are included in our work: (1) In the input space, we adopt an adaptive question shuffling strategy to alleviate language bias by adjusting perturbation strength according to question bias, ensuring models develop a deeper understanding of the problem context, rather than relying on spurious word-answer correlations; (2) In the output space, we propose a novel label rebalancing mechanism that moderates head-class dominance based on long-tailed statistics, improving robustness to distribution bias. This approach reduces the disproportionately high variance in head logits relative to tail logits, improving tail class recognition accuracy. Extensive experiments on four benchmarks (VQA-CP v1, VQA-CP v2, VQA-CE, and SLAKE-CP) demonstrate our method’s superiority, with VQA-CP v1 and SLAKE-CP achieving state-of-the-art performance at 63.14% and 37.61% respectively. The code will be released at https://github.com/songxdr3/DSI.
Citations: 0
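The "label rebalancing mechanism ... based on long-tailed statistics" in the DSI abstract is not specified here. The generic long-tailed logit-adjustment recipe below illustrates the idea of prior-based rebalancing — frequent (head) answers are down-weighted by their log-prior — though the paper's own mechanism may well differ:

```python
import numpy as np

def logit_adjust(logits, class_counts, tau=1.0):
    """Subtract tau * log-prior from each class logit so head classes
    lose their frequency advantage at decision time."""
    counts = np.asarray(class_counts, float)
    log_prior = np.log(counts / counts.sum())
    return np.asarray(logits, float) - tau * log_prior

# Toy VQA answer head: "yes" dominates the training set.
counts = [9000, 900, 100]           # yes / no / "2"
logits = np.array([2.0, 1.9, 1.8])  # near-tie raw scores
plain = int(np.argmax(logits))                         # head class wins
adjusted = int(np.argmax(logit_adjust(logits, counts)))  # tail class wins
```

With near-tied raw scores, the unadjusted argmax picks the head answer (index 0) while the adjusted one picks the rarest answer (index 2), which is the qualitative effect the abstract describes for tail-class recognition.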
DCCL: Question-guided dual-channel contrastive learning framework for emotion-cause pair extraction
IF 7.5 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-05-15 Epub Date : 2026-01-29 DOI: 10.1016/j.eswa.2026.131357
Hongyang Wang, Yajun Du, Jia Liu, Xianyong Li, Xiaoliang Chen, Yanli Lee, Qing Qi, Wanjie Zhang
The emotion-cause pair extraction (ECPE) task aims to identify emotion clauses and their corresponding cause clauses from document-level text. It has important applications in a wide range of scenarios, including public opinion monitoring and user feedback analysis. Although research has made initial progress on this task, existing methods still face challenges in identifying implicit emotions. Firstly, the lack of explicit semantic guidance leads to insufficient discriminative power, especially when dealing with ambiguous emotional expressions. Secondly, existing methods primarily focus on modeling intra-sentence relationships, which limits their ability to jointly capture cross-sentence temporal dependencies and global semantic information. To address the challenges of emotion-cause pair extraction, we propose a question-guided dual-channel contrastive learning framework, DCCL. Firstly, the DCCL employs a question formulation based on machine reading comprehension (MRC) to guide the model in capturing the emotion-cause relationship between clauses. Furthermore, task-specific queries are explicitly injected into the input, making the model more aware of the task objective. Secondly, in DCCL, we design a dual-channel network combining query-aware clause-level Transformer and BiLSTM to enhance the model’s ability to capture temporal and global contextual dependencies, which enables DCCL to capture the temporal and global contextual relationships between clauses more fully. Thirdly, the DCCL incorporates supervised contrastive learning. We leverage positive and negative samples to incorporate contrastive learning into each channel, which optimizes the representation space and enhances the model’s ability to recognize ambiguous emotions and boundary conditions. We conducted experiments on three mainstream tasks, namely emotion cause pair extraction, emotion extraction, and cause extraction, on the ECPE benchmark dataset. The results show that DCCL improves the F1 scores of the best baseline models such as CD-MRC, SEG, etc. by 1.53% and 4.41%, respectively, in the emotion-cause pair extraction task, 0.81% and 4.37%, respectively, in the emotion extraction task, and 0.62% and 1.27%, respectively, in the cause extraction task. Moreover, compared with the large language model baseline LLM-MTLN, DCCL further improves F1 by 2.48%, 4.50%, and 0.63% on these three tasks, respectively.
Citations: 0
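The per-channel supervised contrastive learning that DCCL applies can be sketched with a standard supervised contrastive (SupCon-style) loss over clause embeddings: same-label clauses act as positives, everything else as negatives. Batch construction, the temperature, and the masking details below are assumptions, not the paper's exact formulation:

```python
import numpy as np

def supcon_loss(features, labels, temp=0.1):
    """Supervised contrastive loss over one batch.

    features: (n, d) clause embeddings (L2-normalized inside);
    labels: (n,) integer class ids. For each anchor, same-label clauses
    are positives and all other clauses are negatives.
    """
    f = np.asarray(features, float)
    f = f / np.linalg.norm(f, axis=1, keepdims=True)
    labels = np.asarray(labels)
    sim = f @ f.T / temp
    n = len(labels)
    mask_self = ~np.eye(n, dtype=bool)
    logits = sim - sim.max(axis=1, keepdims=True)         # stability shift
    exp = np.exp(logits) * mask_self                      # exclude self
    log_prob = logits - np.log(exp.sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & mask_self
    per_anchor = -(log_prob * pos).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return float(per_anchor.mean())

# Two tight clusters: labels that match the clusters should give a
# lower loss than shuffled labels, which is the signal the framework
# uses to pull same-class clause representations together.
rng = np.random.default_rng(2)
base = rng.normal(size=(2, 8))
feats = np.vstack([base[0] + 0.01 * rng.normal(size=(3, 8)),
                   base[1] + 0.01 * rng.normal(size=(3, 8))])
good = supcon_loss(feats, [0, 0, 0, 1, 1, 1])
bad = supcon_loss(feats, [0, 1, 0, 1, 0, 1])
```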