Pub Date: 2026-05-25 | Epub Date: 2026-02-03 | DOI: 10.1016/j.eswa.2026.131495
Haochang Hao, Jun Huang, Shuzhen Rao
Multiplex heterogeneous graphs, characterized by various types of nodes and relations, often exhibit incomplete structures and missing attributes in real-world scenarios, posing significant challenges for effective representation learning. Although existing studies have explored either structure refinement or attribute completion independently, few have touched on their potential complementarity. In this work, we propose an alternating optimization framework for node representation learning in multiplex heterogeneous graphs, with three key innovations: (i) relation-aware dynamic structure learning guided by attribute similarity, (ii) multi-hop completion of missing attributes on the refined graphs, and (iii) a progressive alternating optimization strategy that couples the two modules so they bootstrap and denoise each other over rounds. Extensive experiments on multiple real-world heterogeneous graph datasets demonstrate that our framework achieves superior performance over state-of-the-art baselines, validating the effectiveness and robustness of progressive structure-attribute co-optimization in heterogeneous graph representation learning.
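As a rough illustration of the alternating idea (not the authors' actual model: relation types, GNN encoders, and the learned similarity measure are all omitted here), one round could refine edges by attribute similarity and then fill missing attributes from the refined neighborhood:

```python
import numpy as np

def refine_edges(X, adj, tau=0.5):
    # Keep an edge only if the cosine similarity of its endpoints'
    # attributes exceeds the threshold tau (a hypothetical criterion).
    norms = np.linalg.norm(X, axis=1, keepdims=True) + 1e-8
    S = (X / norms) @ (X / norms).T
    return adj * (S > tau)

def complete_attributes(X, adj, mask):
    # Replace rows flagged missing (mask == 0) with the mean of
    # their neighbors' attributes on the refined graph.
    X_new = X.copy()
    for i in np.where(mask == 0)[0]:
        nbrs = np.where(adj[i] > 0)[0]
        if len(nbrs) > 0:
            X_new[i] = X[nbrs].mean(axis=0)
    return X_new

def alternate(X, adj, mask, rounds=3):
    # Progressive alternation: each round's refined structure feeds
    # the next attribute-completion step, and vice versa.
    for _ in range(rounds):
        adj = refine_edges(X, adj)
        X = complete_attributes(X, adj, mask)
    return X, adj
```

The point of the sketch is only the coupling: structure refinement and attribute completion consume each other's output across rounds.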
Title: Progressive alternating attribute-structure optimization for multiplex heterogeneous graphs. Expert Systems with Applications, vol. 312, Article 131495.
Pub Date: 2026-05-25 | Epub Date: 2026-02-05 | DOI: 10.1016/j.eswa.2026.131522
Huining Pei, Mingzhe Yang, Zhonghang Bai, Man Ding, Wen Li, Yuxin Cao, Yanjun Zhang
To address the low engineering feasibility of electric vehicle (EV) front face styling images generated by generative artificial intelligence (GenAI) tools such as Midjourney, this study proposes an innovative design method that integrates curve optimization with a collaborative evaluation system combining simulated and human experts. The method aims to enhance the manufacturability of AI-generated design schemes while efficiently transferring the styling genes of conventional fuel vehicles to EV front face styling design. First, the large language model ChatGPT-5.0 is employed to construct a styling semantic database based on six categories of conventional fuel vehicle front face datasets. Second, Midjourney is used to generate an initial EV front face styling dataset, and a production-ready styling dataset is subsequently constructed to provide engineering feasibility references for EV front face styling design. Third, “AI-generated curves” and “engineering reference curves” are fused at different ratios, and an EV front face styling scheme is generated using a curve blending algorithm optimized for the figure–ground relationship. Finally, an LLM-based collaborative evaluation system integrating simulated experts (via ChatGPT-5.0) and human experts is established to conduct quantitative evaluation and optimization of the schemes in terms of engineering feasibility and styling design metrics. A case study demonstrates that the optimized scheme’s engineering feasibility score is significantly improved from 2.3 to 7.1 (out of 10), while maintaining a high level of design creativity (7.5). The established LLM-based collaborative evaluation system achieved high inter-rater consistency in both engineering feasibility evaluation (ICC ≥ 0.9) and design creativity evaluation for EV front face styling schemes (ICC ≥ 0.85), effectively balancing engineering feasibility and design creativity in GenAI-generated EV front face styling schemes.
By constructing an AI-led, human-supervised hybrid design workflow, this method significantly enhances the engineering feasibility and design efficiency of generative AI in product styling design, providing a theoretical reference for achieving a balance between design innovation and engineering feasibility.
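The curve-fusion step (“AI-generated curves” and “engineering reference curves” fused at different ratios) can be sketched as a convex combination of sampled curve points. The `alpha` ratio parameter and the point-array representation are assumptions for illustration; the paper's figure–ground optimization is not modeled here:

```python
import numpy as np

def blend_curves(ai_curve, ref_curve, alpha=0.6):
    """Blend an AI-generated curve with an engineering reference curve
    at ratio alpha. Both curves are (N, 2) arrays of sampled (x, y)
    points assumed to be aligned point-for-point."""
    ai = np.asarray(ai_curve, dtype=float)
    ref = np.asarray(ref_curve, dtype=float)
    # alpha = 1.0 keeps the AI curve; alpha = 0.0 keeps the reference.
    return alpha * ai + (1.0 - alpha) * ref
```

Sweeping `alpha` over a grid would reproduce the "fused at different ratios" candidate set described above.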
Title: A design method for electric vehicle front face styling: based on engineering feasibility optimization of GenAI-generated images. Expert Systems with Applications, vol. 312, Article 131522.
Pub Date: 2026-05-25 | Epub Date: 2026-02-06 | DOI: 10.1016/j.eswa.2026.131377
Berrouachedi Abdelkader, Jaziri Rakia, Bernard Gilles
Emotion recognition plays a crucial role in various biometric applications, including human-computer interaction, healthcare, and security. This paper presents CNN-DET, a novel hybrid approach that integrates Convolutional Neural Networks (CNNs) with Deep Extra-Trees (DETs) for robust facial emotion recognition. The proposed methodology leverages hierarchical feature extraction through pre-trained CNN models combined with ensemble-based classification using DETs to accurately detect and classify emotions from facial expressions. Comprehensive evaluation on benchmark datasets demonstrates the superior performance of our approach. On the FER-2013 dataset, CNN-DET achieves 98.16% accuracy in 10-fold cross-validation and 85.32% accuracy on the standard test set, with precision of 85.7%, recall of 85.3%, and F1-score of 85.4%. The model maintains strong performance across diverse conditions, achieving 91.2% accuracy on AffectNet and 89.7% accuracy on RAF-DB, confirming its generalization capability. Extensive experiments reveal that our method reduces misclassification between visually similar emotions by 23.4% compared to traditional CNN approaches and shows 15.8% improvement in robustness under varying lighting conditions. The proposed approach not only accurately recognizes emotions but also demonstrates consistent performance across different demographic groups, with less than 3.2% performance variance across age and ethnicity subgroups. These findings highlight the significant potential of deep learning techniques for emotion recognition in biometric applications, providing valuable insights for developing more intelligent and interactive systems. Future research will focus on multimodal data fusion and temporal modeling to further enhance recognition accuracy and real-time performance.
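A minimal sketch of the CNN-plus-extra-trees pipeline, assuming scikit-learn's `ExtraTreesClassifier` as a stand-in for the paper's Deep Extra-Trees and a trivial flattening in place of a pre-trained CNN backbone (both are simplifications, not the authors' implementation):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

def extract_features(images):
    """Stand-in for the pre-trained CNN feature extractor: here we just
    flatten the images; the real pipeline would use, e.g., the
    penultimate-layer activations of a pre-trained network."""
    return np.asarray(images, dtype=float).reshape(len(images), -1)

def train_cnn_det(images, labels, n_estimators=100, seed=0):
    # Ensemble-based classification on top of the extracted features.
    feats = extract_features(images)
    clf = ExtraTreesClassifier(n_estimators=n_estimators, random_state=seed)
    clf.fit(feats, labels)
    return clf
```

Swapping `extract_features` for a real backbone keeps the two-stage structure (hierarchical feature extraction, then ensemble classification) intact.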
Title: CNN-DET: A hybrid deep learning architecture for emotion recognition. Expert Systems with Applications, vol. 312, Article 131377.
Global ship path planning in complex maritime environments is challenged by dynamic disturbances, vessel-specific constraints, and long-range trajectory dependencies. This study develops an integrated hybrid planning framework that combines deep generative modeling with rule-based optimization. Automatic Identification System (AIS) trajectory time series are first transformed into Gramian Angular Field images to enhance spatio-temporal feature extraction. Vessel type and length are encoded as one-hot vectors and introduced as conditional variables, enabling personalized path generation. These inputs are processed by a Multi-Head Attention–based Conditional Wasserstein Generative Adversarial Network with Gradient Penalty (MHA-cWGAN-GP), in which multi-head attention is used to model long-range dependencies, and conditional Generative Adversarial Network (cGAN) training together with a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) objective is adopted to improve conditioning behavior and training robustness. The model generates initial navigation paths, which are further refined using an A* search procedure that incorporates wind and current disturbances, as well as constraints such as static obstacles, water depth, and Traffic Separation Scheme (TSS) regulations. The final path is smoothed to ensure feasibility and compliance. In case studies for the Ningbo–Zhoushan Port and Yangtze River Estuary, the hybrid planner reduces the number of search nodes from 45–57 to 29–35 while simultaneously enforcing TSS, water-depth, wind, and current constraints, with only about a 3–4% increase in path length relative to classical A* and Dijkstra algorithms. The results indicate that the proposed framework effectively integrates learning and optimization, offering a practical and intelligent solution for real-world maritime path planning.
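The Gramian Angular Field transform applied to the AIS time series has a standard closed form: rescale the series to [-1, 1], take the arccosine, then form pairwise cosines. The summation variant (GASF) is sketched below under the assumption that this is the variant used; the paper may use the difference variant instead:

```python
import numpy as np

def gramian_angular_field(series):
    """Summation Gramian Angular Field of a 1-D time series."""
    x = np.asarray(series, dtype=float)
    # Rescale the series to [-1, 1].
    x = 2 * (x - x.min()) / (x.max() - x.min() + 1e-12) - 1
    # Polar encoding: angle phi_i = arccos(x_i).
    phi = np.arccos(np.clip(x, -1, 1))
    # GASF[i, j] = cos(phi_i + phi_j), a symmetric image.
    return np.cos(phi[:, None] + phi[None, :])
```

The resulting 2-D image is what the convolutional generator consumes in place of the raw trajectory sequence.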
Title: Hybrid intelligence–driven global path planning for ships in complex maritime environments. Authors: Jiao Liu, Kaige Zhu, Yuanqiang Zhang, Miao Gao, Pengjun Zheng. Expert Systems with Applications, vol. 312, Article 131473. DOI: 10.1016/j.eswa.2026.131473. Pub Date: 2026-05-25.
Pub Date: 2026-05-25 | Epub Date: 2026-02-04 | DOI: 10.1016/j.eswa.2026.131465
Zhulin Ji, Shenghai Liao, Ruyi Han, Shujun Fu
Compressive sensing (CS) enables accurate reconstruction of images from significantly fewer measurements than required by the Nyquist-Shannon sampling theorem, relying critically on effective image priors to regularize the ill-posed inverse problem. Conventional patch-based sparse representation utilizes fixed dictionaries learned off-the-shelf with the K-SVD algorithm. However, patch-based sparse representation ignores the relationships among patches, and the learned dictionaries cannot capture global image statistics, which leads to suboptimal reconstruction performance. In this paper, we exploit group sparse representation (GSR) for image compressive sensing reconstruction. By clustering non-local image patches into groups and regarding each group as a unit, group sparse representation simultaneously finds sparse codes for all patches within a group, leading to improved reconstruction fidelity and edge preservation. However, GSR relies solely on the undersampled image itself to construct a dictionary that is not learnable, becoming increasingly unreliable at low compressive sensing rates where substantial loss of local image information occurs. To address this limitation, we propose a Deep Prior guided Group Sparse Representation (DPGSR) model for compressive image restoration, where a deep denoiser is responsible for capturing and learning both local and global image statistics by training on external data. The proposed DPGSR achieves improved global consistency, effectively reducing block artifacts while preserving sharper local details. Extensive experiments on image compressive sensing reconstruction and fast MRI demonstrate that the proposed method outperforms state-of-the-art approaches, particularly in preserving fine details and reducing over-smoothing artifacts.
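The patch-grouping step underlying GSR can be illustrated by nearest-neighbor block matching in patch space. The helper below is a hypothetical simplification (patches pre-vectorized as rows, plain L2 distance, no search-window restriction):

```python
import numpy as np

def match_group(patches, ref_idx, k=4):
    """Return indices of the k patches most similar (L2 distance) to
    patches[ref_idx]; the group always contains the reference itself,
    since its distance to itself is zero."""
    d = np.linalg.norm(patches - patches[ref_idx], axis=1)
    return np.argsort(d)[:k]
```

Each such group is then treated as a single unit whose columns share one sparse code, which is what distinguishes GSR from independent per-patch coding.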
Title: Compressive sensing image restoration with deep prior guided group sparse representation. Expert Systems with Applications, vol. 312, Article 131465.
Pub Date: 2026-05-25 | Epub Date: 2026-02-04 | DOI: 10.1016/j.eswa.2026.131506
Xiyang Kuang, Bin Yang, Bingo Wing-Kuen Ling, Kok Lay Teo, Xiaozhi Zhang
Multimodal learning has played a pivotal role in survival prediction, particularly in integrating pathological images and genomic data for improving predictive performance. Pathological images provide macroscopic histological information about tumor morphology, while genomic data reveal molecular-level genetic characteristics. The integration of these two modalities enables a comprehensive characterization of tumor heterogeneity and disease progression mechanisms. Despite recent advances in multimodal integration that have significantly enhanced prognostic accuracy, challenges remain in effectively analyzing high-dimensional and heterogeneous whole-slide images and omics data. Current Transformer-based sequence modeling approaches suffer from limited computational efficiency when processing long feature sequences and capturing complex cross-modal interactions. To address these challenges, we propose an innovative cross-modal receptance weighted key-value (RWKV)-based framework, termed Surv-RWKV, for survival prediction. This framework integrates RWKV-based sequence modeling with advanced multimodal fusion strategies to enhance both predictive accuracy and model efficiency. Specifically, Surv-RWKV employs parallel RWKV-based encoders to model long-range dependencies in WSI tissue cluster patterns and genomic pathway activation profiles, achieving improved prognostic performance with optimized computational efficiency. Subsequently, an optimal transport-based cross-modal alignment module is introduced to establish semantic correspondences between histopathological and genomic feature spaces. Furthermore, a progressive feature fusion strategy is implemented to enable effective cross-modal interaction. An RWKV-based shallow fusion module is first developed to explore cross-modal dependencies through spatial-channel hybrid operations, thereby enhancing the representational quality of fused features.
A cross-RWKV deep interaction module is then designed to further strengthen information synthesis via iterative cross-attention mechanisms, while simultaneously reinforcing intra-modal representation learning and cross-modal knowledge transfer. Surv-RWKV is expected to effectively capture such cross-modal correlations, thereby improving the accuracy and interpretability of survival predictions. Extensive validation across five TCGA cancer cohorts demonstrates that Surv-RWKV achieves state-of-the-art predictive performance with superior computational efficiency.
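The iterative cross-attention used in the deep interaction module follows the standard scaled dot-product form. A single numpy pass is sketched below, with one head and no learned projections or RWKV recurrence (both simplifying assumptions relative to the actual module):

```python
import numpy as np

def cross_attention(q_feats, kv_feats):
    """One cross-attention pass: queries from one modality (e.g. WSI
    clusters) attend to keys/values from the other (e.g. pathways)."""
    d = q_feats.shape[-1]
    # Scaled dot-product scores between the two modalities.
    scores = q_feats @ kv_feats.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)            # row-wise softmax
    # Each query becomes a weighted mix of the other modality's features.
    return w @ kv_feats
```

Applying such a pass in both directions, repeatedly, is the "iterative cross-attention" pattern described above.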
Title: Surv-RWKV: Cross-modal receptance weighted key-value interaction with optimal transport feature alignment for survival analysis. Expert Systems with Applications, vol. 312, Article 131506.
Pub Date: 2026-05-25 | Epub Date: 2026-02-04 | DOI: 10.1016/j.eswa.2026.131528
Junwei Tang, Xiaomei Tian, Tao Peng, Jianfeng Lu, Haozhao Wang, Ruixuan Li
Currently, static analysis is insufficient to deal with Android malware that employs advanced evasion techniques such as code obfuscation and dynamic loading. Therefore, hybrid analysis that combines static structure and dynamic behavior has become the mainstream trend. However, existing hybrid analysis methods often adopt simple feature concatenation or shallow fusion mechanisms, which cannot effectively integrate heterogeneous static and dynamic features or capture the complex correlations between structure and behavior. To address this, we propose a hybrid heterogeneous graph-based Android malware detection method via multi-evidence similarity fusion, named HHGDroid. The function call graph generated by static analysis and the event graph obtained through dynamic analysis are connected through a comprehensive similarity over multiple types of evidence, such as semantics, permissions, and temporal frequency, ultimately forming a hybrid heterogeneous graph with multiple heterogeneous nodes and edges. Our constructed hybrid heterogeneous graph is the first one that simultaneously possesses static and dynamic features. Finally, we improve the Reliability-Calibrated Heterogeneous Graph Transformer (RCHGT) to learn the multiple relationships in the hybrid heterogeneous graph, which can automatically distinguish reliable and unreliable edges during the information propagation stage. We conduct experiments on real Android malware applications and achieve an F1-score of 97.87%, outperforming the state-of-the-art methods. Additionally, we verify our method on an unknown malware dataset and obtain an F1-score of 81.52%, which is superior to existing methods. HHGDroid is a novel and effective method for detecting Android malware.
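The multi-evidence similarity connecting static call-graph nodes to dynamic event nodes can be illustrated as a weighted combination of per-evidence scores. The weights, the cosine/Jaccard choices, and the restriction to two evidence types here are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def fused_similarity(sem_a, sem_b, perms_a, perms_b, w_sem=0.6, w_perm=0.4):
    """Weighted fusion of two evidence channels: cosine similarity of
    semantic embeddings and Jaccard overlap of permission sets."""
    cos = float(np.dot(sem_a, sem_b) /
                (np.linalg.norm(sem_a) * np.linalg.norm(sem_b) + 1e-12))
    inter = len(perms_a & perms_b)
    union = len(perms_a | perms_b) or 1  # avoid division by zero
    return w_sem * cos + w_perm * inter / union

def link_if_similar(sem_a, sem_b, perms_a, perms_b, threshold=0.5):
    # Add a cross-graph edge when the fused similarity clears a threshold.
    return fused_similarity(sem_a, sem_b, perms_a, perms_b) > threshold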
Title: HHGDroid: Hybrid heterogeneous graph-based Android malware detection via multi-evidence similarity fusion. Expert Systems with Applications, vol. 312, Article 131528.
Pub Date : 2026-05-15Epub Date: 2026-01-21DOI: 10.1016/j.eswa.2026.131199
Rong Xie , Zhong Chen , Weiguo Cao , Haosen Wang
Federated learning enables collaborative training without sharing raw data, addressing growing privacy concerns. Real deployments, however, face wide device heterogeneity that undermines both efficiency and accuracy in multi-sensor information fusion. We present FSENNL, a federated framework with a self-expanding neural network that adapts model capacity to each device, adjusting capacity dynamically while leaving communication unchanged. A natural extension score combines Fisher information with device profiles to decide when and where to expand, and an adaptive regularization term stabilizes newly added units and prevents over-extension. To align structurally diverse models during aggregation, an adaptive pruning compensation step uses Optimal Brain Surgeon with lightweight compensation data to recover accuracy after alignment. Knowledge distillation with an asynchronous fusion protocol mitigates straggler effects from uneven training speeds: decoupling update frequency through teacher and student roles supports timely aggregation and cross-device knowledge transfer while preserving convergence. Experiments across heterogeneous settings show consistent accuracy with improved resource use and demonstrate that the method scales to large federations. FSENNL thus provides a practical solution for multi-sensor information fusion in federated systems, delivering scalable and efficient models under diverse computational constraints.
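The "when and where to expand" decision can be sketched as a per-layer score that blends Fisher information with a device resource profile. This is a minimal sketch under stated assumptions: the mixing weight `alpha`, the scalar `headroom` profile, and the threshold are illustrative choices, not details from the paper.

```python
import numpy as np

def extension_score(fisher_diag, headroom, alpha=0.7):
    """Natural-extension-style score for one layer: blend the layer's average
    diagonal Fisher information (how loss-sensitive its weights are) with the
    device's resource headroom in [0, 1]. alpha is an illustrative mix."""
    sensitivity = float(np.mean(fisher_diag))
    return alpha * sensitivity + (1.0 - alpha) * headroom

def layers_to_expand(fisher_by_layer, headroom, threshold=0.5):
    """Return indices of layers whose extension score clears the threshold:
    'where' to expand; calling this each round decides 'when'."""
    return [i for i, f in enumerate(fisher_by_layer)
            if extension_score(f, headroom) >= threshold]

# Toy usage: layer 0 is highly loss-sensitive, layer 1 is not.
fishers = [np.array([0.9, 0.8]), np.array([0.1, 0.05])]
print(layers_to_expand(fishers, headroom=0.6))
```

Under this sketch only the sensitive layer expands, so a constrained device grows capacity where the Fisher signal says it matters most.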
{"title":"Federated self-Expanding neural network learning framework for heterogeneous devices","authors":"Rong Xie , Zhong Chen , Weiguo Cao , Haosen Wang","doi":"10.1016/j.eswa.2026.131199","DOIUrl":"10.1016/j.eswa.2026.131199","url":null,"abstract":"<div><div>Federated learning enables collaborative training without sharing raw data, while addressing growing privacy concerns. Real deployments face wide device heterogeneity that undermines both efficiency and accuracy in multi-sensor information fusion. We present FSENNL, a federated framework with a self-expanding neural network that adapts model capacity to each device. It adjusts capacity dynamically while leaving communication unchanged. A natural extension score combines Fisher information with device profiles to decide when and where to expand. An adaptive regularization term stabilizes newly added units and prevents over-extension. To align structurally diverse models during aggregation, an adaptive pruning compensation step uses Optimal Brain Surgeon with lightweight compensation data to recover accuracy after alignment. Knowledge distillation with an asynchronous fusion protocol mitigates straggler effects from uneven training speeds. Decoupling update frequency through teacher and student roles supports timely aggregation and cross-device knowledge transfer while preserving convergence. Experiments across heterogeneous settings show consistent accuracy with improved resource use, and demonstrate that the method scales to large federations. 
FSENNL provides a practical solution for multi-sensor information fusion in federated systems, delivering scalable and efficient models under diverse computational constraints.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"311 ","pages":"Article 131199"},"PeriodicalIF":7.5,"publicationDate":"2026-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-05-15Epub Date: 2026-01-27DOI: 10.1016/j.eswa.2026.131346
Runmin Wang , Xingdong Song , Zukun Wan , Han Xu , Congzhen Yu , Tianming Ma , Yajun Ding , Shengyou Qian
Visual Question Answering (VQA) evaluates the visual-textual reasoning capabilities of intelligent agents. However, existing methods are often susceptible to various biases. In particular, language bias leads models to rely on spurious question-answer correlations as shortcut solutions, while distribution bias caused by dataset imbalance encourages models to overfit head classes and overlook tail classes. To address these long-standing challenges, we propose a Dual-Space Intervention (DSI) approach that tackles these two biases from a unified yet complementary perspective. Our work includes two key innovations: (1) In the input space, we adopt an adaptive question shuffling strategy to alleviate language bias by adjusting perturbation strength according to question bias, ensuring models develop a deeper understanding of the problem context rather than relying on spurious word-answer correlations; (2) In the output space, we propose a novel label rebalancing mechanism that moderates head-class dominance based on long-tailed statistics, improving robustness to distribution bias. This approach reduces the disproportionately high variance in head logits relative to tail logits, improving tail-class recognition accuracy. Extensive experiments on four benchmarks (VQA-CP v1, VQA-CP v2, VQA-CE, and SLAKE-CP) demonstrate our method’s superiority, with VQA-CP v1 and SLAKE-CP achieving state-of-the-art performance of 63.14% and 37.61%, respectively. The code will be released at https://github.com/songxdr3/DSI.
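The output-space intervention is in the spirit of logit adjustment for long-tailed labels; whether DSI uses exactly this form is an assumption. A minimal sketch: subtract a scaled log-prior from each class logit so head classes no longer win by frequency alone. The parameter `tau` and the toy counts are illustrative.

```python
import numpy as np

def rebalance_logits(logits, class_counts, tau=1.0):
    """Moderate head-class dominance by subtracting a scaled log-prior
    from each class logit (logit-adjustment style; the paper's exact
    mechanism may differ). Head classes receive the largest penalty."""
    priors = np.asarray(class_counts, dtype=float)
    priors /= priors.sum()
    return logits - tau * np.log(priors)

logits = np.array([3.0, 1.0, 0.5])   # head class dominates the raw logits
counts = [900, 90, 10]               # long-tailed label distribution
adj = rebalance_logits(logits, counts)
print(adj.argmax())
```

With the toy numbers, the raw prediction is the head class (index 0), but after rebalancing the rarest class (index 2) wins, illustrating how the penalty counteracts frequency-driven dominance.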
{"title":"Dual-space intervention for mitigating bias in robust visual question answering","authors":"Runmin Wang , Xingdong Song , Zukun Wan , Han Xu , Congzhen Yu , Tianming Ma , Yajun Ding , Shengyou Qian","doi":"10.1016/j.eswa.2026.131346","DOIUrl":"10.1016/j.eswa.2026.131346","url":null,"abstract":"<div><div>Visual Question Answering (VQA) evaluates the visual-textual reasoning capabilities of intelligent agents. However, existing methods are often susceptible to various biases. In particular, language bias leads models to rely on spurious question-answer correlations as shortcut solutions, while distribution bias caused by dataset imbalance encourages models to overfit head classes and overlook tail classes. To address these long-standing challenges, we propose a Dual-Space Intervention (DSI) approach that tackles these two biases from a unified yet complementary perspective. Two key innovations are included in our work: (1) In the input space, we adopt an adaptive question shuffling strategy to alleviate language bias by adjusting perturbation strength according to question bias, ensuring models develop a deeper understanding of the problem context, rather than relying on spurious word-answer correlations; (2) In the output space, we propose a novel label rebalancing mechanism that moderates head-class dominance based on long-tailed statistics, improving robustness to distribution bias. This approach reduces the disproportionately high variance in head logits relative to tail logits, improving tail class recognition accuracy. Extensive experiments on four benchmarks (VQA-CP v1, VQA-CP v2, VQA-CE, and SLAKE-CP) demonstrate our method’s superiority, with VQA-CP v1 and SLAKE-CP achieving state-of-the-art performance at 63.14% and 37.61% respectively. 
The code will be released at <span><span>https://github.com/songxdr3/DSI</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"311 ","pages":"Article 131346"},"PeriodicalIF":7.5,"publicationDate":"2026-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The emotion-cause pair extraction (ECPE) task aims to identify emotion clauses and their corresponding cause clauses from document-level text. It has important applications in a wide range of scenarios, including public opinion monitoring and user feedback analysis. Although research has made initial progress on this task, existing methods still face challenges in identifying implicit emotions. Firstly, the lack of explicit semantic guidance leads to insufficient discriminative power, especially when dealing with ambiguous emotional expressions. Secondly, existing methods primarily focus on modeling intra-sentence relationships, which limits their ability to jointly capture cross-sentence temporal dependencies and global semantic information. To address these challenges, we propose a question-guided dual-channel contrastive learning framework, DCCL. Firstly, DCCL employs a question formulation based on machine reading comprehension (MRC) to guide the model in capturing the emotion-cause relationship between clauses; task-specific queries are explicitly injected into the input, making the model more aware of the task objective. Secondly, we design a dual-channel network combining a query-aware clause-level Transformer with a BiLSTM, enabling DCCL to capture temporal and global contextual dependencies between clauses more fully. Thirdly, DCCL incorporates supervised contrastive learning. We leverage positive and negative samples to incorporate contrastive learning into each channel, which optimizes the representation space and enhances the model’s ability to recognize ambiguous emotions and boundary conditions. We conducted experiments on three mainstream tasks, namely emotion-cause pair extraction, emotion extraction, and cause extraction, on the ECPE benchmark dataset. 
The results show that DCCL improves the F1 scores of the strongest baseline models, CD-MRC and SEG, by 1.53% and 4.41%, respectively, on the emotion-cause pair extraction task; by 0.81% and 4.37% on the emotion extraction task; and by 0.62% and 1.27% on the cause extraction task. Moreover, compared with the large language model baseline LLM-MTLN, DCCL further improves F1 by 2.48%, 4.50%, and 0.63% on these three tasks, respectively.
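The per-channel supervised contrastive objective can be sketched in the standard SupCon form, with same-label clause representations as positives; the exact loss used in DCCL may differ, and the temperature and toy data below are illustrative.

```python
import numpy as np

def sup_con_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over L2-normalized clause representations:
    pull same-label clauses together and push different-label clauses apart.
    Averages -log p(positive | anchor) over all anchor-positive pairs."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = feats @ feats.T / temperature
    n = len(labels)
    loss, terms = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors without a positive contribute nothing
        denom = sum(np.exp(sim[i, j]) for j in range(n) if j != i)
        for j in positives:
            loss += -np.log(np.exp(sim[i, j]) / denom)
            terms += 1
    return loss / terms

# Toy usage: 6 clause embeddings, two classes of 3.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8))
y = [0, 0, 0, 1, 1, 1]
print(sup_con_loss(x, y))
```

Tightly clustered same-label embeddings drive the loss down, which is the property the framework exploits to separate ambiguous emotion representations from their boundary cases.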
{"title":"DCCL: Question-guided dual-channel contrastive learning framework for emotion-cause pair extraction","authors":"Hongyang Wang, Yajun Du, Jia Liu, Xianyong Li, Xiaoliang Chen, Yanli Lee, Qing Qi, Wanjie Zhang","doi":"10.1016/j.eswa.2026.131357","DOIUrl":"10.1016/j.eswa.2026.131357","url":null,"abstract":"<div><div>The emotion-cause pair extraction (ECPE) task aims to identify emotion clauses and their corresponding cause clauses from document-level text. It has important applications in a wide range of scenarios, including public opinion monitoring and user feedback analysis. Although research has made initial progress on this task, existing methods still face challenges in identifying implicit emotions. Firstly, the lack of explicit semantic guidance leads to insufficient discriminative power, especially when dealing with ambiguous emotional expressions. Secondly, existing methods primarily focus on modeling intra-sentence relationships, which limits their ability to jointly capture cross-sentence temporal dependencies and global semantic information. To address the challenges of emotion-cause pair extraction, we propose a question-guided dual-channel contrastive learning framework, DCCL. Firstly, the DCCL employs a question formulation based on machine reading comprehension (MRC) to guide the model in capturing the emotion-cause relationship between clauses. Furthermore, task-specific queries are explicitly injected into the input, making the model more aware of the task objective. Secondly, in DCCL, we design a dual-channel network combining query-aware clause-level Transformer and BiLSTM to enhance the model’s ability to capture temporal and global contextual dependencies, which enables DCCL to capture the temporal and global contextual relationships between clauses more fully. Thirdly, the DCCL incorporates supervised contrastive learning. 
We leverage positive and negative samples to incorporate contrastive learning into each channel, which optimizes the representation space and enhances the model’s ability to recognize ambiguous emotions and boundary conditions. We conducted experiments on three mainstream tasks, namely emotion cause pair extraction, emotion extraction, and cause extraction, on the ECPE benchmark dataset. The results show that DCCL improves the F1 scores of the best baseline models CD-MRC and SEG by 1.53% and 4.41%, respectively, in the emotion-cause pair extraction task, by 0.81% and 4.37% in the emotion extraction task, and by 0.62% and 1.27% in the cause extraction task. Moreover, compared with the large language model baseline LLM-MTLN, DCCL further improves F1 by 2.48%, 4.50%, and 0.63% on these three tasks, respectively.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"311 ","pages":"Article 131357"},"PeriodicalIF":7.5,"publicationDate":"2026-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146070882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}