Pub Date: 2026-01-22 | DOI: 10.1016/j.eswa.2026.131319
Zhaoli Zhang, Jiahao Li, Hai Liu, Erqi Zhang, Tingting Liu, Minhong Wang
Knowledge tracing (KT) is a crucial task in educational data mining that aims to model the state of learners’ knowledge by analyzing their behavioral data in real time. However, unlike the dynamic evolution of learners’ knowledge internalization in KT modeling, the intrinsic features associated with exercises and knowledge components (KCs) remain static. Many existing models overlook this distinction and fail to implement differentiated feature processing. Additionally, the rapidly expanding volume of data on online learning platforms poses new challenges to model performance and efficiency. To address these issues, we propose DRKT, a new model that employs an intrinsic information mining (IIM) module to extract inherent feature information from exercises and KCs. We also utilize the Mamba network to capture learner-exercise interaction patterns and achieve a balance between performance and efficiency. Furthermore, we introduce a double matrix dynamic update (DMDU) strategy to differentially model the complex dynamics of knowledge internalization and the inherent invariability of exercises and KCs. Experimental results on four real-world educational datasets demonstrate that DRKT outperforms existing methods in predictive accuracy, resource consumption, and time complexity, providing effective technical support for pedagogical interventions and personalized learning recommendations.
Title: DRKT: Learning differential relationships for efficient knowledge tracing with learner’s knowledge internalization representation
Journal: Expert Systems with Applications, Volume 310, Article 131319 | DOI: 10.1016/j.eswa.2026.131319
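The differentiated treatment that DMDU motivates can be illustrated with a toy sketch. This is an assumption-laden illustration, not the paper’s actual update equations: the `update_knowledge` helper, the learning rate, and the 0.5 mastery prior are all hypothetical. The point is only the asymmetry: the learner-knowledge state is updated after every interaction, while the exercise/KC feature matrix is never modified at inference time.

```python
# Hypothetical sketch of differentiated updating (not the paper's DMDU math):
# the dynamic learner-knowledge state changes per interaction; the static
# exercise/KC features do not.

def update_knowledge(knowledge, kc, correct, lr=0.3):
    """Nudge the learner's mastery of one KC toward the observed outcome."""
    current = knowledge.get(kc, 0.5)      # assumed neutral prior
    target = 1.0 if correct else 0.0
    updated = dict(knowledge)
    updated[kc] = current + lr * (target - current)
    return updated

static_exercise_features = {"ex1": (0.8, 0.2)}   # intentionally never updated
state = {"algebra": 0.5}

for kc, correct in [("algebra", True), ("algebra", True), ("algebra", False)]:
    state = update_knowledge(state, kc, correct)
```

After two correct answers and one mistake, the mastery estimate rises and then partially falls, while the exercise features are untouched.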
Oblique predictive clustering trees (SPYCTs) are semi-supervised multi-target prediction models mainly used for structured output prediction (SOP) problems. They are computationally efficient and, when combined in ensembles, achieve state-of-the-art results. However, one major issue is that it is challenging to interpret an ensemble of SPYCTs without the use of a model-agnostic method. We propose variational oblique predictive clustering trees, which address this challenge. The parameters of each split node are treated as random variables, described with a probability distribution, and learned through the Variational Bayes method. We evaluate the model on several benchmark datasets of different sizes. The experimental analyses show that a single variational oblique predictive clustering tree (VSPYCT) achieves competitive, and sometimes better, predictive performance than an ensemble of standard SPYCTs. We also present a method for extracting feature importance scores from the model. Finally, we present a method to visually interpret the model’s decision-making process through analysis of the relative feature importance in each split node.
Title: Variational oblique predictive clustering trees
Journal: Expert Systems with Applications, Volume 310, Article 131255 | DOI: 10.1016/j.eswa.2026.131255 | Pub Date: 2026-01-22
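The core idea of a split node whose parameters are random variables can be sketched in a few lines. This is an illustrative Monte-Carlo routing estimate, not the authors’ Variational Bayes training procedure: the Gaussian parameterization per weight is taken from the abstract, while the sample count, seed, and the specific `mu`/`sigma` values are assumptions.

```python
import random

# Hypothetical sketch: each oblique-split weight is a Gaussian (mu, sigma);
# routing becomes a probability estimated by sampling the weights.
def variational_split(x, mu, sigma, n_samples=200, seed=0):
    """Monte-Carlo estimate of the probability of routing x to the left child."""
    rng = random.Random(seed)
    left = 0
    for _ in range(n_samples):
        w = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
        if sum(wi * xi for wi, xi in zip(w, x)) > 0:
            left += 1
    return left / n_samples

p_noisy = variational_split([1.0, -0.5], mu=[2.0, 1.0], sigma=[0.1, 0.1])
p_point = variational_split([1.0, -0.5], mu=[2.0, 1.0], sigma=[0.0, 0.0])
```

With `sigma` at zero the split collapses to a deterministic oblique split; small `sigma` yields a near-certain but probabilistic routing, which is what makes per-node uncertainty interpretable.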
Pub Date: 2026-01-22 | DOI: 10.1016/j.eswa.2026.131279
Saira Mudassar, Aneela Zameer, Muhammad Asif Zahoor Raja
Accurate wind power forecasting is crucial for effectively integrating renewable energy into the electric grid, enabling the optimal utilization of generated clean energy. The intermittent nature of wind and the operational complexity of sustainable energy systems make prediction a highly challenging task. This study combines the strengths of a metaheuristic algorithm, grey wolf optimization (GWO), for feature selection with a time-series multivariate forecasting model, the gated recurrent unit (GRU), augmented with a double attention mechanism (DAGRU), for effective, precise, and efficient predictions. The proposed model, GWO-DAGRU, is a short-term wind power forecasting model: GWO, combined with an XGBoost regressor, is first used to identify key input features, which are then refined by the double attention mechanism in DAGRU to capture temporal dependencies more effectively. The proposed approach is validated on data from seven European wind farms and further tested on the ELIA dataset to assess generalization capability. Performance is benchmarked using error metrics and statistical validation through the Wilcoxon signed-rank test at a 95% confidence level. The findings demonstrate that GWO-DAGRU achieves superior accuracy and robustness, outperforming several existing forecasting methods for efficient management and planning of a sustainable energy support system.
Title: GWO-DAGRU: A hybrid deep learning framework with metaheuristic feature selection and self-weighted context GRU for short-term wind power forecast
Journal: Expert Systems with Applications, Volume 310, Article 131279 | DOI: 10.1016/j.eswa.2026.131279
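The grey wolf position update at the heart of GWO can be shown on a toy 1-D problem. This is a minimal sketch of the standard GWO dynamics, not the paper’s GWO-plus-XGBoost feature-selection pipeline; the wolf count, iteration budget, and bound handling are all assumed for illustration.

```python
import random

# Toy GWO: wolves move toward the three current best solutions (alpha, beta,
# delta), with an exploration coefficient that decays from 2 to 0.
def gwo_minimize(f, lo, hi, n_wolves=10, n_iter=60, seed=0):
    rng = random.Random(seed)
    wolves = [rng.uniform(lo, hi) for _ in range(n_wolves)]
    for t in range(n_iter):
        a = 2 - 2 * t / n_iter                  # decays 2 -> 0
        leaders = sorted(wolves, key=f)[:3]     # alpha, beta, delta
        moved = []
        for x in wolves:
            estimates = []
            for leader in leaders:
                r1, r2 = rng.random(), rng.random()
                A, C = 2 * a * r1 - a, 2 * r2
                estimates.append(leader - A * abs(C * leader - x))
            moved.append(sum(estimates) / 3)    # average of the three pulls
        wolves = [min(max(w, lo), hi) for w in moved]
    return min(wolves, key=f)

best = gwo_minimize(lambda x: (x - 3.0) ** 2, -10.0, 10.0)
```

In the feature-selection setting described above, `f` would instead score a candidate feature subset (e.g. via an XGBoost regressor’s validation error), with wolves encoding feature masks.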
Pub Date: 2026-01-22 | DOI: 10.1016/j.eswa.2026.131302
Héctor Penadés, Félix Escalona, Miguel Cazorla
The task of face re-identification seeks to match identities across images captured under varying conditions. In conventional single-registration scenarios, only one real image per subject is available during inference, limiting the discriminative capability of the embedding. Advances in synthetic data present new opportunities for improving recognition systems, particularly as privacy concerns restrict data availability. We propose a novel method that leverages identity-guided synthetic augmentation to enrich facial representations at inference time. Unlike traditional data augmentation, it enhances embeddings through sample aggregation, introducing an inference-time paradigm for representation enrichment without expanding the training set or retraining existing models. Using Arc2Face, we generate diverse, identity-consistent synthetic images from each real sample, synthesizing multiple facial variations to approximate the distributional space around each identity. A non-parametric analysis of ten embedding fusion strategies showed consistent improvements over the baselines, with the Mean, Median, and hybrid Mean-Median (Meta-MM) achieving the best performance and Meta-MM showing the lowest variability across models. Experiments demonstrated consistent improvements across re-identification and verification settings. On the Labeled Faces in the Wild (LFW) dataset, Rank-1 accuracy improved by an average of 6.97 points and mean Average Precision (mAP) by 5.82 and 8.10 points. On the Surveillance Cameras Face (SCFace) benchmark, a low-quality, cross-distance dataset, Rank-1 gains ranged from 10.98 to 31.33 points.
On the Cross-Pose LFW (CPLFW) verification benchmark, accuracy generally matched or exceeded AdaFace baselines, with gains of up to 5.57 points. Incorporating latent consistency models with low-rank adaptation (LCM-LoRA) accelerated sample generation tenfold, making the framework suitable for large-scale applications.
Title: Improving face re-identification via identity-conditioned synthetic augmentation and inference-time embedding fusion
Journal: Expert Systems with Applications, Volume 310, Article 131302 | DOI: 10.1016/j.eswa.2026.131302
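Inference-time embedding fusion over real and synthetic samples can be sketched as element-wise aggregation. Hedge: the Mean and Median strategies below follow their usual definitions, but the exact Meta-MM formula is not given in the abstract; here it is assumed, purely for illustration, to be the average of the mean and median fusions.

```python
# Illustrative embedding fusion over one identity's real + synthetic embeddings.
def fuse(embeddings, strategy="mean"):
    """Element-wise fusion of several embeddings of the same identity."""
    if strategy == "meta_mm":              # assumed: average of the two fusions
        mean = fuse(embeddings, "mean")
        median = fuse(embeddings, "median")
        return [(a + b) / 2 for a, b in zip(mean, median)]
    if strategy not in ("mean", "median"):
        raise ValueError(strategy)
    fused = []
    for dim in zip(*embeddings):           # iterate per embedding dimension
        if strategy == "mean":
            fused.append(sum(dim) / len(dim))
        else:                              # odd number of embeddings assumed
            fused.append(sorted(dim)[len(dim) // 2])
    return fused

embs = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]   # one real + two synthetic
meta = fuse(embs, "meta_mm")
```

The fused vector then replaces the single-image embedding at matching time, with no retraining of the underlying face model.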
Pub Date: 2026-01-22 | DOI: 10.1016/j.eswa.2026.131220
Li Yan, Yinjin Wu, Boyang Qu, Chao Li, Jing Liang, Kunjie Yu, Caitong Yue, Baihao Qiao, Yuqi Lei
Dynamic constrained multiobjective optimization problems (DCMOPs) are characterized by objective functions and constraints that change complexly over time. This time-varying characteristic poses significant challenges for existing optimization algorithms, particularly in rapidly tracking the dynamic feasible regions and accurately converging to the changing Dynamic Constrained Pareto Optimal Front (DCPOF). To address these challenges, an adaptive decomposition-based transfer learning method, termed ADTL, is proposed in this article. The method introduces an adaptive objective space decomposition strategy to locate the dynamic feasible regions accurately. Upon the detection of a new environment, the objective space is decomposed by the historical optimal solutions. To efficiently track the DCPOF, an individual-based transfer learning strategy is proposed, which associates each solution in the current environment with its nearest reference vector. Then, a single-layer autoencoder is employed to learn the features of historical optimal solutions and transfer historical knowledge to the current population. Furthermore, to improve search efficiency, a diversity and feasibility enhancement strategy is proposed. This strategy evaluates the diversity and feasibility of the predicted population, introduces random solutions according to the diversity level, and relocates infeasible solutions to the boundary of the feasible regions. Comprehensive experiments on widely used benchmark problems demonstrate that the proposed algorithm is highly competitive in dealing with DCMOPs when compared with seven state-of-the-art algorithms.
Title: Adaptive decomposition-based transfer learning for dynamic constrained multi-objective optimization
Journal: Expert Systems with Applications, Volume 309, Article 131220 | DOI: 10.1016/j.eswa.2026.131220
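The association step of the individual-based transfer strategy, matching each current solution to its nearest reference vector, is simple enough to sketch directly. This is a generic nearest-vector assignment under an assumed Euclidean distance; the paper’s actual decomposition and autoencoder-based transfer are not reproduced here.

```python
# Sketch of the association step: each solution in the current population is
# mapped to the index of its nearest reference vector (Euclidean distance).
def associate(population, reference_vectors):
    """Return {solution_index: nearest_reference_index}."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return {
        i: min(range(len(reference_vectors)),
               key=lambda j: dist(sol, reference_vectors[j]))
        for i, sol in enumerate(population)
    }

refs = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
pop = [(0.9, 0.1), (0.45, 0.55), (0.2, 0.8)]
mapping = associate(pop, refs)
```

Once each solution is tied to a reference vector, historical solutions associated with the same vector become the natural source for knowledge transfer after an environment change.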
Pub Date: 2026-01-22 | DOI: 10.1016/j.eswa.2026.131304
Cheng Ju, Yuansha Xie, Zhongrong Wang, Yu Zhao, Wenyao Yan, Rongjun Chai, Juan Duan, Yali Cao, Yuxin Chang
Achieving high-precision and multimodal trajectory prediction for multiple agents in mixed traffic environments, where autonomous and human-driven vehicles coexist, constitutes a fundamental scientific challenge for ensuring traffic safety and efficiency. To address the limitations of existing approaches in modeling heterogeneous behaviors, long-term dependencies, and high-level semantics in complex dynamic scenarios, a Pyramidal Spatio-Temporal Graph Transformer (STGFormer) based on cross-disciplinary feature fusion is proposed in this study. This method, grounded in hierarchical feature integration, systematically incorporates multi-source information from physical, psychological, environmental, and social domains, thereby significantly enhancing the model’s capacity to represent diverse behaviors. In the spatial modeling stage, an Adaptive Neighborhood Selection Graph Convolutional Network (ANS-GCN) is introduced, which dynamically selects key interactive agents through a multi-factor learnable weighting mechanism, enabling efficient spatial relationship modeling. For temporal modeling, a Pyramid Sparse Semantic Attention Transformer Encoder (PSSAT) is designed to progressively capture short-term dynamics and long-term trends, integrating spatial, temporal, and behavioral semantic features. Ultimately, a t-distribution-based Mixture Density Network (TDMDN) is employed for multimodal probabilistic modeling, better fitting the multi-modal and heavy-tailed distributions of future trajectories and enhancing adaptability and robustness in complex traffic contexts. Experimental results demonstrate that the proposed STGFormer achieves synergistic improvements in accuracy, diversity, and physical plausibility across multiple mainstream evaluation metrics, exhibiting superior predictive consistency and robustness, particularly in complex interactions and adverse driving scenarios. 
These findings not only validate the effectiveness of cross-disciplinary feature fusion and hierarchical structural design in multi-agent trajectory modeling but also provide a theoretical foundation and methodological reference for multimodal behavior understanding and safe decision-making in intelligent transportation systems.
Title: STGFormer: A pyramidal spatio-temporal graph transformer with cross-disciplinary feature fusion for semantic-rich trajectory prediction in heterogeneous autonomy traffic
Journal: Expert Systems with Applications, Volume 310, Article 131304 | DOI: 10.1016/j.eswa.2026.131304
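The selection idea behind ANS-GCN, keeping only the most relevant interacting agents, can be illustrated with a toy top-k filter. Hedge: in the paper the weights are learned from multiple factors; here the scores, the agent names, and the hard top-k cut are all stand-in assumptions.

```python
# Stand-in for ANS-GCN's adaptive neighborhood selection: keep only the k
# agents with the highest (here, pre-supplied) interaction scores.
def select_neighbors(scores, k):
    """Return the set of the k highest-scoring agent ids."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

scores = {"car_2": 0.9, "ped_1": 0.7, "car_5": 0.2, "bike_3": 0.1}
top = select_neighbors(scores, 2)
```

Restricting message passing to this selected set is what keeps spatial modeling efficient when many agents share the scene.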
Pub Date: 2026-01-22 | DOI: 10.1016/j.eswa.2026.131201
Zeren Ai, Hui Cao, Henglong Shen, Longde Wang
Marine diesel engine fault diagnosis has long been hindered by two major challenges: sample scarcity and class imbalance, both of which significantly limit the effectiveness of traditional data-driven models. Existing approaches struggle to simultaneously capture the complex inter-sample relationships and mitigate the prediction bias caused by imbalanced class distributions. To address these issues, this study proposes an innovative graph-based fault diagnosis model, the Hierarchical Multi-stage Attentional Fusion Graph Attention Network (HMAF-GAT). To alleviate the problem of sample scarcity, we construct a dual-graph topology based on Euclidean distance and cosine similarity, enabling the extraction of multi-dimensional relational information from limited samples. To handle class imbalance, we design a Hierarchical Multi-stage Attentional Fusion (HMAF) framework composed of a Global-Local Attention Fusion Module (GL-AFM) and a hierarchical fusion strategy. The GL-AFM preserves minority-class neighbors through local attention and adaptively adjusts weight assignment through global attention. Furthermore, the hierarchical fusion strategy facilitates the interaction of local and global features, effectively suppressing the dominance of majority-class samples. We employ Graph Attention Networks (GAT) as the classifier and use the multi-head attention mechanism to compute node aggregation weights. By parallelizing multiple attention heads, the model enhances representational capacity and improves training stability, enabling the extraction of more robust features from scarce samples. Experiments on a marine diesel engine dataset validate the effectiveness and reliability of the proposed HMAF-GAT model. A series of ablation studies and comparative evaluations further demonstrate its performance.
The results show that when the ratio of normal samples to each fault category is 9:1, the proposed method achieves an accuracy of 98.89%, outperforming traditional data augmentation methods such as Synthetic Minority Over-sampling Technique (SMOTE) and Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) by 9.1%, significantly enhancing the recognition capability of minority-class faults. This study reveals the graph-structured characteristics of fault propagation in complex systems and provides a novel graph learning-based solution for diesel engine fault diagnosis under imbalanced data conditions.
Title: Hierarchical attentional fusion graph attention network for marine diesel engines based on imbalanced datasets
Journal: Expert Systems with Applications, Volume 310, Article 131201 | DOI: 10.1016/j.eswa.2026.131201
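The dual-graph topology described above can be sketched as two complementary edge sets over the same samples: one connecting points close in Euclidean distance, the other connecting points with high cosine similarity. The thresholds, the 2-D points, and the brute-force pairwise loop are illustrative assumptions, not the paper’s construction.

```python
import math

# Sketch of a dual-graph topology: one edge set from Euclidean proximity,
# one from cosine similarity, over index pairs i < j.
def build_edges(samples, eps=1.0, cos_min=0.95):
    """Return (euclidean_edges, cosine_edges) as sets of (i, j) pairs."""
    euclid, cosine = set(), set()
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            a, b = samples[i], samples[j]
            if math.dist(a, b) <= eps:
                euclid.add((i, j))
            dot = sum(x * y for x, y in zip(a, b))
            sim = dot / (math.hypot(*a) * math.hypot(*b))
            if sim >= cos_min:
                cosine.add((i, j))
    return euclid, cosine

pts = [(1.0, 0.0), (1.5, 0.0), (10.0, 0.0)]
e_edges, c_edges = build_edges(pts)
```

Note how the two criteria disagree: the distant point at (10, 0) is cosine-similar to the others but not Euclidean-close, which is exactly the complementary relational information a dual graph is meant to capture.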
Pub Date : 2026-01-22DOI: 10.1016/j.eswa.2026.131228
Rudan Deng , Hongmei Chen , Chenglong Zhu , Shi-Jinn Horng , Tianrui Li
Partial multi-label learning (PML) aims to identify true positive labels from candidate sets heavily contaminated with false positives. However, most existing PML methods overlook both the redundant structure in feature relationships and the higher-order semantic correlations among labels. They also fail to adequately utilize the valuable supervisory information provided by negative labels outside the candidate set. To address these limitations, this paper proposes a novel negative-label-guided method, PML-GTHG, that constructs a feature tree and a high-order label hypergraph. Specifically, we first design a negative-label guidance term that uses labels outside the candidate sets as highly reliable negative references to help identify true positives. We then introduce a minimum spanning tree to model feature dependencies, capturing essential feature structures without cycles while eliminating redundancy. Additionally, we employ a hypergraph to explore complex high-order label correlations that go beyond traditional pairwise relationships. The feature relation tree, high-order label hypergraph, and negative-label guidance term are integrated into a unified optimization framework that jointly improves learning performance. Extensive experiments across multiple benchmark datasets show that our method achieves superior performance compared to leading methods across a range of evaluation metrics.
{"title":"Partial multi-label learning with guided feature tree and high-order label graph","authors":"Rudan Deng , Hongmei Chen , Chenglong Zhu , Shi-Jinn Horng , Tianrui Li","doi":"10.1016/j.eswa.2026.131228","DOIUrl":"10.1016/j.eswa.2026.131228","url":null,"abstract":"<div><div>Partial multi-label learning (PML) aims to identify true positive labels from candidate sets heavily contaminated with false positives. However, most existing PML methods overlook both the redundant structure in feature relationships and the higher-order semantic correlations among labels. They also fail to adequately utilize the valuable supervisory information provided by negative labels outside the candidate set. To address these limitations, this paper proposes a novel negative-label-guided method, PML-GTHG, that constructs a feature tree and a high-order label hypergraph. Specifically, we first design a negative-label guidance term that uses labels outside the candidate sets as highly reliable negative references to help identify true positives. We then introduce a minimum spanning tree to model feature dependencies, capturing essential feature structures without cycles while eliminating redundancy. Additionally, we employ a hypergraph to explore complex high-order label correlations that go beyond traditional pairwise relationships. The feature relation tree, high-order label hypergraph, and negative-label guidance term are integrated into a unified optimization framework that jointly improves learning performance. 
Extensive experiments across multiple benchmark datasets show that our method achieves superior performance compared to leading methods across a range of evaluation metrics.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"310 ","pages":"Article 131228"},"PeriodicalIF":7.5,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146080886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
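The minimum-spanning-tree step in the PML-GTHG abstract — an acyclic structure over features that keeps essential dependencies and drops redundant ones — can be sketched as below. The correlation-based edge weights and the function name are illustrative assumptions, since the abstract does not specify how feature dependencies are measured:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def feature_relation_tree(X):
    """Acyclic feature-dependency structure over the columns of X (n_samples, n_features).

    Edge weight 1 - |corr| makes strongly correlated feature pairs cheapest to join,
    so the MST keeps the strongest dependencies while eliminating cycles/redundancy."""
    corr = np.corrcoef(X, rowvar=False)        # (d, d) feature-feature correlation
    dist = 1.0 - np.abs(corr)                  # low distance = strong dependency
    np.fill_diagonal(dist, 0.0)                # csgraph treats zeros as absent edges
    return minimum_spanning_tree(dist)         # sparse (d, d) matrix with d - 1 edges

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))                  # 100 samples, 6 features
tree = feature_relation_tree(X)
```

A tree over d features always has exactly d - 1 edges, which is what makes it a compact, cycle-free summary of the feature relationships.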
Pub Date : 2026-01-22DOI: 10.1016/j.eswa.2026.131303
Dianlong You , Yulong Wang , Cunguo Tao , Zhen Chen , Shunfu Jin
Cross-modal image fusion aims to integrate complementary information from different imaging sources to generate high-quality images with comprehensive information and fine details. Although convolutional neural network (CNN)-based methods have made competitive progress, their inherent local receptive fields limit effective global information modeling, while Transformer-based approaches excel at capturing long-range dependencies but are constrained by quadratic computational complexity. We propose DAMFusion, a dual-branch architecture that decouples shallow texture and global semantic features through attention mechanisms and state space models. Specifically, we 1) design a Shallow Feature Fusion Module (SFFM) based on channel-spatial attention to replace traditional convolution operations, enabling precise local feature extraction; 2) construct an efficient model combining visual Mamba with dynamic convolution, enhancing global feature representation capabilities; and 3) devise an adaptive semantic feature fusion strategy based on spatial normalization to establish dynamic interaction mechanisms between shallow and global features. Extensive experiments demonstrate that DAMFusion achieves competitive performance in infrared-visible fusion and medical image fusion tasks, with consistent improvements over existing methods in objective metrics and subjective visual quality, thus providing a new technical paradigm for cross-modal image fusion. The code is released at https://github.com/youdianlong/DAMFusion.git.
{"title":"Cross-modal image fusion via dual attention and Mamba","authors":"Dianlong You , Yulong Wang , Cunguo Tao , Zhen Chen , Shunfu Jin","doi":"10.1016/j.eswa.2026.131303","DOIUrl":"10.1016/j.eswa.2026.131303","url":null,"abstract":"<div><div>Cross-modal image fusion aims to integrate complementary information from different imaging sources to generate high-quality images with comprehensive information and fine details. Although convolutional neural network (CNN)-based methods have made competitive progress, their inherent local receptive fields limit effective global information modeling, while Transformer-based approaches excel at capturing long-range dependencies but are constrained by quadratic computational complexity. We propose DAMFusion, a dual-branch architecture that decouples shallow texture and global semantic features through attention mechanisms and state space models. Specifically, we 1) design a Shallow Feature Fusion Module (SFFM) based on channel-spatial attention to replace traditional convolution operations, enabling precise local feature extraction; 2) construct an efficient model combining visual Mamba with dynamic convolution, enhancing global feature representation capabilities; and 3) devise an adaptive semantic feature fusion strategy based on spatial normalization to establish dynamic interaction mechanisms between shallow and global features. Extensive experiments demonstrate that DAMFusion achieves competitive performance in infrared-visible fusion and medical image fusion tasks, with consistent improvements over existing methods in objective metrics and subjective visual quality, thus providing a new technical paradigm for cross-modal image fusion. 
The code is released at <span><span>https://github.com/youdianlong/DAMFusion.git</span></span>.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"310 ","pages":"Article 131303"},"PeriodicalIF":7.5,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146081048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
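The channel-spatial attention underlying DAMFusion's SFFM is not detailed in the abstract; a common CBAM-style formulation (channel attention followed by spatial attention) conveys the general idea. The sketch below is a hypothetical illustration — the weights, names, and the omitted spatial convolution are our assumptions, not the released implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_spatial_attention(F, W1, W2):
    """CBAM-style attention over a feature map F of shape (C, H, W).

    W1 (C//r, C) and W2 (C, C//r) form a tiny shared MLP for the channel branch."""
    # Channel attention: squeeze spatial dims, excite channels.
    avg, mx = F.mean(axis=(1, 2)), F.max(axis=(1, 2))                # (C,) each
    ca = sigmoid(W2 @ np.maximum(W1 @ avg, 0.0) + W2 @ np.maximum(W1 @ mx, 0.0))
    Fc = F * ca[:, None, None]
    # Spatial attention: squeeze channels, excite locations (7x7 conv omitted here).
    sa = sigmoid(Fc.mean(axis=0) + Fc.max(axis=0))                   # (H, W)
    return Fc * sa[None, :, :]

rng = np.random.default_rng(2)
C, H, W, r = 8, 4, 4, 2
F = rng.normal(size=(C, H, W))
W1 = 0.1 * rng.normal(size=(C // r, C))       # hypothetical learned weights
W2 = 0.1 * rng.normal(size=(C, C // r))
out = channel_spatial_attention(F, W1, W2)
```

Because both attention maps lie in (0, 1), the output is an element-wise reweighting of the input feature map rather than a new feature basis.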
Multi-modal image fusion (MMIF) integrates complementary information from multi-source images to enhance visual quality for downstream vision tasks. Existing methods have proposed numerous promising solutions, yet they still exhibit deficiencies in multi-modal feature interaction. In this paper, we find that Transformer-based architectures outperform Mamba-based counterparts in fundamental feature extraction, while Mamba’s unique scanning mechanism holds significant potential for deep multi-modal feature interaction. To this end, we propose HTM, a novel hybrid Transformer-Mamba architecture. Specifically, HTM leverages the respective strengths of both architectures: Transformer blocks enable effective feature extraction and Mamba blocks achieve efficient feature interaction. Building upon vanilla Mamba, we design a cross-modal local feature scanning mechanism (CMLFSM) that performs channel-wise joint scanning to align and fuse analogous features across modalities. Furthermore, we incorporate a Cross-Modal Gated Feedforward Network (CMFFN) that leverages inter-modal information flows to execute dynamic gating, effectively minimizing the flow of non-essential information. Finally, a CLIP-based loss is proposed to provide high-quality semantic guidance for unsupervised MMIF tasks. Extensive experiments demonstrate that our method achieves superior results across multiple image fusion benchmarks. The project code and pre-trained models are available upon acceptance.
{"title":"Fusion requires interaction: a hybrid Mamba-transformer architecture for deep interactive fusion of multi-modal images","authors":"Wenxiao Xu , Chen Wu , Qiyuan Yin , Ling Wang , Zhuoran Zheng , Daqing Huang","doi":"10.1016/j.eswa.2026.131309","DOIUrl":"10.1016/j.eswa.2026.131309","url":null,"abstract":"<div><div>Multi-modal image fusion (MMIF) integrates complementary information from multi-source images to enhance visual quality for downstream vision tasks. Existing methods have proposed numerous promising solutions, yet they still exhibit deficiencies in multi-modal feature interaction. In this paper, we find that Transformer-based architectures outperform Mamba-based counterparts in fundamental feature extraction, while Mamba’s unique scanning mechanism holds significant potential for deep multi-modal feature interaction. To this end, we propose HTM, a novel hybrid Transformer-Mamba architecture. Specifically, HTM leverages the respective strengths of both architectures: Transformer blocks enable effective feature extraction and Mamba blocks achieve efficient feature interaction. Building upon vanilla Mamba, we design a cross-modal local feature scanning mechanism (CMLFSM) that performs channel-wise joint scanning to align and fuse analogous features across modalities. Furthermore, we incorporate a Cross-Modal Gated Feedforward Network (CMFFN) that leverages inter-modal information flows to execute dynamic gating, effectively minimizing the flow of non-essential information. Finally, a CLIP-based loss is proposed to provide high-quality semantic guidance for unsupervised MMIF tasks. Extensive experiments demonstrate that our method achieves superior results across multiple image fusion benchmarks. 
The project code and pre-trained models are available upon acceptance.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"312 ","pages":"Article 131309"},"PeriodicalIF":7.5,"publicationDate":"2026-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146122662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
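The channel-wise joint scanning of HTM's CMLFSM is described only at a high level; one plausible reading is that channels of the two modality feature maps are interleaved so a sequential (Mamba-style) scan visits aligned channels back-to-back. The sketch below illustrates only that interleaving, under assumed names and shapes — the actual CMLFSM may differ:

```python
import numpy as np

def joint_channel_scan(Fa, Fb):
    """Interleave the channels of two modality feature maps of shape (C, H, W)
    into channel order a0, b0, a1, b1, ..., then flatten each channel so a
    sequential scan sees analogous cross-modal channels adjacently.

    Returns a (2C, H*W) token sequence."""
    C, H, W = Fa.shape
    stacked = np.stack([Fa, Fb], axis=1)          # (C, 2, H, W)
    interleaved = stacked.reshape(2 * C, H, W)    # channel c of Fa -> row 2c, of Fb -> row 2c+1
    return interleaved.reshape(2 * C, H * W)

rng = np.random.default_rng(3)
Fa = rng.normal(size=(4, 2, 2))                   # e.g. infrared features
Fb = rng.normal(size=(4, 2, 2))                   # e.g. visible features
seq = joint_channel_scan(Fa, Fb)
```

Adjacency of aligned channels is the point: a state-space model's recurrence then mixes each modality pair before moving on, which is one way "channel-wise joint scanning" can align and fuse analogous features.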