
Expert Systems with Applications: Latest Articles

DRKT: Learning differential relationships for efficient knowledge tracing with learner’s knowledge internalization representation
IF 7.5 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-22 | DOI: 10.1016/j.eswa.2026.131319
Zhaoli Zhang, Jiahao Li, Hai Liu, Erqi Zhang, Tingting Liu, Minhong Wang
Knowledge tracing (KT) is a crucial task in educational data mining that aims to model the state of learners’ knowledge by analyzing their behavioral data in real time. However, unlike the dynamic evolution process of learners’ knowledge internalization in KT modeling, the intrinsic features associated with exercises and knowledge components (KCs) remain static. Many existing models overlook this distinction and fail to implement differentiated feature processing. Additionally, the rapidly expanding volume of data in online learning platforms poses new challenges to model performance and efficiency. To address these issues, we propose DRKT, a new model that employs an intrinsic information mining (IIM) module to extract inherent feature information from exercises and KCs. We also utilize the Mamba network to capture learner-exercise interaction patterns and achieve a balance between performance and efficiency. Furthermore, we introduce a double matrix dynamic update (DMDU) strategy to differentially model the complex dynamics of knowledge internalization and the inherent invariability of exercises and KCs. Experimental results on four real-world educational datasets demonstrate that DRKT outperforms existing methods in predictive accuracy, resource consumption, and time complexity, providing effective technical support for pedagogical interventions and personalized learning recommendations.
Expert Systems with Applications, vol. 310, Article 131319
Citations: 0
Variational oblique predictive clustering trees
IF 7.5 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-22 | DOI: 10.1016/j.eswa.2026.131255
Viktor Andonovikj, Sašo Džeroski, Biljana Mileva Boshkoska, Pavle Boškoski
Oblique predictive clustering trees (SPYCTs) are semi-supervised multi-target prediction models mainly used for structured output prediction (SOP) problems. They are computationally efficient and, when combined in ensembles, they achieve state-of-the-art results. However, one major issue is that it is challenging to interpret an ensemble of SPYCTs without the use of a model-agnostic method. We propose variational oblique predictive clustering trees, which address this challenge. The parameters of each split node are treated as random variables, described with a probability distribution, and they are learned through the Variational Bayes method. We evaluate the model on several benchmark datasets of different sizes. The experimental analyses show that a single variational oblique predictive clustering tree (VSPYCT) achieves competitive, and sometimes better, predictive performance compared with an ensemble of standard SPYCTs. We also present a method for extracting feature importance scores from the model. Finally, we present a method to visually interpret the model’s decision-making process through analysis of the relative feature importance in each split node.
Expert Systems with Applications, vol. 310, Article 131255
Citations: 0
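As a rough illustration of the idea in the abstract above (split-node parameters treated as random variables), the sketch below computes how likely a sample is to be routed to one side of a single oblique split. It assumes independent Gaussian distributions over the weights and bias, which the abstract does not state; the function name `prob_right` and all parameter names are hypothetical and not taken from the paper.

```python
import math

def prob_right(x, w_mean, w_std, b_mean, b_std):
    """P(w.x + b > 0) for one oblique split.

    Assumption (not from the paper): every weight and the bias follow
    independent Gaussians, so the split score w.x + b is Gaussian too.
    """
    # Mean and variance of the linear score under the Gaussian assumption.
    mean = sum(m * xi for m, xi in zip(w_mean, x)) + b_mean
    var = sum((s * xi) ** 2 for s, xi in zip(w_std, x)) + b_std ** 2
    if var == 0.0:
        # Degenerate case: the split is deterministic.
        return 1.0 if mean > 0 else 0.0
    # Standard normal CDF evaluated at mean / sqrt(var).
    return 0.5 * (1.0 + math.erf(mean / math.sqrt(2.0 * var)))

# A sample far from the mean hyperplane is routed almost surely;
# a sample on it is routed with probability one half.
confident = prob_right([1.0, 0.0], [2.0, 0.0], [1.0, 1.0], 0.0, 0.0)
uncertain = prob_right([1.0, 0.0], [0.0, 0.0], [1.0, 1.0], 0.0, 0.0)
```

Under this assumption the routing probability has a closed form, so no sampling is needed for a single node; a full tree would combine such probabilities along each path.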
GWO-DAGRU: A hybrid deep learning framework with metaheuristic feature selection and self-weighted context GRU for short-term wind power forecast
IF 7.5 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-22 | DOI: 10.1016/j.eswa.2026.131279
Saira Mudassar, Aneela Zameer, Muhammad Asif Zahoor Raja
Accurate wind power forecasting is crucial for effectively integrating renewable energy into the electric grid, enabling the optimal utilization of generated clean energy. The intermittent nature of wind and the operational complexity of sustainable energy systems make prediction a highly challenging task. This study combines the strengths of a metaheuristic algorithm, grey wolf optimization (GWO), for feature selection with a time-series multivariate forecasting model, gated recurrent unit (GRU), along with a double attention mechanism (DAGRU) for effective, precise, and efficient predictions. The proposed model, GWO-DAGRU, is a short-term wind power forecasting model that integrates grey wolf optimization for feature selection with a double attention gated recurrent unit for time-series prediction. GWO, combined with an XGBoost regressor, is first used to identify key input features, which are then refined by a double attention mechanism in DAGRU to capture temporal dependencies more effectively. The proposed approach is validated on data from seven European wind farms and further tested on the ELIA dataset to assess generalization capability. Performance is benchmarked using error metrics and statistical validation through the Wilcoxon signed-rank test at a 95% confidence level. The findings demonstrate that GWO-DAGRU achieves superior accuracy and robustness, outperforming several existing forecasting methods for efficient management and planning of a sustainable energy support system.
Expert Systems with Applications, vol. 310, Article 131279
Citations: 0
Improving face re-identification via identity-conditioned synthetic augmentation and inference-time embedding fusion
IF 7.5 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-22 | DOI: 10.1016/j.eswa.2026.131302
Héctor Penadés, Félix Escalona, Miguel Cazorla
The task of face re-identification seeks to match identities across images captured under varying conditions. In conventional single-registration scenarios, only one real image per subject is available during inference, limiting the discriminative capability of the embedding. Advances in synthetic data present new opportunities for improving recognition systems, particularly as privacy concerns restrict data availability. We propose a novel method that leverages identity-guided synthetic augmentation to enrich facial representations at inference time. Unlike traditional data augmentation, it enhances embeddings through sample aggregation, introducing an inference-time paradigm for representation enrichment without expanding the training set or retraining existing models. Using Arc2Face, we generate diverse, identity-consistent synthetic images from each real sample, synthesizing multiple facial variations to approximate the distributional space around each identity. A non-parametric analysis of ten embedding fusion strategies showed consistent improvements over the baselines, with the Mean, Median, and hybrid Mean-Median (Meta-MM) achieving the best performance and Meta-MM showing the lowest variability across models. Experiments demonstrated consistent improvements across re-identification and verification settings. On the Labeled Faces in the Wild (LFW) dataset, Rank-1 accuracy improved by an average of 6.97 points and mean Average Precision (mAP) by 5.82 and 8.10 points. On the Surveillance Cameras Face (SCFace) dataset, a low-quality, cross-distance dataset, Rank-1 gains ranged from 10.98 to 31.33 points. On the Cross-Pose LFW (CPLFW) verification benchmark, accuracy generally matched or exceeded AdaFace baselines, with gains of up to 5.57 points. Incorporating latent consistency models with low-rank adaptation (LCM-LoRA) accelerated sample generation tenfold, making the framework suitable for large-scale applications.
Expert Systems with Applications, vol. 310, Article 131302
Citations: 0
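The inference-time fusion described in the abstract above can be pictured with a small sketch: the embedding of the one real image and the embeddings of its synthetic variants are combined coordinate-wise into a single gallery vector. Only the Mean and Median strategies named in the abstract are shown; how the paper defines Meta-MM is not stated there, so it is left out. The toy vectors and the helper names `fuse`, `l2_normalize`, and `cosine` are assumptions for illustration, and a real system would obtain embeddings from a face recognition model.

```python
import math
from statistics import median

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v] if norm > 0 else list(v)

def fuse(embeddings, strategy="mean"):
    """Fuse equal-length embeddings coordinate-wise, then L2-normalize."""
    columns = list(zip(*embeddings))
    if strategy == "mean":
        fused = [sum(col) / len(col) for col in columns]
    elif strategy == "median":
        fused = [median(col) for col in columns]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return l2_normalize(fused)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))

# Toy 2-D stand-ins for one real embedding and two synthetic variants.
real = [1.0, 0.0]
synthetic = [[0.8, 0.2], [0.9, -0.1]]
gallery_mean = fuse([real] + synthetic, "mean")
gallery_median = fuse([real] + synthetic, "median")
score = cosine(gallery_mean, [0.95, 0.05])  # compare against a probe embedding
```

Because the fusion happens on embeddings only, the recognition model itself stays untouched, which matches the abstract's point that no retraining is required.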
Adaptive decomposition-based transfer learning for dynamic constrained multi-objective optimization
IF 7.5 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-22 | DOI: 10.1016/j.eswa.2026.131220
Li Yan, Yinjin Wu, Boyang Qu, Chao Li, Jing Liang, Kunjie Yu, Caitong Yue, Baihao Qiao, Yuqi Lei
Dynamic constrained multiobjective optimization problems (DCMOPs) are characterized by objective functions and constraints that change in complex ways over time. This time-varying characteristic poses significant challenges for existing optimization algorithms, particularly in rapidly tracking the dynamic feasible regions and accurately converging to the changing Dynamic Constrained Pareto Optimal Front (DCPOF). To address these challenges, an adaptive decomposition-based transfer learning method, termed ADTL, is proposed in this article. The method introduces an adaptive objective space decomposition strategy to locate the dynamic feasible regions accurately. Upon the detection of a new environment, the objective space is decomposed by the historical optimal solutions. To efficiently track the DCPOF, an individual-based transfer learning strategy is proposed, which associates each solution in the current environment with its nearest reference vector. Then, a single-layer autoencoder is employed to learn the features of historical optimal solutions and transfer historical knowledge to the current population. Furthermore, to improve search efficiency, a diversity and feasibility enhancement strategy is proposed. This strategy evaluates the diversity and feasibility of the predicted population, introduces random solutions according to the diversity level, and relocates infeasible solutions to the boundary of the feasible regions. Comprehensive experiments on widely used benchmark problems demonstrate that the proposed algorithm is highly competitive in dealing with DCMOPs when compared with seven state-of-the-art algorithms.
Expert Systems with Applications, vol. 309, Article 131220
Citations: 0
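The association step mentioned in the abstract above (each solution is linked to its nearest reference vector) might look like the following sketch. It assumes "nearest" means the smallest angle between a solution's objective vector and a reference vector, a common convention in decomposition-based multi-objective algorithms but not confirmed by the abstract. The function names and the toy data are hypothetical.

```python
import math

def nearest_reference(objectives, reference_vectors):
    """Index of the reference vector with the smallest angle to `objectives`.

    Assumption: nearness is measured by angle (largest cosine similarity).
    """
    def cos(a, b):
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        if na == 0.0 or nb == 0.0:
            return -1.0  # treat zero vectors as maximally distant
        return sum(x * y for x, y in zip(a, b)) / (na * nb)

    return max(range(len(reference_vectors)),
               key=lambda i: cos(objectives, reference_vectors[i]))

def associate(population_objectives, reference_vectors):
    """Group solution indices by their nearest reference vector."""
    groups = {i: [] for i in range(len(reference_vectors))}
    for idx, f in enumerate(population_objectives):
        groups[nearest_reference(f, reference_vectors)].append(idx)
    return groups

# Three reference directions in a two-objective space, three solutions.
refs = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
pop = [[0.9, 0.1], [0.5, 0.5], [0.1, 2.0]]
groups = associate(pop, refs)
```

Grouping solutions this way partitions the objective space into subregions, which is what allows knowledge from historical solutions to be transferred region by region.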
STGFormer: A pyramidal spatio-temporal graph transformer with cross-disciplinary feature fusion for semantic-rich trajectory prediction in heterogeneous autonomy traffic
IF 7.5 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-22 | DOI: 10.1016/j.eswa.2026.131304
Cheng Ju, Yuansha Xie, Zhongrong Wang, Yu Zhao, Wenyao Yan, Rongjun Chai, Juan Duan, Yali Cao, Yuxin Chang
Achieving high-precision and multimodal trajectory prediction for multiple agents in mixed traffic environments, where autonomous and human-driven vehicles coexist, constitutes a fundamental scientific challenge for ensuring traffic safety and efficiency. To address the limitations of existing approaches in modeling heterogeneous behaviors, long-term dependencies, and high-level semantics in complex dynamic scenarios, a Pyramidal Spatio-Temporal Graph Transformer (STGFormer) based on cross-disciplinary feature fusion is proposed in this study. This method, grounded in hierarchical feature integration, systematically incorporates multi-source information from physical, psychological, environmental, and social domains, thereby significantly enhancing the model’s capacity to represent diverse behaviors. In the spatial modeling stage, an Adaptive Neighborhood Selection Graph Convolutional Network (ANS-GCN) is introduced, which dynamically selects key interactive agents through a multi-factor learnable weighting mechanism, enabling efficient spatial relationship modeling. For temporal modeling, a Pyramid Sparse Semantic Attention Transformer Encoder (PSSAT) is designed to progressively capture short-term dynamics and long-term trends, integrating spatial, temporal, and behavioral semantic features. Ultimately, a t-distribution-based Mixture Density Network (TDMDN) is employed for multimodal probabilistic modeling, better fitting the multimodal and heavy-tailed distributions of future trajectories and enhancing adaptability and robustness in complex traffic contexts. Experimental results demonstrate that the proposed STGFormer achieves synergistic improvements in accuracy, diversity, and physical plausibility across multiple mainstream evaluation metrics, exhibiting superior predictive consistency and robustness, particularly in complex interactions and adverse driving scenarios. These findings not only validate the effectiveness of cross-disciplinary feature fusion and hierarchical structural design in multi-agent trajectory modeling but also provide a theoretical foundation and methodological reference for multimodal behavior understanding and safe decision-making in intelligent transportation systems.
Expert Systems with Applications, vol. 310, Article 131304
Citations: 0
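To make the "t-distribution-based mixture density" idea in the abstract above more concrete, the sketch below evaluates the negative log-likelihood of a one-dimensional target under a mixture of Student-t components. This is a generic univariate formulation, not the paper's TDMDN: the real model presumably predicts trajectory coordinates from network outputs, and the function names and parameterization here are assumptions.

```python
import math

def student_t_logpdf(y, loc, scale, df):
    """Log-density of a location-scale Student-t distribution at y."""
    z = (y - loc) / scale
    return (math.lgamma((df + 1.0) / 2.0) - math.lgamma(df / 2.0)
            - 0.5 * math.log(df * math.pi) - math.log(scale)
            - 0.5 * (df + 1.0) * math.log1p(z * z / df))

def mixture_nll(y, weights, locs, scales, dfs):
    """Negative log-likelihood of y under a Student-t mixture.

    `weights` must be positive and sum to 1; the log-sum-exp trick keeps
    the computation numerically stable.
    """
    logs = [math.log(w) + student_t_logpdf(y, m, s, v)
            for w, m, s, v in zip(weights, locs, scales, dfs)]
    top = max(logs)
    return -(top + math.log(sum(math.exp(l - top) for l in logs)))

# Two modes (for example, "keep lane" versus "turn") with heavy tails.
loss = mixture_nll(0.3, [0.7, 0.3], [0.0, 2.0], [0.5, 0.5], [3.0, 3.0])
```

A small degrees-of-freedom value gives heavier tails than a Gaussian, so rare but plausible outcomes are penalized less severely, which is the motivation the abstract gives for using a t-distribution.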
Hierarchical attentional fusion graph attention network for marine diesel engines based on imbalanced datasets
IF 7.5 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-22 | DOI: 10.1016/j.eswa.2026.131201
Zeren Ai, Hui Cao, Henglong Shen, Longde Wang
Marine diesel engine fault diagnosis has long been hindered by two major challenges, sample scarcity and class imbalance, both of which significantly limit the effectiveness of traditional data-driven models. Existing approaches struggle to capture complex inter-sample relationships while also mitigating the prediction bias caused by imbalanced class distributions. To address these issues, this study proposes a graph-based fault diagnosis model, the Hierarchical Multi-stage Attentional Fusion Graph Attention Network (HMAF-GAT). To alleviate sample scarcity, we construct a dual-graph topology based on Euclidean distance and cosine similarity, enabling the extraction of multi-dimensional relational information from limited samples. To handle class imbalance, we design a Hierarchical Multi-stage Attentional Fusion (HMAF) framework composed of a Global-Local Attention Fusion Module (GL-AFM) and a hierarchical fusion strategy. The GL-AFM preserves minority-class neighbors through local attention and adaptively adjusts weight assignment through global attention. Furthermore, the hierarchical fusion strategy facilitates the interaction of local and global features, effectively suppressing the dominance of majority-class samples. We employ Graph Attention Networks (GAT) as the classifier and use the multi-head attention mechanism to compute node aggregation weights. By parallelizing multiple attention heads, the model enhances representational capacity and improves training stability, enabling the extraction of more robust features from scarce samples. Experiments on a marine diesel engine dataset validate the effectiveness and reliability of the proposed HMAF-GAT model, and a series of ablation studies and comparative evaluations further demonstrate its performance. The results show that when the ratio of normal samples to each fault category is 9:1, the proposed method achieves an accuracy of 98.89%, outperforming traditional data augmentation methods such as Synthetic Minority Over-sampling Technique (SMOTE) and Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) by 9.1%, significantly enhancing the recognition capability of minority-class faults. This study reveals the graph-structured characteristics of fault propagation in complex systems and provides a novel graph learning-based solution for diesel engine fault diagnosis under imbalanced data conditions.
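The dual-graph idea in this abstract can be illustrated with a small sketch. The abstract does not say how edges are selected, so the k-nearest-neighbour rule, the function names (`knn_edges`, `build_dual_graph`), and the toy samples below are all assumptions for illustration, not the authors' construction.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def knn_edges(samples, k, score):
    """Undirected edges linking each sample to its k highest-scoring neighbours.

    The k-NN rule is an assumption; the abstract only names the two measures.
    """
    edges = set()
    for i, s in enumerate(samples):
        others = [j for j in range(len(samples)) if j != i]
        others.sort(key=lambda j: score(s, samples[j]), reverse=True)
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

def build_dual_graph(samples, k=1):
    """Build one graph per measure over the same set of nodes."""
    dist_graph = knn_edges(samples, k, lambda a, b: -euclidean(a, b))
    cos_graph = knn_edges(samples, k, cosine_similarity)
    return dist_graph, cos_graph

# Hypothetical 2-D samples: 0 and 1 point the same way but lie far apart,
# while 0 and 2 are close in space but differ in direction.
samples = [[1.0, 0.0], [5.0, 0.0], [1.0, 2.0], [0.0, 2.0]]
dist_graph, cos_graph = build_dual_graph(samples, k=1)
print(sorted(dist_graph))
print(sorted(cos_graph))
```

On these toy samples the two graphs differ: the distance graph links samples 0 and 2, while the cosine graph does not. That difference is the kind of complementary relational information a dual topology is meant to expose.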
Expert Systems with Applications, Volume 310, Article 131201
Citations: 0
Partial multi-label learning with guided feature tree and high-order label graph
IF 7.5; Zone 1 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2026-01-22; DOI: 10.1016/j.eswa.2026.131228
Rudan Deng, Hongmei Chen, Chenglong Zhu, Shi-Jinn Horng, Tianrui Li
Partial multi-label learning (PML) aims to identify true positive labels from candidate sets heavily contaminated with false positives. However, most existing PML methods overlook both the redundant structure in feature relationships and the higher-order semantic correlations among labels. They also fail to adequately utilize the valuable supervisory information provided by negative labels outside the candidate set. To address these limitations, this paper proposes a novel negative-label-guided method, PML-GTHG, that constructs a feature tree and a high-order label hypergraph. Specifically, we first design a negative-label guidance term that uses labels outside the candidate sets as highly reliable negative references to help identify true positives. We then introduce a minimum spanning tree to model feature dependencies, capturing essential feature structures without cycles while eliminating redundancy. Additionally, we employ a hypergraph to explore complex high-order label correlations that go beyond traditional pairwise relationships. The feature relation tree, high-order label hypergraph, and negative-label guidance term are integrated into a unified optimization framework that jointly improves learning performance. Extensive experiments across multiple benchmark datasets show that our method achieves superior performance compared to leading methods across a range of evaluation metrics.
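To make the feature-tree step concrete, here is a minimal sketch of a minimum spanning tree over features. The edge weight (one minus the absolute Pearson correlation), the use of Kruskal's algorithm, and the names `pearson` and `feature_mst` are assumptions chosen for illustration. The abstract does not specify how PML-GTHG weights or builds its tree.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def feature_mst(columns):
    """Kruskal's algorithm over features; assumed weight = 1 - |correlation|."""
    d = len(columns)
    edges = sorted(
        (1.0 - abs(pearson(columns[i], columns[j])), i, j)
        for i in range(d)
        for j in range(i + 1, d)
    )
    parent = list(range(d))

    def find(u):
        # Union-find root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # keep the edge only if it does not close a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Hypothetical feature columns: feature 1 is nearly a rescaled copy of feature 0.
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 1.0, 3.0, 2.0],
]
tree = feature_mst(columns)
print(tree)
```

With d features the tree keeps exactly d - 1 edges and contains no cycle, which matches the abstract's description of capturing essential feature structure without cycles while dropping redundant links.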
Expert Systems with Applications, Volume 310, Article 131228
Citations: 0
Cross-modal image fusion via dual attention and Mamba
IF 7.5; Zone 1 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2026-01-22; DOI: 10.1016/j.eswa.2026.131303
Dianlong You, Yulong Wang, Cunguo Tao, Zhen Chen, Shunfu Jin
Cross-modal image fusion aims to integrate complementary information from different imaging sources to generate high-quality images with comprehensive information and fine details. Although convolutional neural network (CNN)-based methods have made competitive progress, their inherent local receptive fields limit effective global information modeling, while Transformer-based approaches excel at capturing long-range dependencies but are constrained by quadratic computational complexity. We propose DAMFusion, a dual-branch architecture that decouples shallow texture and global semantic features through attention mechanisms and state space models. Specifically, we 1) design a Shallow Feature Fusion Module (SFFM) based on channel-spatial attention to replace traditional convolution operations, enabling precise local feature extraction; 2) construct an efficient improved model combining visual Mamba with dynamic convolution, enhancing global feature representation capabilities; and 3) devise an adaptive semantic feature fusion strategy based on spatial normalization to establish dynamic interaction mechanisms between shallow and global features. Extensive experiments show that DAMFusion achieves competitive performance in infrared-visible fusion and medical image fusion tasks, with consistent improvements over existing methods in objective metrics and subjective visual quality, thus providing a new technical paradigm for cross-modal image fusion. The code is released at https://github.com/youdianlong/DAMFusion.git.
Expert Systems with Applications, Volume 310, Article 131303
Citations: 0
Fusion requires interaction: a hybrid Mamba-transformer architecture for deep interactive fusion of multi-modal images
IF 7.5; Zone 1 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2026-01-22; DOI: 10.1016/j.eswa.2026.131309
Wenxiao Xu, Chen Wu, Qiyuan Yin, Ling Wang, Zhuoran Zheng, Daqing Huang
Multi-modal image fusion (MMIF) integrates complementary information from multi-source images to enhance visual quality for downstream vision tasks. Existing methods have proposed numerous promising solutions, yet they still exhibit deficiencies in multi-modal feature interaction. In this paper, we find that Transformer-based architectures outperform Mamba-based counterparts in fundamental feature extraction, while Mamba’s unique scanning mechanism holds significant potential for deep multi-modal feature interaction. To this end, we propose HTM, a novel hybrid Transformer-Mamba architecture. Specifically, HTM leverages the respective strengths of both architectures: Transformer blocks enable effective feature extraction and Mamba blocks achieve efficient feature interaction. Building upon vanilla Mamba, we design a cross-modal local feature scanning mechanism (CMLFSM) that performs channel-wise joint scanning to align and fuse analogous features across modalities. Furthermore, we incorporate a Cross-Modal Gated Feedforward Network (CMFFN) that leverages inter-modal information flows to execute dynamic gating, effectively minimizing the flow of non-essential information. Finally, a CLIP-based loss is proposed to provide high-quality semantic guidance for unsupervised MMIF tasks. Extensive experiments demonstrate that our method achieves superior results across multiple image fusion benchmarks. The project code and pre-trained models are available upon acceptance.
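The dynamic gating described for the CMFFN can be pictured with a very small sketch. This is only one plausible reading: the sigmoid gate, the scalar `weight` and `bias` parameters, and the function name `cross_modal_gate` are hypothetical and are not taken from the paper, whose actual layer design the abstract does not give.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def cross_modal_gate(features_a, features_b, weight=1.0, bias=0.0):
    """Scale each feature of modality A by a gate computed from modality B.

    Hypothetical form: gate = sigmoid(weight * b + bias). A gate near 0
    suppresses the feature; a gate near 1 lets it pass almost unchanged.
    """
    return [a * sigmoid(weight * b + bias) for a, b in zip(features_a, features_b)]

# Toy feature vectors standing in for two modalities (illustrative values only).
infrared = [0.8, 0.2, 0.5]
visible = [4.0, -4.0, 0.0]
gated = cross_modal_gate(infrared, visible)
print(gated)
```

In this toy form the second modality decides how much of each first-modality feature survives, which is the sense in which gating can limit the flow of non-essential information.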
Expert Systems with Applications, Volume 312, Article 131309
Citations: 0