
Latest Articles in Knowledge-Based Systems

Innovative optimization-driven machine learning models for hourly streamflow forecasting
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-02 | DOI: 10.1016/j.knosys.2026.115487
Peiman Parisouj , Changhyun Jun , Sayed M. Bateni , Shunlin Liang
This study introduces a novel framework for short-term streamflow forecasting by integrating multilayer perceptron (MLP) and gradient boosting (GB) models with artificial rabbit optimization (ARO) and the honey badger algorithm (HBA). The proposed framework addresses a critical need for accurate flood forecasting by providing a robust alternative to complex physical models. The methodology is applied to the flood-prone Chehalis Basin in the U.S. using 2011–2023 hydrometeorological data, including precipitation, temperature, humidity, wind speed, and streamflow. The study systematically evaluates the impact of input data quality and quantity by testing two model configurations: base models (M1 and M2) with simpler inputs, and upgraded models (M3, M4, and M5) with more complex features. The optimized HBA-MLP hybrid model achieves 1–6 h streamflow forecasts with root mean square error (RMSE) values of 1.87–7.58 m³/s and R² of 0.99–1.0 during testing on 2019–2023 data, which was excluded from training. On average, the MLP models using M5 inputs demonstrate a 58% lower RMSE and 22.6% lower mean absolute error (MAE) compared to GB models. The HBA-MLP M5 model excels in predicting extreme flow events, addressing a key challenge in hydrological forecasting. Furthermore, the proposed framework outperformed the National Water Model (NWM), especially during high-flow periods, making it more suitable for real-time flood forecasting. Overall, this study demonstrates how machine learning models, when combined with optimization techniques, can enhance the accuracy and reliability of flood forecasting systems, facilitating more effective flood mitigation strategies in similar basins.
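The RMSE and MAE metrics the study reports can be computed directly; a minimal sketch with hypothetical flow values (not the authors' code or data):

```python
import numpy as np

def rmse(obs, sim):
    """Root mean square error between observed and simulated streamflow."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def mae(obs, sim):
    """Mean absolute error between observed and simulated streamflow."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.mean(np.abs(obs - sim)))

# Hypothetical hourly flows in m^3/s; the paper reports RMSE of 1.87-7.58 on real data.
observed  = [10.0, 12.0, 15.0, 11.0]
simulated = [ 9.0, 13.0, 14.0, 12.0]
print(rmse(observed, simulated))  # 1.0
print(mae(observed, simulated))   # 1.0
```

In the paper's setup, a metaheuristic such as HBA would repeatedly train the MLP and use these scores on held-out data as its fitness signal.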
Citations: 0
MIARS: Mutual information-guided feature selection with angle reconstruction and semantic alignment for multi-label learning
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-02 | DOI: 10.1016/j.knosys.2026.115424
Ruijia Li , Hong Chen , Yingcang Ma , Feiping Nie , Yixiao Huang
To address the key challenges in multi-label feature selection, including the non-smooth optimization problem caused by discrete label representation, the insufficient generalization performance due to ignored label correlations, and the difficulty in balancing feature discriminability and redundancy, we propose a Mutual Information-guided Angle Reconstruction and Semantic Alignment (MIARS) feature selection method. This method achieves breakthrough progress through three core technological innovations: First, it innovatively maps discrete labels to a unit hypersphere space and achieves continuous label representation by minimizing the Angle Reconstruction Error (ARE), effectively preserving the global similarity structure among labels. Second, an orthogonal rotation matrix optimization mechanism is introduced to achieve precise semantic alignment by maximizing the cosine similarity between pseudo-labels and true labels. Finally, a strategy combining mutual information matrices with ℓ2,0-norm constraints is adopted to directly select the optimal feature subset with low redundancy and high discriminability. Experimental results on nine benchmark datasets demonstrate the significant effectiveness of MIARS.
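The first innovation — mapping discrete labels onto a unit hypersphere and aligning pseudo-labels with true labels by cosine similarity — can be sketched as follows. The function names and the plain L2 normalization are illustrative stand-ins for the paper's angle-reconstruction objective, not the authors' implementation:

```python
import numpy as np

def to_unit_hypersphere(Y):
    """Map each (discrete) label vector onto the unit hypersphere via L2
    normalization; all-zero rows are left unchanged to avoid division by zero."""
    Y = np.asarray(Y, dtype=float)
    norms = np.linalg.norm(Y, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return Y / norms

def cosine_alignment(pseudo, true):
    """Mean cosine similarity between pseudo-label and true-label rows —
    the quantity the semantic-alignment step maximizes."""
    p = to_unit_hypersphere(pseudo)
    t = to_unit_hypersphere(true)
    return float(np.mean(np.sum(p * t, axis=1)))

Y = [[1, 0, 1], [0, 1, 0]]        # toy discrete multi-label matrix
print(cosine_alignment(Y, Y))     # ~1.0: a label set is perfectly aligned with itself
```

On the hypersphere, minimizing the angle between two label embeddings is equivalent to maximizing this cosine term, which is what makes the continuous relaxation smooth to optimize.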
Citations: 0
The impact of fine-tuning on entity resolution: An experimental evaluation
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-02 | DOI: 10.1016/j.knosys.2026.115427
Dimitrios Karapiperis, Leonidas Akritidis, Panayiotis Bozanis
Fine-tuning pre-trained language models has become the state-of-the-art approach for Entity Resolution (ER), but this has created a divide between two dominant architectures: fast-but-less-accurate bi-encoders and accurate-but-slow cross-encoders. However, a concrete gap in prior ER benchmarking remains unresolved: existing studies often evaluate architectures in isolation or on limited datasets. It remains unclear which base models and architectures are best suited for the diverse range of real-world ER datasets, each with unique characteristics and performance bottlenecks. This paper bridges this gap through an extensive empirical evaluation. We systematically compare three popular pre-trained models (MiniLM, MPNet, and BGE) across three distinct architectural paradigms: a pre-trained bi-encoder, a fine-tuned bi-encoder, and a fine-tuned cross-encoder. We tested these combinations on eight diverse real-world and semi-synthetic datasets, analyzing their performance, training costs, and final resolution times. Our results reveal a clear accuracy-vs-efficiency trade-off, identifying the fine-tuned bi-encoder as the optimal balance between performance and practical resolution speed. More importantly, we demonstrate that fine-tuning is not a universal solution. Its effectiveness is highly contingent on the dataset: it provides substantial gains on specialized domains by fixing pre-existing performance gaps but is detrimental to performance on datasets where pre-trained models are already well-aligned. These findings provide a practical guide for practitioners on selecting the optimal model and architecture based on their specific data and application requirements.
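The bi- vs cross-encoder distinction the evaluation rests on can be illustrated with toy scoring functions: the bi-encoder encodes each record independently (so embeddings can be precomputed and cached), while the cross-encoder must process every candidate pair jointly. Both `embed` and the token-overlap "joint" scorer below are hypothetical stand-ins for real pre-trained models such as MiniLM, not any library's API:

```python
import numpy as np

def embed(record: str) -> np.ndarray:
    """Bi-encoder side: encode one record independently (cacheable).
    Toy letter-frequency embedding in place of a transformer encoder."""
    vec = np.zeros(26)
    for ch in record.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    n = np.linalg.norm(vec)
    return vec / n if n else vec

def bi_encoder_score(a: str, b: str) -> float:
    """Cosine similarity of two independent embeddings: fast at resolution
    time because embed(a) and embed(b) can each be computed once and reused."""
    return float(embed(a) @ embed(b))

def cross_encoder_score(a: str, b: str) -> float:
    """Cross-encoder analogue: the pair is scored jointly, so nothing can be
    cached. A trivial Jaccard token overlap plays the 'joint feature' role."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

print(bi_encoder_score("Apple iPhone 13", "iPhone 13 by Apple") > 0.9)   # True
print(cross_encoder_score("Apple iPhone 13", "iPhone 13 by Apple"))      # 0.75
```

The accuracy-vs-efficiency trade-off follows from this structure: for n records, a bi-encoder needs n encoder passes plus cheap vector comparisons, whereas a cross-encoder needs one expensive pass per candidate pair.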
Citations: 0
HyTexNet: Percentile-guided local encoding and deep feature fusion for enhanced texture classification
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-02 | DOI: 10.1016/j.knosys.2026.115482
Vandana Gupta , Ashish Mishra , Nishant Shrivastava
Texture classification remains a challenging problem in computer vision, particularly under variations in illumination, pose, and scale. While deep networks provide powerful semantic representations, they often overlook fine-grained local structures, whereas handcrafted descriptors, though interpretable, struggle with adaptability. To address these limitations, this paper introduces HyTexNet, a hybrid framework that fuses percentile-guided local encoding with deep embeddings from DenseNet-121. The proposed encoding scheme employs an adaptive threshold based on the 75th percentile of neighborhood intensity differences, enabling the descriptor to capture significant local contrasts while suppressing redundant variations. This local representation is combined with global semantic features obtained through global average pooling, and a lightweight fusion head optimizes the joint feature space for classification. Extensive experiments on four benchmark datasets (UIUC, Kylberg, Brodatz, and KTH-TIPS2b) demonstrate that HyTexNet achieves classification accuracies of 95.65%, 100%, 99.22%, and 99.79%, respectively, indicating consistently strong performance across diverse texture categories and imaging conditions. Additional evaluation on a challenging real-world texture dataset (DTD) further demonstrates the robustness and generalization capability of the proposed framework beyond controlled benchmark settings. In addition to accuracy, the framework is compact and computationally efficient, making it practical for scenarios with limited data and resources. These results position HyTexNet as a balanced alternative to recent texture analysis methods, offering a combination of robustness, interpretability, and scalability that bridges the gap between handcrafted and deep learning-based approaches.
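The percentile-guided encoding can be sketched for a single 3×3 patch. This is a simplified illustration under assumed details (8-neighborhood, absolute differences against the center, LBP-style bit packing), not the authors' implementation:

```python
import numpy as np

def percentile_code(patch, q=75):
    """Binary-encode a 3x3 patch's neighbors against its center using an
    adaptive threshold: the q-th percentile of |neighbor - center| differences.
    Only contrasts at or above that per-patch threshold set a bit, so redundant
    small variations are suppressed."""
    patch = np.asarray(patch, dtype=float)
    center = patch[1, 1]
    neighbors = np.delete(patch.flatten(), 4)   # the 8 neighbors, row-major
    diffs = np.abs(neighbors - center)
    tau = np.percentile(diffs, q)               # adaptive, per-patch threshold
    bits = (diffs >= tau).astype(int)           # keep only salient contrasts
    return int(bits @ (2 ** np.arange(8)))      # pack the 8 bits into one code

# Two strong contrasts (10 and 90 vs. center 50) survive; small jitter does not.
patch = [[48, 52, 49],
         [51, 50, 47],
         [10, 55, 90]]
print(percentile_code(patch))  # 160 = bits set for the two high-contrast neighbors
```

Because the threshold adapts to each neighborhood's own intensity statistics, the same code is produced under a global brightness shift, which is one route to the illumination robustness the abstract claims.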
Citations: 0
Structure adversarial augmented graph anomaly detection via multi-view contrastive learning
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-01 | DOI: 10.1016/j.knosys.2026.115455
Qian Chen , Huiying Xu , Ruidong Wang , Yue Liu , Xinzhong Zhu
Graph anomaly detection is essential for many security-related fields but faces significant challenges in handling complex real-world graph data. Due to the complex and imbalanced graph structure, it is difficult to find abnormal points among many nodes. Current contrastive learning methods often overlook structural imperfections in real-world graphs, such as redundant edges and low-degree sparse nodes. Redundant connections may introduce noise during message passing, while sparse nodes receive insufficient structural information to accurately learn representations, which can degrade detection performance. To overcome the above challenges, we propose SAA-GCL, an innovative framework that integrates adaptive structure adversarial augmentation with multi-view contrastive learning. Specifically, through edge weight learning and LMSE loss calculation, our approach adaptively optimizes the structure of the augmented graph, discards redundant edges as much as possible, and retains more discriminating features. For low-degree sparse nodes, we mix their ego-networks with the ego-networks of auxiliary nodes to improve representation quality. To fully mine abnormal information, we use a multi-view contrastive loss function to distinguish positive and negative sample pairs within each view and maintain cross-view consistency. The framework adaptively refines the graph topology to suppress noisy edges and enhance representations for structurally weak nodes, so it can improve anomaly detection performance in the imbalanced structure attribute graph. Comprehensive experiments on six real-world graph datasets show that SAA-GCL is superior to existing methods in detection accuracy. Our code is open source at https://github.com/HZAI-ZJNU/SAAGCL.
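A generic multi-view contrastive objective of the kind described can be sketched as an InfoNCE-style loss for one node: the same node's embedding in the other view is the positive, other nodes are negatives. The embeddings below are synthetic and the formula is a standard illustration, not SAA-GCL's exact loss:

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style contrastive loss for one anchor embedding: pull the
    positive (same node, other view) close, push negatives away."""
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    pos = np.exp(cos(anchor, positive) / temperature)
    neg = sum(np.exp(cos(anchor, n) / temperature) for n in negatives)
    return float(-np.log(pos / (pos + neg)))

rng = np.random.default_rng(0)
z1 = rng.normal(size=8)               # node embedding in view 1
z2 = z1 + 0.01 * rng.normal(size=8)   # same node in view 2 (positive pair)
negs = [rng.normal(size=8) for _ in range(5)]

loss_aligned = info_nce(z1, z2, negs)             # correct positive pair
loss_random  = info_nce(z1, negs[0], [z2] + negs[1:])  # mismatched pairing
print(loss_aligned < loss_random)     # consistent views yield the lower loss
```

Anomalous nodes tend to violate this cross-view consistency, so their persistently high contrastive loss can serve as an anomaly score.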
Citations: 0
Rethinking static weights: Language-guided adaptive weight adjustment for 3D visual grounding
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-02-01 | DOI: 10.1016/j.knosys.2026.115467
Zongshun Wang , Ce Li , Zhiqiang Feng , Limei Xiao , Pengcheng Wang , Mengmeng Ping
3D Visual Grounding (3DVG) aims to accurately localize target objects in complex 3D point cloud scenes using natural language descriptions. However, current methods typically utilize static visual encoders with fixed parameters to handle the infinite variety of linguistic queries. This static approach inevitably leads to low signal-to-noise ratios in the feature inputs during the subsequent visual-language fusion stage. To overcome this limitation, we propose a Language-guided Adaptive Weight Adjustment (LAWA) framework that equips the visual backbone with query-aware dynamic adaptability during the early visual encoding stage via a lightweight language-guided strategy. Specifically, we first construct visual features that integrate class prior information using Object Semantic Augmented Encoding. Then, by leveraging weight coefficients derived from multimodal embeddings, we employ a Low-Rank Adaptation-based Dynamic Weight Adjustment (DWA) module to update the linear projection layers and weight matrices within the visual encoder’s attention mechanism. This approach enables the model to focus more effectively on visual regions that are semantically aligned with the textual descriptions. Extensive experiments demonstrate that LAWA achieves an [email protected] of 86.2% on the ScanRefer dataset, and overall accuracies of 69.5% and 58.4% on the Sr3D and Nr3D datasets, respectively, all while maintaining superior parameter efficiency.
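A Low-Rank Adaptation update of the kind the DWA module builds on can be sketched as follows. The shapes, the scalar `gate` (standing in for the language-derived weight coefficient), and the zero-initialization follow standard LoRA practice and are assumptions, not the paper's exact formulation:

```python
import numpy as np

def lora_adjust(W, A, B, gate):
    """Low-rank, language-conditioned update of a frozen projection matrix:
    W' = W + gate * (B @ A), where B @ A has rank r << d and `gate` is a
    scalar coefficient derived from the text query."""
    return W + gate * (B @ A)

d, r = 6, 2                              # feature dim, low rank
rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))              # frozen visual-encoder projection
A = rng.normal(size=(r, d)) * 0.1        # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-init

W_adapted = lora_adjust(W, A, B, gate=0.8)
print(np.allclose(W_adapted, W))         # True: zero-init B leaves W unchanged at start
extra_params = A.size + B.size
print(extra_params < W.size)             # True: 24 adapter params vs 36 full-rank
```

Zero-initializing B means the adapted encoder starts out identical to the pre-trained one, and the query-dependent `gate` lets the same low-rank adapter contribute differently for different linguistic inputs.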
Citations: 0
Transferable multi-level spatial-temporal graph neural network for adaptive multi-agent trajectory prediction
IF 7.6 | CAS Tier 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-31 | DOI: 10.1016/j.knosys.2026.115451
Yu Sun , Dengyu Xiao , Mengdie Huang , Jiali Wang , Chuan Tong , Jun Luo , Huayan Pu
Accurately predicting future multi-agent trajectories at intersections is crucial yet challenging due to complex and dynamic traffic environments. Existing methods struggle with cross-domain trajectory prediction for two reasons: 1) significant differences in spatiotemporal features between domains lead to insufficient modeling of trajectory temporal-sequence dynamics during cross-domain spatiotemporal alignment; and 2) the strong heterogeneity of behavioral patterns within different datasets causes significant domain shifts, resulting in a notable performance decline when the model is transferred across datasets. To address the aforementioned challenges, this paper proposes a transferable multi-level spatial-temporal graph neural network (T-MLSTG). Based on maximum mean discrepancy theory, we design a windowed mean gradient discrepancy (WMGD) metric that incorporates mean and gradient information of temporal features to better capture cross-domain distribution differences. Furthermore, a multi-level spatial-temporal graph network (MLSTG) is designed with a two-level architecture. The first level encodes historical spatiotemporal features independently, while the second level integrates spatiotemporal features and employs a channel attention mechanism to enhance feature discrimination. The performance of T-MLSTG was evaluated on the inD and INTERACTION datasets. Compared to the baseline model, the cross-domain trajectory prediction results demonstrate a reduction in root mean square error (RMSE) of 0.812. In cross-dataset trajectory prediction evaluation, the mean error was reduced by 27.8%, demonstrating the method's effectiveness and generalization capability.
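An illustrative reconstruction of a windowed mean-plus-gradient discrepancy, based only on the abstract's description (comparing two temporal feature series window by window via the window mean and the mean first-order gradient); the paper's exact WMGD formula may differ:

```python
import numpy as np

def wmgd(x_src, x_tgt, window=4):
    """Windowed mean gradient discrepancy sketch: split each series into
    non-overlapping windows, then compare per-window means and per-window
    mean first-order gradients (np.diff) between source and target."""
    def stats(x):
        x = np.asarray(x, dtype=float)
        n = len(x) // window * window          # drop an incomplete tail window
        w = x[:n].reshape(-1, window)
        return w.mean(axis=1), np.diff(w, axis=1).mean(axis=1)
    m_s, g_s = stats(x_src)
    m_t, g_t = stats(x_tgt)
    # total discrepancy = mean gap + gradient gap, averaged over windows
    return float(np.mean(np.abs(m_s - m_t)) + np.mean(np.abs(g_s - g_t)))

a = [0, 1, 2, 3, 4, 5, 6, 7]        # steadily increasing trajectory feature
print(wmgd(a, a))                   # 0.0: identical series, no discrepancy
print(wmgd(a, [x + 1 for x in a]))  # 1.0: same gradient, mean shifted by 1
```

Including the gradient term is what distinguishes this from a plain windowed mean comparison: two series with equal window means but different local trends still register a discrepancy.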
{"title":"Transferable multi-level spatial-temporal graph neural network for adaptive multi-agent trajectory prediction","authors":"Yu Sun ,&nbsp;Dengyu Xiao ,&nbsp;Mengdie Huang ,&nbsp;Jiali Wang ,&nbsp;Chuan Tong ,&nbsp;Jun Luo ,&nbsp;Huayan Pu","doi":"10.1016/j.knosys.2026.115451","DOIUrl":"10.1016/j.knosys.2026.115451","url":null,"abstract":"<div><div>Accurately predicting future multi-agent trajectories at intersections is crucial yet challenging due to complex and dynamic traffic environments. Existing methods struggle with cross-domain trajectory prediction owing to: 1) there are significant differences in spatiotemporal features between domains, which leads to insufficient modeling of trajectory temporal sequence dynamics during cross-domain spatiotemporal alignment; and 2) the strong heterogeneity of behavioral patterns within different datasets causes significant domain shifts, resulting in a notable performance decline when the model is transferred across datasets. To address the aforementioned challenges, this paper proposes a transferable multi-level spatial-temporal graph neural network (T-MLSTG). Based on maximum mean discrepancy theory, we design a windowed mean gradient discrepancy (<em>WMGD</em>) metric that incorporates mean and gradient information of temporal features to better capture cross-domain distribution differences. Furthermore, a multi-level spatial-temporal graph network (MLSTG) is designed with a two-level architecture. The first level encodes historical spatiotemporal features independently, while the second level integrates spatiotemporal features and employs a channel attention mechanism to enhance feature discrimination. The performance of T-MLSTG was evaluated on the inD and INTERACTION datasets. Compared to the baseline model, the cross-domain trajectory prediction results demonstrate a reduction in root mean square error (RMSE) of 0.812. 
In cross-dataset trajectory prediction evaluation, the mean error was reduced by 27.8%, demonstrating the method’s effectiveness and generalization capability.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"338 ","pages":"Article 115451"},"PeriodicalIF":7.6,"publicationDate":"2026-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146175176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Face and gait based authentication using similarity-optimized bidirectional recurrent neural transformer model
IF 7.6 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-01-31 DOI: 10.1016/j.knosys.2026.115445
Sugantha Priyadharshini P , Grace Selvarani A
Biometric recognition is a necessary task in security control systems, yet unimodal techniques often suffer from missing modalities, noise, and limited robustness. This work therefore introduces a multi-modal biometric identification system that integrates both face and gait images. Key frames are selected with an enhanced agglomerative nesting clustering algorithm (EAg-NCA) to preserve diverse information from the input video with minimal redundancy. Noise in the selected key frames is removed with a Trimmed Pixel density based median filter (TPDMF). From the pre-processed images, faces are detected using a dung beetle optimization-tuned YOLO-V9, while gait silhouettes are extracted by a Mask Region based Convolutional Neural Network (Mask-RCNN). Features are extracted from both the face and gait images through a pooled convolutional dense net model (PoC-Den). The extracted features are fused, and an authentication decision is made by matching the current features against those in the database using a novel similarity-based optimized hybrid bidirectional recurrent neural pooling transformer encoder block (Sim-OpPTr). The final output classifies the person as authorized or unauthorized. Evaluated with various performance metrics, the proposed methodology achieves an accuracy of 99.08%. The proposed hybrid strategy improves multi-modal fusion, robustness to noise, and authentication accuracy, making it suitable for real-world surveillance applications.
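The matching step of Sim-OpPTr is not specified in the abstract; the sketch below shows only the generic decision logic the abstract describes — compare the fused probe features against enrolled templates and classify the person as authorized or unauthorized. Cosine similarity, the gallery layout, and the threshold value are illustrative assumptions, not the paper's method:

```python
import numpy as np

def authenticate(probe, gallery, threshold=0.8):
    """Hypothetical decision step: compare a fused probe feature vector
    against enrolled templates (identity -> feature vector) by cosine
    similarity and accept the best match if it clears the threshold.
    Returns (identity or None, best_score)."""
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    best_id, best_score = None, -1.0
    for identity, template in gallery.items():
        score = cosine(probe, template)
        if score > best_score:
            best_id, best_score = identity, score
    if best_score >= threshold:
        return best_id, best_score  # authorized
    return None, best_score        # unauthorized
```

A probe close to an enrolled template is accepted with that identity; a probe orthogonal to every template falls below the threshold and is rejected as unauthorized.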
Detective Behavior Algorithm (DBA): A New Metaheuristic for Design and Engineering Optimization
IF 7.6 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-01-31 DOI: 10.1016/j.knosys.2026.115434
Jun Cheng , Wim De Waele
Hunting-inspired algorithms have gained widespread attention in the field of optimization because of their simplicity, flexibility, and natural metaphors. However, many suffer from limitations such as slow convergence, sensitivity to parameter settings, and a tendency to become trapped in local optima. To address these challenges, this paper proposes the Detective Behavior Algorithm (DBA), a novel meta-heuristic that integrates three core search mechanisms: large-area directional exploration, localized exploitation, and direct target-oriented attacks. DBA is designed to balance exploration and exploitation effectively, enabling faster convergence and improved global search capability. Its performance is validated through comprehensive experiments on a suite of benchmark functions and real-world engineering problems. A comparative analysis is conducted against eight state-of-the-art optimization algorithms, including recently developed hunting-inspired methods such as the Walrus Optimizer and Sea-Horse Optimizer. Results consistently demonstrate that DBA outperforms these approaches in convergence speed, solution accuracy, and robustness, particularly in complex optimization scenarios. Furthermore, DBA is applied to predict and optimize surface waviness in Wire Arc Additive Manufacturing components. Two predictive models are developed: one employing an Artificial Neural Network (ANN) optimized by DBA, and another using Particle Swarm Optimization (PSO). The DBA-optimized ANN model exhibits superior predictive accuracy and reliability compared to both the standard ANN and the PSO-optimized ANN. Leveraging this enhanced prediction capability, DBA is further used to minimize surface waviness, consistently outperforming competing algorithms. These findings underscore the robustness, adaptability, and real-world applicability of DBA in both theoretical and practical contexts.
The source codes of DBA are publicly available at (https://www.mathworks.com/matlabcentral/fileexchange/183178-detective-behavior-algorithm-dba).
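The DBA update equations are given in the paper and its released MATLAB code, not in this abstract. The Python sketch below is only a schematic population loop that alternates the three named search mechanisms; the uniform mechanism selection, step sizes, and greedy acceptance are all illustrative assumptions:

```python
import random

def dba_sketch(objective, dim, bounds, pop_size=20, iters=100, seed=0):
    """Schematic loop over the three search moves named in the abstract:
      - large-area directional exploration: random jump inside the bounds,
      - localized exploitation: small perturbation around the current point,
      - direct target-oriented attack: step toward the best-so-far solution.
    Minimizes `objective` over [lo, hi]^dim; returns (best_point, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    best = min(pop, key=objective)
    for _ in range(iters):
        for i, x in enumerate(pop):
            r = rng.random()
            if r < 1 / 3:    # large-area exploration
                cand = [rng.uniform(lo, hi) for _ in range(dim)]
            elif r < 2 / 3:  # localized exploitation (clamped to bounds)
                cand = [min(hi, max(lo, xi + rng.gauss(0, 0.1 * (hi - lo))))
                        for xi in x]
            else:            # direct attack toward the best-so-far point
                cand = [xi + rng.random() * (bi - xi) for xi, bi in zip(x, best)]
            if objective(cand) < objective(x):  # greedy acceptance
                pop[i] = cand
        best = min(pop + [best], key=objective)
    return best, objective(best)
```

On a simple 2-D sphere function this loop contracts toward the origin, and the greedy acceptance guarantees the best-so-far value never worsens; none of this reflects DBA's actual update rules, only the division into three mechanisms.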
Infrared and visible image fusion based on multi-modal and multi-scale cross-compensation
IF 7.6 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2026-01-31 DOI: 10.1016/j.knosys.2026.115441
Meitian Li, Jing Sun, Heng Ma, Fasheng Wang, Fuming Sun
In the task of infrared and visible image fusion, fully preserving the complementary information from different modalities while avoiding detail loss and redundant information superposition has been a core challenge in recent research. Most existing methods focus primarily on feature processing at a single level or for a single modality, leading to insufficient cross-level information interaction and inadequate cross-modal feature fusion. This deficiency typically causes two problems: first, the lack of effective compensation between adjacent-level features prevents the synergistic use of low-level details and high-level semantics; second, the differences between features from different modalities are not explicitly modeled, and direct concatenation or weighted summation often introduces redundancy or even artifacts, compromising the overall quality of the fused image. To address these challenges, this paper proposes a novel infrared and visible image fusion network based on multi-modal and multi-scale cross-compensation, referred to as MMCFusion. The network incorporates an Upper-Lower-level Cross-Compensation (ULCC) module that integrates features from adjacent levels to enhance the richness and diversity of feature representations. Additionally, a Feature-Difference Cross-Compensation (FDCC) module facilitates cross-compensation of upper- and lower-level information through a differential approach; this design enhances the complementarity between features and effectively mitigates the detail loss prevalent in conventional methods. To further strengthen the model's ability to detect and represent objects across scales, a Multi-Scale Fusion Module (MSFM) effectively integrates feature information from multiple scales, improving adaptability to diverse objects. Furthermore, a Texture Enhancement Module (TEM) captures and retains local structures and texture information in the image, providing richer detail representation after processing. Finally, to comprehensively capture multi-modal information and perform long-range modeling, Pyramid Vision Transformer (PVTv2) is employed to construct a dual-stream Transformer encoder, which captures valuable information at multiple scales and provides robust global modeling capabilities, thereby improving the fusion results. The efficacy of the proposed method is rigorously evaluated on several datasets, including infrared-visible datasets such as MSRS, TNO, and RoadScene, as well as medical imaging datasets such as PET-MRI. Experimental results demonstrate that MMCFusion significantly outperforms current state-of-the-art methods in both visual quality and quantitative metrics, while exhibiting strong generalization across datasets, validating its effectiveness and robustness in practical applications. The source code is available at https://github.com/leemt0127/MMCFusion.
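The exact ULCC operators are not given in the abstract. As a minimal illustration of the idea of adjacent-level cross-compensation — each level supplemented by its neighbor — the numpy sketch below adds an upsampled coarse map to the fine map and a pooled fine map to the coarse map; nearest-neighbor upsampling, 2x2 average pooling, and additive fusion are all assumptions, not the paper's design:

```python
import numpy as np

def cross_compensate(low, high):
    """Hypothetical adjacent-level cross-compensation: the coarse
    high-level map is upsampled and added to the fine low-level map,
    while the low-level map is average-pooled down and added to the
    high-level map.
    low: (H, W, C) fine feature map; high: (H/2, W/2, C) coarse map."""
    # nearest-neighbor upsample high -> (H, W, C)
    up = np.repeat(np.repeat(high, 2, axis=0), 2, axis=1)
    # 2x2 average-pool low -> (H/2, W/2, C)
    H, W, C = low.shape
    down = low.reshape(H // 2, 2, W // 2, 2, C).mean(axis=(1, 3))
    return low + up, high + down
```

Both outputs keep their original resolutions, so such a block can be dropped between adjacent encoder levels without changing downstream shapes.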