Innovative optimization-driven machine learning models for hourly streamflow forecasting
Pub Date: 2026-02-02 | DOI: 10.1016/j.knosys.2026.115487
Peiman Parisouj, Changhyun Jun, Sayed M. Bateni, Shunlin Liang
This study introduces a novel framework for short-term streamflow forecasting by integrating multilayer perceptron (MLP) and gradient boosting (GB) models with artificial rabbit optimization (ARO) and the honey badger algorithm (HBA). The proposed framework addresses a critical need for accurate flood forecasting by providing a robust alternative to complex physical models. The methodology is applied to the flood-prone Chehalis Basin in the U.S. using 2011–2023 hydrometeorological data, including precipitation, temperature, humidity, wind speed, and streamflow. The study systematically evaluates the impact of input data quality and quantity by testing two model configurations: base models (M1 and M2) with simpler inputs, and upgraded models (M3, M4, and M5) with more complex features. The optimized HBA-MLP hybrid model achieves 1–6 h streamflow forecasts with root mean square error (RMSE) values of 1.87–7.58 m³/s and R² values of 0.99–1.0 during testing on 2019–2023 data, which was excluded from training. On average, the MLP models using M5 inputs demonstrate a 58% lower RMSE and 22.6% lower mean absolute error (MAE) compared to GB models. The HBA-MLP M5 model excels in predicting extreme flow events, addressing a key challenge in hydrological forecasting. Furthermore, the proposed framework outperformed the National Water Model (NWM), especially during high-flow periods, making it more suitable for real-time flood forecasting. Overall, this study demonstrates how machine learning models, when combined with optimization techniques, can enhance the accuracy and reliability of flood forecasting systems, facilitating more effective flood mitigation strategies in similar basins.
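As a reference for the error metrics quoted above, the short sketch below computes RMSE, MAE, and R² for a pair of hourly streamflow series with NumPy. The synthetic series and noise level are placeholders, not the paper's Chehalis Basin records or HBA-MLP forecasts.

```python
import numpy as np

def rmse(obs, sim):
    # Root mean square error, in the same units as the series (here m^3/s).
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def mae(obs, sim):
    # Mean absolute error.
    return float(np.mean(np.abs(obs - sim)))

def r2(obs, sim):
    # Coefficient of determination (1 - SS_res / SS_tot).
    ss_res = np.sum((obs - sim) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical hourly streamflow (m^3/s): observed series vs. a stand-in 1-h-ahead forecast.
rng = np.random.default_rng(0)
observed = 50 + 10 * np.sin(np.linspace(0, 12, 500)) + rng.normal(0, 1, 500)
forecast = observed + rng.normal(0, 2, 500)

print(f"RMSE = {rmse(observed, forecast):.2f} m^3/s")
print(f"MAE  = {mae(observed, forecast):.2f} m^3/s")
print(f"R^2  = {r2(observed, forecast):.3f}")
```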
{"title":"Innovative optimization-driven machine learning models for hourly streamflow forecasting","authors":"Peiman Parisouj , Changhyun Jun , Sayed M. Bateni , Shunlin Liang","doi":"10.1016/j.knosys.2026.115487","DOIUrl":"10.1016/j.knosys.2026.115487","url":null,"abstract":"<div><div>This study introduces a novel framework for short-term streamflow forecasting by integrating multilayer perceptron (MLP) and gradient boosting (GB) models with artificial rabbit optimization (ARO) and the honey badger algorithm (HBA). The proposed framework addresses a critical need for accurate flood forecasting by providing a robust alternative to complex physical models. The methodology is applied to the flood-prone Chehalis Basin in the U.S. using 2011–2023 hydrometeorological data, including precipitation, temperature, humidity, wind speed, and streamflow. The study systematically evaluates the impact of input data quality and quantity by testing two model configurations: base models (M1 and M2) with simpler inputs, and upgraded models (M3, M4, and M5) with more complex features. The optimized HBA-MLP hybrid model achieves 1–6 h streamflow forecasts with root mean square error (RMSE) values of 1.87–7.58 <span><math><mrow><mo>(</mo><msup><mrow><mi>m</mi></mrow><mn>3</mn></msup><mo>/</mo><mi>s</mi><mo>)</mo></mrow></math></span> and <span><math><msup><mrow><mi>R</mi></mrow><mn>2</mn></msup></math></span> of 0.99–1.0 during testing on 2019–2023 data, which was excluded from training. On average, the MLP models using M5 inputs demonstrate a 58 % lower RMSE and 22.6 % lower mean absolute error (MAE) compared to GB models. The HBA-MLP M5 model excels in predicting extreme flow events, addressing a key challenge in hydrological forecasting. Furthermore, the proposed framework outperformed the National Water Model (NWM), especially during high-flow periods, making it more suitable for real-time flood forecasting. Overall, this study demonstrates how machine learning models, when combined with optimization techniques, can enhance the accuracy and reliability of flood forecasting systems, facilitating more effective flood mitigation strategies in similar basins.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"338 ","pages":"Article 115487"},"PeriodicalIF":7.6,"publicationDate":"2026-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146175185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MIARS: Mutual information-guided feature selection with angle reconstruction and semantic alignment for multi-label learning
Pub Date: 2026-02-02 | DOI: 10.1016/j.knosys.2026.115424
Ruijia Li, Hong Chen, Yingcang Ma, Feiping Nie, Yixiao Huang
To address key challenges in multi-label feature selection, including the non-smooth optimization problem caused by discrete label representations, insufficient generalization due to ignored label correlations, and the difficulty of balancing feature discriminability and redundancy, we propose a Mutual Information-guided Angle Reconstruction and Semantic Alignment (MIARS) feature selection method. The method rests on three core technical innovations. First, it maps discrete labels onto a unit hypersphere and obtains a continuous label representation by minimizing the Angle Reconstruction Error (ARE), effectively preserving the global similarity structure among labels. Second, an orthogonal rotation matrix optimization mechanism achieves precise semantic alignment by maximizing the cosine similarity between pseudo-labels and true labels. Finally, a strategy combining mutual information matrices with ℓ2,0-norm constraints directly selects an optimal feature subset with low redundancy and high discriminability. Experimental results on nine benchmark datasets demonstrate the effectiveness of MIARS.
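To make the hypersphere mapping concrete, the sketch below L2-normalizes label vectors onto the unit hypersphere and scores pseudo-labels by their mean angle to the true labels. This is only an illustration of the angle idea under assumed definitions; MIARS's exact ARE objective and its orthogonal-rotation alignment are not reproduced, and the toy label matrix is invented.

```python
import numpy as np

def to_hypersphere(Y):
    # Project (possibly discrete) label vectors onto the unit hypersphere (row-wise L2 normalisation).
    norms = np.linalg.norm(Y, axis=1, keepdims=True)
    return Y / np.clip(norms, 1e-12, None)

def mean_angular_error(Y_true, Y_pseudo):
    # Mean angle (radians) between matched label vectors; an illustrative angular criterion,
    # not the paper's Angle Reconstruction Error.
    U, V = to_hypersphere(Y_true), to_hypersphere(Y_pseudo)
    cos = np.clip(np.sum(U * V, axis=1), -1.0, 1.0)
    return float(np.mean(np.arccos(cos)))

# Hypothetical multi-label matrix (5 samples, 4 labels) and a continuous pseudo-label estimate.
Y = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 1]], dtype=float)
Y_pseudo = Y + np.random.default_rng(1).normal(0, 0.1, Y.shape)
print(f"mean angular error: {mean_angular_error(Y, Y_pseudo):.4f} rad")
```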
{"title":"MIARS: Mutual information-guided feature selection with angle reconstruction and semantic alignment for multi-label learning","authors":"Ruijia Li , Hong Chen , Yingcang Ma , Feiping Nie , Yixiao Huang","doi":"10.1016/j.knosys.2026.115424","DOIUrl":"10.1016/j.knosys.2026.115424","url":null,"abstract":"<div><div>To address the key challenges in multi-label feature selection, including the non-smooth optimization problem caused by discrete label representation, the insufficient generalization performance due to ignored label correlations, and the difficulty in balancing feature discriminability and redundancy, we propose a Mutual Information-guided Angle Reconstruction and Semantic Alignment (MIARS) feature selection method. This method achieves breakthrough progress through three core technological innovations: First, it innovatively maps discrete labels to a unit hypersphere space and achieves continuous label representation by minimizing the Angle Reconstruction Error (ARE), effectively preserving the global similarity structure among labels. Second, an orthogonal rotation matrix optimization mechanism is introduced to achieve precise semantic alignment by maximizing the cosine similarity between pseudo-labels and true labels. Finally, a strategy combining mutual information matrices with ℓ<sub>2,0</sub>-norm constraints is adopted to directly select the optimal feature subset with low redundancy and high discriminability. Experimental results on nine benchmark datasets demonstrate the significant effectiveness of MIARS.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"338 ","pages":"Article 115424"},"PeriodicalIF":7.6,"publicationDate":"2026-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146175182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The impact of fine-tuning on entity resolution: An experimental evaluation
Pub Date: 2026-02-02 | DOI: 10.1016/j.knosys.2026.115427
Dimitrios Karapiperis, Leonidas Akritidis, Panayiotis Bozanis
Fine-tuning pre-trained language models has become the state-of-the-art approach for Entity Resolution (ER), but this has created a divide between two dominant architectures: fast-but-less-accurate bi-encoders and accurate-but-slow cross-encoders. However, a concrete gap in prior ER benchmarking remains unresolved: existing studies often evaluate architectures in isolation or on limited datasets. It remains unclear which base models and architectures are best suited for the diverse range of real-world ER datasets, each with unique characteristics and performance bottlenecks. This paper bridges this gap through an extensive empirical evaluation. We systematically compare three popular pre-trained models (MiniLM, MPNet, and BGE) across three distinct architectural paradigms: a pre-trained bi-encoder, a fine-tuned bi-encoder, and a fine-tuned cross-encoder. We test these combinations on eight diverse real-world and semi-synthetic datasets, analyzing their performance, training costs, and final resolution times. Our results reveal a clear accuracy-vs-efficiency trade-off, identifying the fine-tuned bi-encoder as the optimal balance between performance and practical resolution speed. More importantly, we demonstrate that fine-tuning is not a universal solution. Its effectiveness is highly contingent on the dataset: it provides substantial gains on specialized domains by fixing pre-existing performance gaps but is detrimental to performance on datasets where pre-trained models are already well-aligned. These findings provide a practical guide for practitioners on selecting the optimal model and architecture based on their specific data and application requirements.
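The difference between the two paradigms can be sketched with the sentence-transformers library: a bi-encoder embeds each record independently and compares the vectors, while a cross-encoder scores the concatenated pair jointly. The checkpoints named below are common public models used purely for illustration, not necessarily the MiniLM/MPNet/BGE variants fine-tuned in the study, and the toy records stand in for real ER data.

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

record_a = "Apple iPhone 13 Pro 128GB graphite smartphone"
record_b = "iPhone 13 Pro (128 GB, Graphite) by Apple"

# Bi-encoder: one embedding per record, so candidates can be indexed and compared cheaply.
bi_encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
emb = bi_encoder.encode([record_a, record_b], convert_to_tensor=True)
print("bi-encoder cosine similarity:", float(util.cos_sim(emb[0], emb[1])))

# Cross-encoder: one forward pass per candidate pair, usually more accurate but far slower
# at resolution time.
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
print("cross-encoder score:", float(cross_encoder.predict([(record_a, record_b)])[0]))
```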
{"title":"The impact of fine-tuning on entity resolution: An experimental evaluation","authors":"Dimitrios Karapiperis, Leonidas Akritidis, Panayiotis Bozanis","doi":"10.1016/j.knosys.2026.115427","DOIUrl":"10.1016/j.knosys.2026.115427","url":null,"abstract":"<div><div>Fine-tuning pre-trained language models has become the state-of-the-art approach for Entity Resolution (ER), but this has created a divide between two dominant architectures: fast-but-less-accurate bi-encoders and accurate-but-slow cross-encoders. However, a concrete gap in prior ER benchmarking remains unresolved: existing studies often evaluate architectures in isolation or on limited datasets. It remains unclear which base models and architectures are best suited for the diverse range of real-world ER datasets, each with unique characteristics and performance bottlenecks. This paper bridges this gap through an extensive empirical evaluation. We systematically compare three popular pre-trained models (MiniLM, MPNet, and BGE) across three distinct architectural paradigms: a pre-trained bi-encoder, a fine-tuned bi-encoder, and a fine-tuned cross-encoder. We tested these combinations on eight diverse real-world and semi-synthetic datasets, analyzing their performance, training costs, and final resolution times. Our results reveal a clear accuracy-vs-efficiency trade-off, identifying the fine-tuned bi-encoder as the optimal balance between performance and practical resolution speed. More importantly, we demonstrate that fine-tuning is not a universal solution. Its effectiveness is highly contingent on the dataset: it provides substantial gains on specialized domains by fixing pre-existing performance gaps but is detrimental to performance on datasets where pre-trained models are already well-aligned. These findings provide a practical guide for practitioners on selecting the optimal model and architecture based on their specific data and application requirements.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"338 ","pages":"Article 115427"},"PeriodicalIF":7.6,"publicationDate":"2026-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146175312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HyTexNet: Percentile-guided local encoding and deep feature fusion for enhanced texture classification
Pub Date: 2026-02-02 | DOI: 10.1016/j.knosys.2026.115482
Vandana Gupta, Ashish Mishra, Nishant Shrivastava
Texture classification remains a challenging problem in computer vision, particularly under variations in illumination, pose, and scale. While deep networks provide powerful semantic representations, they often overlook fine-grained local structures, whereas handcrafted descriptors, though interpretable, struggle with adaptability. To address these limitations, this paper introduces HyTexNet, a hybrid framework that fuses percentile-guided local encoding with deep embeddings from DenseNet-121. The proposed encoding scheme employs an adaptive threshold based on the 75th percentile of neighborhood intensity differences, enabling the descriptor to capture significant local contrasts while suppressing redundant variations. This local representation is combined with global semantic features obtained through global average pooling, and a lightweight fusion head optimizes the joint feature space for classification. Extensive experiments on four benchmark datasets (UIUC, Kylberg, Brodatz, and KTH-TIPS2b) demonstrate that HyTexNet achieves classification accuracies of 95.65%, 100%, 99.22%, and 99.79%, respectively, indicating consistently strong performance across diverse texture categories and imaging conditions. Additional evaluation on a challenging real-world texture dataset (DTD) further demonstrates the robustness and generalization capability of the proposed framework beyond controlled benchmark settings. In addition to accuracy, the framework is compact and computationally efficient, making it practical for scenarios with limited data and resources. These results position HyTexNet as a balanced alternative to recent texture analysis methods, offering a combination of robustness, interpretability, and scalability that bridges the gap between handcrafted and deep learning-based approaches.
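The percentile-guided idea can be illustrated with a small NumPy routine that thresholds each pixel's 8-neighbour intensity differences at their 75th percentile and packs the resulting bits into an LBP-style code. This is a simplified stand-in under assumed conventions, not HyTexNet's actual descriptor or its DenseNet-121 fusion head.

```python
import numpy as np

def percentile_guided_code(img, p=75):
    # For every interior pixel, threshold the absolute differences to its 8 neighbours at the
    # p-th percentile of those differences, then pack the 8 comparison bits into one byte.
    img = img.astype(np.float64)
    H, W = img.shape
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:-1, 1:-1]
    diffs = np.stack([img[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx] - center
                      for dy, dx in offsets])            # shape (8, H-2, W-2)
    thresh = np.percentile(np.abs(diffs), p, axis=0)     # adaptive per-pixel threshold
    codes = np.zeros((H - 2, W - 2), dtype=np.uint8)
    for bit, d in enumerate(diffs):
        codes |= ((np.abs(d) >= thresh).astype(np.uint8) << bit)
    return codes

demo = np.random.default_rng(2).integers(0, 256, (8, 8))
print(percentile_guided_code(demo))
```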
{"title":"HyTexNet: Percentile-guided local encoding and deep feature fusion for enhanced texture classification","authors":"Vandana Gupta , Ashish Mishra , Nishant Shrivastava","doi":"10.1016/j.knosys.2026.115482","DOIUrl":"10.1016/j.knosys.2026.115482","url":null,"abstract":"<div><div>Texture classification remains a challenging problem in computer vision, particularly under variations in illumination, pose, and scale. While deep networks provide powerful semantic representations, they often overlook fine-grained local structures, whereas handcrafted descriptors, though interpretable, struggle with adaptability. To address these limitations, this paper introduces HyTexNet, a hybrid framework that fuses percentile-guided local encoding with deep embeddings from DenseNet-121. The proposed encoding scheme employs an adaptive threshold based on the 75th percentile of neighborhood intensity differences, enabling the descriptor to capture significant local contrasts while suppressing redundant variations. This local representation is combined with global semantic features obtained through global average pooling, and a lightweight fusion head optimizes the joint feature space for classification. Extensive experiments on four benchmark datasets (UIUC, Kylberg, Brodatz, and KTH-TIPS2b) demonstrate that HyTexNet achieves classification accuracies of 95.65%, 100%, 99.22%, and 99.79%, respectively, indicating consistently strong performance across diverse texture categories and imaging conditions. Additional evaluation on a challenging real-world texture dataset (DTD) further demonstrates the robustness and generalization capability of the proposed framework beyond controlled benchmark settings. In addition to accuracy, the framework is compact and computationally efficient, making it practical for scenarios with limited data and resources. These results position HyTexNet as a balanced alternative to recent texture analysis methods, offering a combination of robustness, interpretability, and scalability that bridges the gap between handcrafted and deep learning-based approaches.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"338 ","pages":"Article 115482"},"PeriodicalIF":7.6,"publicationDate":"2026-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146175316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Structure adversarial augmented graph anomaly detection via multi-view contrastive learning
Pub Date: 2026-02-01 | DOI: 10.1016/j.knosys.2026.115455
Qian Chen, Huiying Xu, Ruidong Wang, Yue Liu, Xinzhong Zhu
Graph anomaly detection is essential for many security-related fields but faces significant challenges in handling complex real-world graph data. Because real-world graphs are complex and imbalanced, it is difficult to identify anomalous nodes among the many normal ones. Current contrastive learning methods often overlook structural imperfections in real-world graphs, such as redundant edges and low-degree sparse nodes. Redundant connections may introduce noise during message passing, while sparse nodes receive insufficient structural information to learn accurate representations, which can degrade detection performance. To overcome the above challenges, we propose SAA-GCL, an innovative framework that integrates adaptive structure adversarial augmentation with multi-view contrastive learning. Specifically, through edge weight learning and LMSE loss calculation, our approach adaptively optimizes the structure of the augmented graph, discarding redundant edges as far as possible while retaining more discriminative features. For low-degree sparse nodes, we mix their self-networks with those of auxiliary nodes to improve representation quality. To fully mine anomaly information, we use a multi-view contrastive loss function to distinguish positive and negative sample pairs within each view and to maintain cross-view consistency. The framework adaptively refines the graph topology to suppress noisy edges and enhance representations for structurally weak nodes, thereby improving anomaly detection performance on imbalanced attributed graphs. Comprehensive experiments on six real-world graph datasets show that SAA-GCL outperforms existing methods in detection accuracy. Our code is open source at https://github.com/HZAI-ZJNU/SAAGCL.
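As background for the contrastive objective, the snippet below shows a generic cross-view InfoNCE loss in PyTorch, where the positive pair is the same node in two augmented views and every other node acts as a negative. It conveys the positive/negative pairing and cross-view consistency idea only; SAA-GCL's structure adversarial augmentation, LMSE loss, and full multi-view objective are not reproduced.

```python
import torch
import torch.nn.functional as F

def cross_view_infonce(z1, z2, temperature=0.5):
    # Node i in view 1 should be most similar to node i in view 2 (positive pair);
    # all other cross-view pairs serve as negatives.
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(z1.size(0))
    return F.cross_entropy(logits, targets)

# Hypothetical node embeddings from two augmented views of the same graph.
torch.manual_seed(0)
z_view1 = torch.randn(16, 32)
z_view2 = z_view1 + 0.1 * torch.randn(16, 32)   # stand-in for a structure-augmented view
print(float(cross_view_infonce(z_view1, z_view2)))
```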
{"title":"Structure adversarial augmented graph anomaly detection via multi-view contrastive learning","authors":"Qian Chen , Huiying Xu , Ruidong Wang , Yue Liu , Xinzhong Zhu","doi":"10.1016/j.knosys.2026.115455","DOIUrl":"10.1016/j.knosys.2026.115455","url":null,"abstract":"<div><div>Graph anomaly detection is essential for many security-related fields but faces significant challenges in handling complex real-world graph data. Due to the complex and imbalanced graph structure, it is difficult to find abnormal points among many nodes. Current contrastive learning methods often overlook structural imperfections in real-world graphs, such as redundant edges and low-degree sparse nodes. Redundant connections may introduce noise during message passing, while sparse nodes receive insufficient structural information to accurately learn representation, which can degrade detection performance. To overcome above challenges, we propose SAA-GCL, an innovative framework that integrates adaptive structure adversarial augmentation with multi-view contrastive learning. Specifically, by edge weight learning and LMSE loss calculation, our approach adaptively optimizes the structure of the augmented graph, discards redundant edges as much as possible, and retains more discriminating features. For low-degree sparse nodes, we mix their self-networks with the self-networks of auxiliary nodes to improve the representation quality. In order to fully mine abnormal information, we use the multi-view contrastive loss function to distinguish positive and negative sample pairs within the view and maintain cross-view consistency. The framework adaptively refines the graph topology to suppress noisy edges and enhance representations for structurally weak nodes, so it can improve anomaly detection performance in the imbalanced structure attribute graph. Comprehensive experiments on six real-world graph datasets show that SAA-GCL is superior to existing methods in detection accuracy. Our code is open source at <span><span>https://github.com/HZAI-ZJNU/SAAGCL</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"338 ","pages":"Article 115455"},"PeriodicalIF":7.6,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rethinking static weights: Language-guided adaptive weight adjustment for 3D visual grounding
Pub Date: 2026-02-01 | DOI: 10.1016/j.knosys.2026.115467
Zongshun Wang, Ce Li, Zhiqiang Feng, Limei Xiao, Pengcheng Wang, Mengmeng Ping
3D Visual Grounding (3DVG) aims to accurately localize target objects in complex 3D point cloud scenes using natural language descriptions. However, current methods typically rely on static visual encoders with fixed parameters to handle the effectively unlimited variety of linguistic queries. This static approach inevitably leads to low signal-to-noise ratios in the feature inputs during the subsequent visual-language fusion stage. To overcome this limitation, we propose a Language-guided Adaptive Weight Adjustment (LAWA) framework that equips the visual backbone with query-aware dynamic adaptability during the early visual encoding stage via a lightweight language-guided strategy. Specifically, we first construct visual features that integrate class prior information using Object Semantic Augmented Encoding. Then, using weight coefficients derived from multimodal embeddings, a Low-Rank Adaptation-based Dynamic Weight Adjustment (DWA) module updates the linear projection layers and weight matrices within the visual encoder's attention mechanism. This approach enables the model to focus more effectively on visual regions that are semantically aligned with the textual descriptions. Extensive experiments demonstrate that LAWA achieves an accuracy of 86.2% on the ScanRefer dataset, and overall accuracies of 69.5% and 58.4% on the Sr3D and Nr3D datasets, respectively, all while maintaining superior parameter efficiency.
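The low-rank, query-conditioned weight update can be sketched as a LoRA-style linear layer whose rank-limited perturbation is scaled by a coefficient derived from the text embedding. The class name, dimensions, and sigmoid gate below are illustrative assumptions, not the published DWA module.

```python
import torch
import torch.nn as nn

class LanguageGatedLoRALinear(nn.Module):
    # Hypothetical module: a frozen projection W plus a rank-r update B @ A whose strength
    # is gated per query by the pooled language embedding.
    def __init__(self, d_model=256, d_text=128, rank=8):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_model, d_model) * 0.02, requires_grad=False)
        self.lora_A = nn.Parameter(torch.randn(rank, d_model) * 0.02)   # down-projection
        self.lora_B = nn.Parameter(torch.zeros(d_model, rank))          # up-projection
        self.gate = nn.Linear(d_text, 1)                                # query-aware scale

    def forward(self, x, text_emb):
        alpha = torch.sigmoid(self.gate(text_emb))          # (B, 1) coefficient per query
        delta = self.lora_B @ self.lora_A                   # (d_model, d_model), low rank
        w_eff = self.weight + alpha.unsqueeze(-1) * delta   # per-query adjusted weights
        return torch.matmul(x, w_eff.transpose(-1, -2))

layer = LanguageGatedLoRALinear()
visual_tokens = torch.randn(2, 100, 256)   # hypothetical point-cloud token features
text_embed = torch.randn(2, 128)           # hypothetical pooled query embedding
print(layer(visual_tokens, text_embed).shape)   # torch.Size([2, 100, 256])
```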
{"title":"Rethinking static weights: Language-guided adaptive weight adjustment for 3D visual grounding","authors":"Zongshun Wang , Ce Li , Zhiqiang Feng , Limei Xiao , Pengcheng Wang , Mengmeng Ping","doi":"10.1016/j.knosys.2026.115467","DOIUrl":"10.1016/j.knosys.2026.115467","url":null,"abstract":"<div><div>3D Visual Grounding (3DVG) aims to accurately localize target objects in complex 3D point cloud scenes using natural language descriptions. However, current methods typically utilize static visual encoders with fixed parameters to handle the infinite variety of linguistic queries. This static approach inevitably leads to low signal-to-noise ratios in the feature inputs during the subsequent visual-language fusion stage. To overcome this limitation, we propose a Language-guided Adaptive Weight Adjustment (LAWA) framework that equips the visual backbone with query-aware dynamic adaptability during the early visual encoding stage via a lightweight language-guided strategy. Specifically, we first construct visual features that integrate class prior information using Object Semantic Augmented Encoding. Then, by leveraging weight coefficients derived from multimodal embeddings, we employ a Low-Rank Adaptation-based Dynamic Weight Adjustment (DWA) module to update the linear projection layers and weight matrices within the visual encoder’s attention mechanism. This approach enables the model to focus more effectively on visual regions that are semantically aligned with the textual descriptions. Extensive experiments demonstrate that LAWA achieves an [email protected] of 86.2% on the ScanRefer dataset, and overall accuracies of 69.5% and 58.4% on the Sr3D and Nr3D datasets, respectively, all while maintaining superior parameter efficiency.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"338 ","pages":"Article 115467"},"PeriodicalIF":7.6,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146175187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Transferable multi-level spatial-temporal graph neural network for adaptive multi-agent trajectory prediction
Pub Date: 2026-01-31 | DOI: 10.1016/j.knosys.2026.115451
Yu Sun, Dengyu Xiao, Mengdie Huang, Jiali Wang, Chuan Tong, Jun Luo, Huayan Pu
Accurately predicting future multi-agent trajectories at intersections is crucial yet challenging due to complex and dynamic traffic environments. Existing methods struggle with cross-domain trajectory prediction for two reasons: 1) significant differences in spatiotemporal features between domains lead to insufficient modeling of trajectory temporal dynamics during cross-domain spatiotemporal alignment; and 2) strong heterogeneity of behavioral patterns across datasets causes significant domain shifts, resulting in a notable performance decline when a model is transferred across datasets. To address these challenges, this paper proposes a transferable multi-level spatial-temporal graph neural network (T-MLSTG). Based on maximum mean discrepancy theory, we design a windowed mean gradient discrepancy (WMGD) metric that incorporates the mean and gradient information of temporal features to better capture cross-domain distribution differences. Furthermore, a multi-level spatial-temporal graph network (MLSTG) is designed with a two-level architecture. The first level encodes historical spatiotemporal features independently, while the second level integrates spatiotemporal features and employs a channel attention mechanism to enhance feature discrimination. The performance of T-MLSTG was evaluated on the inD and INTERACTION datasets. Compared to the baseline model, the cross-domain trajectory prediction results show a reduction in root mean square error (RMSE) of 0.812. In cross-dataset trajectory prediction, the mean error was reduced by 27.8%, demonstrating the method's effectiveness and generalization capability.
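One simplified reading of a windowed mean-plus-gradient discrepancy is sketched below: it compares per-window means and per-window mean first differences of a one-dimensional feature between a source and a target domain. The paper's actual MMD-based WMGD is not reproduced; the weighting, window size, and data here are illustrative assumptions.

```python
import numpy as np

def windowed_mean_gradient_discrepancy(src, tgt, window=10, lam=0.5):
    # Illustrative discrepancy: average gap between windowed means plus a weighted average gap
    # between windowed mean temporal gradients of the two domains.
    def window_stats(x):
        n = (len(x) // window) * window
        w = x[:n].reshape(-1, window)
        return w.mean(axis=1), np.diff(w, axis=1).mean(axis=1)
    m_s, g_s = window_stats(src)
    m_t, g_t = window_stats(tgt)
    k = min(len(m_s), len(m_t))
    return float(np.mean(np.abs(m_s[:k] - m_t[:k])) + lam * np.mean(np.abs(g_s[:k] - g_t[:k])))

rng = np.random.default_rng(3)
source_speed = np.cumsum(rng.normal(0, 0.1, 200)) + 10   # hypothetical trajectory feature
target_speed = np.cumsum(rng.normal(0, 0.2, 200)) + 12
print(f"windowed mean-gradient discrepancy (illustrative): "
      f"{windowed_mean_gradient_discrepancy(source_speed, target_speed):.3f}")
```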
{"title":"Transferable multi-level spatial-temporal graph neural network for adaptive multi-agent trajectory prediction","authors":"Yu Sun , Dengyu Xiao , Mengdie Huang , Jiali Wang , Chuan Tong , Jun Luo , Huayan Pu","doi":"10.1016/j.knosys.2026.115451","DOIUrl":"10.1016/j.knosys.2026.115451","url":null,"abstract":"<div><div>Accurately predicting future multi-agent trajectories at intersections is crucial yet challenging due to complex and dynamic traffic environments. Existing methods struggle with cross-domain trajectory prediction owing to: 1) there are significant differences in spatiotemporal features between domains, which leads to insufficient modeling of trajectory temporal sequence dynamics during cross-domain spatiotemporal alignment; and 2) the strong heterogeneity of behavioral patterns within different datasets causes significant domain shifts, resulting in a notable performance decline when the model is transferred across datasets. To address the aforementioned challenges, this paper proposes a transferable multi-level spatial-temporal graph neural network (T-MLSTG). Based on maximum mean discrepancy theory, we design a windowed mean gradient discrepancy (<em>WMGD</em>) metric that incorporates mean and gradient information of temporal features to better capture cross-domain distribution differences. Furthermore, a multi-level spatial-temporal graph network (MLSTG) is designed with a two-level architecture. The first level encodes historical spatiotemporal features independently, while the second level integrates spatiotemporal features and employs a channel attention mechanism to enhance feature discrimination. The performance of T-MLSTG was evaluated on the inD and INTERACTION datasets. Compared to the baseline model, the cross-domain trajectory prediction results demonstrate a reduction in root mean square error (RMSE) of 0.812. In cross-dataset trajectory prediction evaluation, the mean error was reduced by 27.8%, demonstrating the method’s effectiveness and generalization capability.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"338 ","pages":"Article 115451"},"PeriodicalIF":7.6,"publicationDate":"2026-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146175176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Face and gait based authentication using similarity-optimized bidirectional recurrent neural transformer model
Pub Date: 2026-01-31 | DOI: 10.1016/j.knosys.2026.115445
Sugantha Priyadharshini P, Grace Selvarani A
Biometric recognition is a necessary task in security control systems, yet unimodal techniques often suffer from missing modalities, noise, and limited robustness. This research therefore introduces a multi-modal biometric identification system that integrates both face and gait images. Key frames are selected using an enhanced agglomerative nesting clustering algorithm (EAg-NCA) to preserve diverse information from the input video with minimal redundancy. Noise in the selected key frames is removed using a trimmed pixel density based median filter (TPDMF). From the pre-processed images, faces are detected using a dung beetle optimization tuned YOLO-V9, while gait silhouettes are extracted by a Mask Region based Convolutional Neural Network (Mask-RCNN). Feature extraction from the face and gait images is accomplished through a pooled convolutional dense net model (PoC-Den). The extracted features are fused, and an authentication decision is made by matching them against the features stored in the database using a novel similarity based optimized hybrid bidirectional recurrent neural pooling transformer encoder block (Sim-OpPTr). Finally, the person is classified as authorized or unauthorized. The results are evaluated using various performance metrics; the proposed methodology obtained an accuracy of 99.08%. The proposed hybrid strategy improves multi-modal fusion, robustness to noise, and authentication accuracy, making it suitable for real-world surveillance applications.
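The final matching step can be pictured with a toy fusion-and-matching routine: modality features are concatenated, L2-normalized, and compared against enrolled templates by cosine similarity, with a threshold deciding authorized versus unauthorized. This is a simplified stand-in for the Sim-OpPTr matcher; the feature sizes, fusion rule, and threshold are assumptions.

```python
import numpy as np

def l2norm(v):
    return v / np.linalg.norm(v)

def authenticate(face_feat, gait_feat, database, threshold=0.8):
    # Concatenate the two modality features and accept the best-matching enrolled identity
    # if its cosine similarity exceeds the threshold.
    probe = l2norm(np.concatenate([face_feat, gait_feat]))
    scores = {uid: float(probe @ l2norm(t)) for uid, t in database.items()}
    best_id, best_score = max(scores.items(), key=lambda kv: kv[1])
    return (best_id, best_score) if best_score >= threshold else (None, best_score)

# Hypothetical 256-dimensional enrolled templates (128 face + 128 gait dimensions each).
rng = np.random.default_rng(4)
enrolled = {"user_01": rng.normal(size=256), "user_02": rng.normal(size=256)}
face = enrolled["user_01"][:128] + 0.05 * rng.normal(size=128)
gait = enrolled["user_01"][128:] + 0.05 * rng.normal(size=128)
user, score = authenticate(face, gait, enrolled)
if user:
    print(f"authorized as {user} (score {score:.3f})")
else:
    print(f"unauthorized (best score {score:.3f})")
```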
{"title":"Face and gait based authentication using similarity-optimized bidirectional recurrent neural transformer model","authors":"Sugantha Priyadharshini P , Grace Selvarani A","doi":"10.1016/j.knosys.2026.115445","DOIUrl":"10.1016/j.knosys.2026.115445","url":null,"abstract":"<div><div>Biometric recognition is a necessary task in security control systems, yet unimodal techniques often suffer from missing modalities, noise and limited robustness. Thus, the proposed research introduced a multi-modal biometric identification system by integrating both face and gait based images. The key frames are selected by using an enhanced agglomerative nesting clustering algorithm (EAg-NCA) to preserve various information with minimal redundancy in the input video. Noise in the selected key frames is removed by using the Trimmed Pixel density based median filter (TPDMF). Then, from the pre-processed images, faces are detected using a Dung beetle optimization tuned YOLO-V9, while gait silhouettes are extracted by Mask Region based Convolutional Neural Network (Mask-RCNN). The features are extracted from the face, and the gait image is accomplished through a pooled convolutional dense net model (PoC-Den). The extracted features are fused, and a decision has been made regarding the authentication of a person by matching the current features with the features in the database by using a novel similarity based optimized hybrid bidirectional recurrent neural pooling transformer encoder block (Sim-OpPTr). Finally, get the classified result as the person is authorized or unauthorized. The results are evaluated by using various performance metrics, proposed methodology obtained an accuracy of 99.08%. The proposed hybrid strategy improves multi-modal fusion, robustness to noise, and authentication accuracy, making it suitable for real-world surveillance applications.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"338 ","pages":"Article 115445"},"PeriodicalIF":7.6,"publicationDate":"2026-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146175147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detective Behavior Algorithm (DBA): A New Metaheuristic for Design and Engineering Optimization
Pub Date: 2026-01-31 | DOI: 10.1016/j.knosys.2026.115434
Jun Cheng, Wim De Waele
Hunting-inspired algorithms have gained widespread attention in the field of optimization because of their simplicity, flexibility, and natural metaphors. However, many suffer from limitations such as slow convergence, sensitivity to parameter settings, and a tendency to become trapped in local optima. To address these challenges, this paper proposes the Detective Behavior Algorithm (DBA), a novel meta-heuristic that integrates three core search mechanisms: large-area directional exploration, localized exploitation, and direct target-oriented attacks. DBA is designed to balance exploration and exploitation effectively, enabling faster convergence and improved global search capabilities. Its performance is validated on a comprehensive suite of benchmark functions and real-world engineering problems. A comparative analysis against eight state-of-the-art optimization algorithms, including recently developed hunting-inspired methods such as the Walrus Optimizer and Sea-Horse Optimizer, consistently shows that DBA outperforms these approaches in convergence speed, solution accuracy, and robustness, particularly in complex optimization scenarios. Furthermore, DBA is applied to predict and optimize surface waviness in Wire Arc Additive Manufacturing components. Two predictive models are developed: one employing an Artificial Neural Network (ANN) optimized by DBA, and another using Particle Swarm Optimization (PSO). The DBA-optimized ANN model exhibits superior predictive accuracy and reliability compared to both standard ANN and PSO-optimized ANN models. Leveraging this enhanced prediction capability, DBA is further used to minimize surface waviness, consistently outperforming competing algorithms. These findings underscore the robustness, adaptability, and real-world applicability of DBA in both theoretical and practical contexts. The source code of DBA is publicly available at https://www.mathworks.com/matlabcentral/fileexchange/183178-detective-behavior-algorithm-dba.
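To show the three-mechanism structure in code without reproducing the published update equations (available in the authors' MATLAB release linked above), here is a generic population-based skeleton that mixes broad exploration, local exploitation, and direct moves toward the current best, applied to the sphere function. Every update rule and parameter below is an illustrative assumption, not the actual DBA.

```python
import numpy as np

def sphere(x):
    return float(np.sum(x ** 2))

def three_mechanism_optimizer(obj, dim=10, pop=30, iters=200, lb=-5.0, ub=5.0, seed=0):
    # Generic metaheuristic skeleton: exploration dominates early, exploitation and
    # best-directed moves take over as the iteration count grows.
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (pop, dim))
    fit = np.array([obj(x) for x in X])
    best = X[fit.argmin()].copy()
    for t in range(iters):
        progress = t / iters
        for i in range(pop):
            r = rng.random()
            if r < (1 - progress) * 0.5:                      # large-area directional exploration
                cand = rng.uniform(lb, ub, dim)
            elif r < 0.8:                                     # localized exploitation around self
                cand = X[i] + rng.normal(0, 0.3 * (1 - progress), dim)
            else:                                             # direct move toward the best solution
                cand = X[i] + rng.random(dim) * (best - X[i])
            cand = np.clip(cand, lb, ub)
            f = obj(cand)
            if f < fit[i]:
                X[i], fit[i] = cand, f
                if f < obj(best):
                    best = cand.copy()
    return best, obj(best)

x_best, f_best = three_mechanism_optimizer(sphere)
print(f"best objective after 200 iterations: {f_best:.2e}")
```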
{"title":"Detective Behavior Algorithm (DBA): A New Metaheuristic for Design and Engineering Optimization","authors":"Jun Cheng , Wim De Waele","doi":"10.1016/j.knosys.2026.115434","DOIUrl":"10.1016/j.knosys.2026.115434","url":null,"abstract":"<div><div>Hunting-inspired algorithms have gained widespread attention in the field of optimization because of their simplicity, flexibility, and natural metaphors. However, many suffer from limitations such as slow convergence rates, sensitivity to parameter settings, and a tendency to become trapped in local optima. To address these challenges, this paper proposes the Detective Behavior Algorithm (DBA), a novel meta-heuristic approach that integrates three core search mechanisms: large-area directional exploration, localized exploitation, and direct target-oriented attacks. DBA is designed to balance exploration and exploitation effectively, enabling faster convergence and improved global search capabilities. The performance is validated through comprehensive application on a suite of benchmark functions and real-world engineering problems. A comparative analysis is conducted against eight state-of-the-art optimization algorithms, including recently developed hunting-inspired methods such as the Walrus Optimizer and Sea-Horse Optimizer. Results consistently demonstrate that DBA outperforms these approaches in terms of convergence speed, solution accuracy, and robustness, particularly in complex optimization scenarios. Furthermore, DBA is applied to predict and optimize surface waviness in Wire Arc Additive Manufacturing components. Two predictive models are developed: one employing an Artificial Neural Network (ANN) optimized by DBA, and another using Particle Swarm Optimization (PSO). The DBA-optimized ANN model exhibits superior predictive accuracy and reliability compared to both standard ANN and PSO-optimized ANN models. Leveraging this enhanced prediction capability, DBA is further used to minimize surface waviness, consistently outperforming competing algorithms. These findings underscore the robustness, adaptability, and real-world applicability of DBA in both theoretical and practical contexts. The source codes of DBA are publicly available at (<span><span>https://www.mathworks.com/matlabcentral/fileexchange/183178-detective-behavior-algorithm-dba</span><svg><path></path></svg></span>).</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"338 ","pages":"Article 115434"},"PeriodicalIF":7.6,"publicationDate":"2026-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146175313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Infrared and visible image fusion based on multi-modal and multi-scale cross-compensation
Pub Date: 2026-01-31 | DOI: 10.1016/j.knosys.2026.115441
Meitian Li, Jing Sun, Heng Ma, Fasheng Wang, Fuming Sun
<div><div>In the task of infrared and visible image fusion, fully preserving the complementary information from different modalities while avoiding detail loss and redundant information superposition has been a core challenge in recent research. Most existing methods primarily focus on feature processing at a single level or for a single modality, leading to insufficient cross-level information interaction and inadequate cross-modal feature fusion. This deficiency typically results in two types of issues: firstly, the lack of effective compensation between adjacent-level features prevents the synergistic utilization of low-level details and high-level semantics; secondly, the differences between features from different modalities are not explicitly modeled, where direct concatenation or weighted summation often introduces redundancy or even artifacts, thereby compromising the overall quality of the fused image. To address these challenges, this paper proposes a novel infrared and visible image fusion network based on a Multi-modal and Multi-scale Cross-compensation referred to as MMCFusion. The proposed network incorporates an Upper-Lower-level Cross-Compensation (ULCC) module that integrates features from adjacent levels to enhance the richness and diversity of feature representations. Additionally, we introduce a Feature-Difference Cross-Compensation (FDCC) module to facilitate cross-compensation of upper-lower-level information through a differential approach. This design enhances the complementarity between features and effectively mitigates the problem of detail information loss prevalent in conventional methods. To further augment the model’s ability to detect and represent objects across various scales, we also devise the Multi-Scale Fusion Module (MSFM) that effectively integrates feature information from multiple scales, thereby improving the model’s adaptability to diverse objects. Furthermore, we design a Texture Enhancement Module (TEM) to capture and retain local structures and texture information in the image, thereby providing richer detail representation after processing. Finally, to comprehensively capture multi-modal information and perform remote modeling, we employ Pyramid Vision Transformer (PVTv2) to construct a dual-stream Transformer encoder, which can capture valuable information at multiple scales and provide robust global modeling capabilities, thereby improving the fusion results. The efficacy of the proposed method is rigorously evaluated on several datasets, including infrared and visible datasets such as MSRS, TNO, and RoadScene, as well as medical imaging datasets, such as PET-MRI. Experimental results demonstrate that MMCFusion significantly outperforms current state-of-the-art methods in terms of both visual quality and quantitative metrics, while also exhibiting strong generalization capability across different datasets, thereby validating its effectiveness and robustness in practical applications. The source co
{"title":"Infrared and visible image fusion based on multi-modal and multi-scale cross-compensation","authors":"Meitian Li, Jing Sun, Heng Ma, Fasheng Wang, Fuming Sun","doi":"10.1016/j.knosys.2026.115441","DOIUrl":"10.1016/j.knosys.2026.115441","url":null,"abstract":"<div><div>In the task of infrared and visible image fusion, fully preserving the complementary information from different modalities while avoiding detail loss and redundant information superposition has been a core challenge in recent research. Most existing methods primarily focus on feature processing at a single level or for a single modality, leading to insufficient cross-level information interaction and inadequate cross-modal feature fusion. This deficiency typically results in two types of issues: firstly, the lack of effective compensation between adjacent-level features prevents the synergistic utilization of low-level details and high-level semantics; secondly, the differences between features from different modalities are not explicitly modeled, where direct concatenation or weighted summation often introduces redundancy or even artifacts, thereby compromising the overall quality of the fused image. To address these challenges, this paper proposes a novel infrared and visible image fusion network based on a Multi-modal and Multi-scale Cross-compensation referred to as MMCFusion. The proposed network incorporates an Upper-Lower-level Cross-Compensation (ULCC) module that integrates features from adjacent levels to enhance the richness and diversity of feature representations. Additionally, we introduce a Feature-Difference Cross-Compensation (FDCC) module to facilitate cross-compensation of upper-lower-level information through a differential approach. This design enhances the complementarity between features and effectively mitigates the problem of detail information loss prevalent in conventional methods. To further augment the model’s ability to detect and represent objects across various scales, we also devise the Multi-Scale Fusion Module (MSFM) that effectively integrates feature information from multiple scales, thereby improving the model’s adaptability to diverse objects. Furthermore, we design a Texture Enhancement Module (TEM) to capture and retain local structures and texture information in the image, thereby providing richer detail representation after processing. Finally, to comprehensively capture multi-modal information and perform remote modeling, we employ Pyramid Vision Transformer (PVTv2) to construct a dual-stream Transformer encoder, which can capture valuable information at multiple scales and provide robust global modeling capabilities, thereby improving the fusion results. The efficacy of the proposed method is rigorously evaluated on several datasets, including infrared and visible datasets such as MSRS, TNO, and RoadScene, as well as medical imaging datasets, such as PET-MRI. Experimental results demonstrate that MMCFusion significantly outperforms current state-of-the-art methods in terms of both visual quality and quantitative metrics, while also exhibiting strong generalization capability across different datasets, thereby validating its effectiveness and robustness in practical applications. 
The source co","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"338 ","pages":"Article 115441"},"PeriodicalIF":7.6,"publicationDate":"2026-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146175324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}