
Knowledge-Based Systems: Latest Articles

MM-AttacKG: A multimodal approach to attack graph construction with large language models
IF 7.6; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2026-04-08; Epub Date: 2026-02-04; DOI: 10.1016/j.knosys.2026.115483
Yongheng Zhang, Xinyun Zhao, Yunshan Ma, Haokai Ma, Yingxiao Guan, Guozheng Yang, Yuliang Lu, Xiang Wang
Cyber Threat Intelligence (CTI) parsing extracts key threat information from massive data and transforms it into actionable intelligence to improve the efficiency of threat detection and defense; its tasks include attack graph construction, intelligence fusion, and indicator extraction. Among these research topics, Attack Graph Construction (AGC) is essential for visualizing and understanding the potential attack paths of threat events described in CTI reports. Existing approaches construct attack graphs purely from textual data to reveal the logical threat relationships between entities within the attack behavioral sequence. However, they typically overlook the threat information carried by the visual modality, even though CTI reports are inherently multimodal and their images preserve key threat details. Inspired by the multimodal understanding capabilities of Multimodal Large Language Models (MLLMs), we explore their potential for enhancing multimodal attack graph construction. Specifically, we propose a novel framework, MM-AttacKG, which extracts key information from threat images and integrates it into attack graph construction, thereby enhancing the comprehensiveness and accuracy of attack graphs. It first employs a threat image parsing module to extract critical threat information from images and generate textual descriptions using MLLMs. It then builds an iterative question-answering pipeline tailored for image parsing to refine the understanding of threat images. Finally, it achieves content-level integration between attack graphs and image-based answers through MLLMs, completing threat information enhancement. We construct a new multimodal dataset, AG-LLM-mm, and conduct extensive experiments to evaluate the effectiveness of MM-AttacKG. The results demonstrate that MM-AttacKG can accurately identify key information in threat images and significantly improve the quality of multimodal attack graph construction, addressing the shortcomings of existing methods in utilizing image-based threat information.
Knowledge-Based Systems, Volume 338, Article 115483
Citations: 0
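As a rough illustration of the iterative question-answering step described in the MM-AttacKG abstract above, the sketch below loops over a set of questions, feeds earlier answers back as context, and stops once a round yields nothing new. Everything here is an assumption made for illustration: `refine_image_facts`, `merge_triples`, and the `answer_fn` callable (a stand-in for a call to a multimodal model) are hypothetical names, and the plain set union only stands in for the MLLM-based integration that the abstract mentions.

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object), hypothetical graph edge


def refine_image_facts(
    questions: List[str],
    answer_fn: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> List[str]:
    """Ask every question once per round, passing the facts collected in
    earlier rounds as context; stop when a round adds nothing new."""
    facts: List[str] = []
    for _ in range(max_rounds):
        new_facts: List[str] = []
        for question in questions:
            answer = answer_fn(question, facts)  # hypothetical model call
            if answer and answer not in facts and answer not in new_facts:
                new_facts.append(answer)
        if not new_facts:
            break
        facts.extend(new_facts)
    return facts


def merge_triples(text_graph: Set[Triple], image_triples: Set[Triple]) -> Set[Triple]:
    """Stand-in for content-level integration: a plain union of edge sets."""
    return text_graph | image_triples
```

A real pipeline would replace `answer_fn` with an actual model call and would need entity resolution before merging; the sketch only shows the control flow.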
Federated vision transformer with adaptive focal loss for medical image classification
IF 7.6; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2026-04-08; Epub Date: 2026-02-04; DOI: 10.1016/j.knosys.2026.115474
Xinyuan Zhao, Yihang Wu, Ahmad Chaddad, Tareef Daqqaq, Reem Kateb
While deep learning models like Vision Transformer (ViT) have achieved significant advances, they typically require large datasets. Under data privacy regulations, access to many original datasets is restricted, especially for medical images. Federated learning (FL) addresses this challenge by enabling global model aggregation without data exchange. However, data heterogeneity and class imbalance across local clients hinder the generalization of the model. This study proposes an FL framework leveraging a dynamic adaptive focal loss (DAFL) and a client-aware aggregation strategy for local training. Specifically, we design a dynamic class imbalance coefficient that adjusts based on each client’s sample distribution and class data distribution, ensuring minority classes receive sufficient attention and preventing sparse data from being ignored. To address client heterogeneity, a weighted aggregation strategy is adopted, which adapts to data size and characteristics to better capture inter-client variations. The classification results on three public datasets (ISIC, Ocular Disease and RSNA-ICH) show that the proposed framework outperforms DenseNet121, ResNet50, ViT-S/16, ViT-L/32, FedCLIP, Swin Transformer, CoAtNet, and MixNet in most cases, with accuracy improvements ranging from 0.98% to 41.69%. Ablation studies on the imbalanced ISIC dataset validate the effectiveness of the proposed loss function and aggregation strategy compared to traditional loss functions and other FL approaches. The code is available at: https://github.com/AIPMLab/ViT-FLDAF.
Knowledge-Based Systems, Volume 338, Article 115474
Citations: 0
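The abstract above does not give the DAFL formula, so the following is only a minimal sketch of the general idea: a standard focal loss whose per-class weight is derived from a client's own label counts, plus a data-size-weighted average of client parameters. The inverse-frequency weighting and all function names are assumptions for illustration, not the authors' method.

```python
import numpy as np


def class_weights(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Inverse-frequency weights from one client's labels, scaled so the
    weights average to 1 (an assumed stand-in for an imbalance coefficient)."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    inv = 1.0 / np.maximum(counts, 1.0)  # avoid division by zero for absent classes
    return inv * num_classes / inv.sum()


def focal_loss(probs: np.ndarray, labels: np.ndarray,
               alpha: np.ndarray, gamma: float = 2.0) -> float:
    """Mean of -alpha_c * (1 - p_t)^gamma * log(p_t) over the batch."""
    p_t = np.clip(probs[np.arange(len(labels)), labels], 1e-12, 1.0)
    return float(np.mean(-alpha[labels] * (1.0 - p_t) ** gamma * np.log(p_t)))


def aggregate(client_params: list, client_sizes: list) -> np.ndarray:
    """Average client parameter vectors, weighted by local dataset size."""
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()
    return sum(w * p for w, p in zip(weights, client_params))
```

With `gamma=0` and uniform `alpha`, `focal_loss` reduces to ordinary cross-entropy, which is a convenient sanity check when experimenting with the weighting.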
Frequency-spatial complementary attention network for computed tomography
IF 7.6; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2026-04-08; Epub Date: 2026-02-06; DOI: 10.1016/j.knosys.2026.115468
Xing Wu, Yimin Zhu, Shuo Duan, Xinyuan Zhang, Xing Wei, Bo Huang, Quan Qian
Computed tomography (CT) denoising is essential for clinical diagnosis and industrial inspection, but it is challenged by various noise and structural artifacts. Existing deep learning methods are limited by insufficient modeling of long-term dependencies, a disregard for intrinsic frequency-domain priors, and a significant domain gap caused by their reliance on unrealistic synthetic noise. To address these issues, a frequency-spatial complementary attention network (FSCANet) is proposed, which is based on the complementary fusion of frequency and spatial domains. The frequency domain branch explicitly decouples structural and phase information to model global context, while the spatial-domain branch improves local details. Simultaneously, a real-data-guided physics-informed noise model is introduced to bridge the domain gap by formalizing the physical noise generation process as a differentiable layer. FSCANet and the noise model are jointly optimized using a hybrid data-driven co-optimization strategy, resulting in a dynamic feedback loop that not only compels the noise model to generate physically interpretable noise but also drives FSCANet to achieve greater robustness. FSCANet achieves state-of-the-art performance on the DeepLesion dataset with a PSNR of 40.5861 dB and an SSIM of 0.9913, and demonstrates robust generalization on authentic clinical data from the Mayo dataset.
Knowledge-Based Systems, Volume 338, Article 115468
Citations: 0
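To make the frequency-domain wording in the FSCANet abstract above more concrete, here is a small sketch that splits an image into Fourier amplitude and phase, recombines them, and computes PSNR using its standard definition. Treating the abstract's "structural and phase information" as an amplitude/phase split is an assumption, and the function names are hypothetical; none of this is the network itself.

```python
import numpy as np


def split_amplitude_phase(img: np.ndarray):
    """Return the amplitude and phase of the 2-D Fourier spectrum."""
    spectrum = np.fft.fft2(img)
    return np.abs(spectrum), np.angle(spectrum)


def merge_amplitude_phase(amplitude: np.ndarray, phase: np.ndarray) -> np.ndarray:
    """Rebuild a real-valued image from amplitude and phase."""
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * phase)))


def psnr(reference: np.ndarray, test: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))
```

Because the split is lossless, a model can process amplitude and phase separately and still return to the image domain with `merge_amplitude_phase`.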
Multi-level dual contrastive learning for cloud API cold-start recommendation
IF 7.6; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2026-04-08; Epub Date: 2026-02-10; DOI: 10.1016/j.knosys.2026.115534
Mengmeng Sun, Yueshen Xu, Dianlong You, Zhen Chen
A longstanding challenge in cloud API recommender systems is the cold-start problem associated with newly released cloud APIs, which lack historical interactions. Existing approaches typically 1) integrate the content and collaborative embeddings of a cloud API to generate its representation, or 2) adopt API-level alignment strategies that maximize mutual information between them. However, they often assume that cold cloud APIs have content features similar to those of warm ones, which does not always hold in practice. Additionally, developers frequently combine multiple cloud APIs to create Mashups, indicating that focusing solely on individual cloud APIs fails to capture Mashup preferences. To this end, we propose a multi-level dual contrastive learning (MDCL) framework that explores Mashup preferences to impose effective embedding constraints on low-similarity cold cloud APIs. Specifically, MDCL generates a Mashup preference representation by aggregating the collaborative embeddings of warm cloud APIs based on the Mashup’s interaction history. It then performs group-level alignment between the content embedding of a cloud API and the Mashup’s preference representation, thereby guiding low-similarity cold cloud APIs toward the collaborative space. Furthermore, MDCL integrates API-level alignment and Mashup-API alignment to improve consistency between a cloud API’s content and collaborative embeddings, and to better model interaction patterns between Mashups and cloud APIs. A hybrid training strategy is employed to jointly optimize three alignment objectives (Mashup-API, API-level, and group-level alignment), achieving a better balance between cold-start and warm-start recommendations. Extensive experiments on real-world datasets demonstrate that MDCL outperforms state-of-the-art methods in both cold-start and warm-start scenarios. Implementation code is available at https://github.com/MengMeng3399/MDCL.
Knowledge-Based Systems, Volume 338, Article 115534
Citations: 0
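The alignment objectives in the MDCL abstract above are contrastive, so a generic InfoNCE loss gives a feel for what "alignment" means numerically. This is a sketch under assumptions: mean pooling for the Mashup preference, cosine similarity, and the temperature value are illustrative choices, and the function names are hypothetical rather than taken from the authors' repository.

```python
import numpy as np


def mashup_preference(api_embeddings: np.ndarray, interacted: list) -> np.ndarray:
    """Mean-pool the collaborative embeddings of the APIs a Mashup has used."""
    return api_embeddings[interacted].mean(axis=0)


def info_nce(anchors: np.ndarray, positives: np.ndarray,
             temperature: float = 0.1) -> float:
    """InfoNCE over a batch: row i of `anchors` should match row i of
    `positives`; all other rows in the batch act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))
```

For group-level alignment in this sketch, `anchors` would hold content embeddings of cloud APIs and `positives` the preference vectors of Mashups that use them; the loss falls as matching pairs become more similar than mismatched ones.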
Opposition and reinforcement learning growth-starfish optimization algorithm for engineering design and feature selection
IF 7.6; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2026-04-08; Epub Date: 2026-02-07; DOI: 10.1016/j.knosys.2026.115522
Changting Zhong, Hao Chen, Dabo Xin, Tong Xu, Zeng Meng, Xinwei Wang, Ali Riza Yildiz, Seyedali Mirjalili
The starfish optimization algorithm (SFOA) is a bio-inspired metaheuristic algorithm for global optimization, which has demonstrated accuracy and efficiency on popular benchmark functions. However, for complex practical problems such as engineering design and feature selection, SFOA still requires a better balance between exploration and exploitation to ensure robust performance in real-world applications. In this paper, we present an improved SFOA algorithm named ORLGSFOA, which integrates opposition-based learning, reinforcement learning, and the growth optimizer with the basic SFOA. The algorithm first incorporates the opposition-based learning strategy during initialization to improve the diversity and quality of the initial solutions. Then, the updating rule from the growth optimizer is hybridized with SFOA to balance exploration and exploitation. Moreover, ORLGSFOA integrates the reinforcement learning strategy to reward the winner between SFOA and the growth optimizer by adding updating positions during optimization to enhance global convergence. Experiments demonstrate the superior performance of ORLGSFOA. In comprehensive benchmark tests on 65 functions from the classical, CEC2017, and CEC2022 suites, ORLGSFOA outperformed 15 other metaheuristic algorithms by achieving more accurate solutions. This effectiveness carries over to real-world applications, as evidenced by tests on seven engineering design problems. In addition, the effectiveness of ORLGSFOA in solving discrete combinatorial optimization problems is verified on 52 feature selection problems, and the algorithm is extended to wind engineering scenarios. In conclusion, ORLGSFOA is effective across a wide range of challenges, including global optimization, engineering design, and feature selection problems. The source code of ORLGSFOA is publicly available at: https://ww2.mathworks.cn/matlabcentral/fileexchange/183223-orlgsfoa.
Knowledge-Based Systems, Volume 338, Article 115522
Citations: 0
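Opposition-based initialization, mentioned in the ORLGSFOA abstract above, is easy to show in isolation. The sketch below uses the textbook form of opposition-based learning (the opposite of x in [lb, ub] is lb + ub - x) and keeps the best half of the combined population for a minimization problem. The function name `obl_init` and its signature are assumptions; the published MATLAB code may organize this step differently.

```python
import numpy as np


def obl_init(fitness, lb: np.ndarray, ub: np.ndarray,
             pop_size: int, rng: np.random.Generator) -> np.ndarray:
    """Sample a random population, add its opposite points, and keep the
    pop_size candidates with the lowest fitness (minimization)."""
    dim = lb.shape[0]
    population = lb + rng.random((pop_size, dim)) * (ub - lb)
    opposite = lb + ub - population  # mirror each point within the bounds
    candidates = np.vstack([population, opposite])
    scores = np.apply_along_axis(fitness, 1, candidates)
    best = np.argsort(scores)[:pop_size]
    return candidates[best]
```

The selected population can never be worse on average than the purely random one, since it is the best subset of a superset that contains it.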
Rethinking static weights: Language-guided adaptive weight adjustment for 3D visual grounding
IF 7.6; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2026-04-08; Epub Date: 2026-02-01; DOI: 10.1016/j.knosys.2026.115467
Zongshun Wang, Ce Li, Zhiqiang Feng, Limei Xiao, Pengcheng Wang, Mengmeng Ping
3D Visual Grounding (3DVG) aims to accurately localize target objects in complex 3D point cloud scenes using natural language descriptions. However, current methods typically utilize static visual encoders with fixed parameters to handle the infinite variety of linguistic queries. This static approach inevitably leads to low signal-to-noise ratios in the feature inputs during the subsequent visual-language fusion stage. To overcome this limitation, we propose a Language-guided Adaptive Weight Adjustment (LAWA) framework that equips the visual backbone with query-aware dynamic adaptability during the early visual encoding stage via a lightweight language-guided strategy. Specifically, we first construct visual features that integrate class prior information using Object Semantic Augmented Encoding. Then, by leveraging weight coefficients derived from multimodal embeddings, we employ a Low-Rank Adaptation-based Dynamic Weight Adjustment (DWA) module to update the linear projection layers and weight matrices within the visual encoder’s attention mechanism. This approach enables the model to focus more effectively on visual regions that are semantically aligned with the textual descriptions. Extensive experiments demonstrate that LAWA achieves an [email protected] of 86.2% on the ScanRefer dataset, and overall accuracies of 69.5% and 58.4% on the Sr3D and Nr3D datasets, respectively, all while maintaining superior parameter efficiency.
3D视觉定位(3D Visual Grounding, 3DVG)旨在利用自然语言描述在复杂的三维点云场景中精确定位目标物体。然而,目前的方法通常使用具有固定参数的静态视觉编码器来处理无穷多种语言查询。这种静态方法不可避免地导致在随后的视觉语言融合阶段特征输入的低信噪比。为了克服这一限制,我们提出了一种语言引导的自适应权重调整(LAWA)框架,该框架通过轻量级的语言引导策略,在早期视觉编码阶段为视觉主干提供查询感知的动态适应性。具体而言,我们首先使用对象语义增强编码构建集成类先验信息的视觉特征。然后,通过利用从多模态嵌入中获得的权重系数,我们采用基于低秩自适应的动态权重调整(DWA)模块来更新视觉编码器注意机制中的线性投影层和权重矩阵。这种方法使模型能够更有效地关注语义上与文本描述一致的视觉区域。大量实验表明,LAWA在scanreference数据集上的[email protected]准确率为86.2%,在Sr3D和Nr3D数据集上的总体准确率分别为69.5%和58.4%,同时保持了优越的参数效率。
{"title":"Rethinking static weights: Language-guided adaptive weight adjustment for 3D visual grounding","authors":"Zongshun Wang ,&nbsp;Ce Li ,&nbsp;Zhiqiang Feng ,&nbsp;Limei Xiao ,&nbsp;Pengcheng Wang ,&nbsp;Mengmeng Ping","doi":"10.1016/j.knosys.2026.115467","DOIUrl":"10.1016/j.knosys.2026.115467","url":null,"abstract":"<div><div>3D Visual Grounding (3DVG) aims to accurately localize target objects in complex 3D point cloud scenes using natural language descriptions. However, current methods typically utilize static visual encoders with fixed parameters to handle the infinite variety of linguistic queries. This static approach inevitably leads to low signal-to-noise ratios in the feature inputs during the subsequent visual-language fusion stage. To overcome this limitation, we propose a Language-guided Adaptive Weight Adjustment (LAWA) framework that equips the visual backbone with query-aware dynamic adaptability during the early visual encoding stage via a lightweight language-guided strategy. Specifically, we first construct visual features that integrate class prior information using Object Semantic Augmented Encoding. Then, by leveraging weight coefficients derived from multimodal embeddings, we employ a Low-Rank Adaptation-based Dynamic Weight Adjustment (DWA) module to update the linear projection layers and weight matrices within the visual encoder’s attention mechanism. This approach enables the model to focus more effectively on visual regions that are semantically aligned with the textual descriptions. 
Extensive experiments demonstrate that LAWA achieves an [email protected] of 86.2% on the ScanRefer dataset, and overall accuracies of 69.5% and 58.4% on the Sr3D and Nr3D datasets, respectively, all while maintaining superior parameter efficiency.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"338 ","pages":"Article 115467"},"PeriodicalIF":7.6,"publicationDate":"2026-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146175187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Analysis of double Beltrami horn surface resistor networks and efficient path planning
IF 7.6 Zone 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Vol. 338 Pub Date: 2026-04-08 Epub Date: 2026-02-08 DOI: 10.1016/j.knosys.2026.115489
Xiaoyu Jiang, Jianwei Dai, Yanpeng Zheng, Zhaolin Jiang
Resistor networks, valued for their topological versatility and stable electrical properties, have emerged as a focal point across multiple disciplines. Yet resistor networks with profound mathematical and physical significance remain largely unexplored. This study presents a detailed investigation of the double Beltrami horn surface resistor network and proposes an interpretable reasoning framework based on graph structures and grounded in physical laws. To improve the efficiency of large-scale computation, the discrete sine transform of the seventh type (DST-VII) and Chebyshev polynomials of the first kind are employed to derive the exact potential formula. In addition to generating potential distribution diagrams for various special scenarios, a fast algorithm is developed to significantly enhance the efficiency of potential computation. Furthermore, to expand the application potential of the resistor network, an efficient path planning algorithm based on the exact potential formula is proposed, and its applicability in dynamic environments is validated in preliminary experiments.
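The idea of planning a path from node potentials can be sketched on a much simpler network than the one studied in the paper. The Python example below assumes a small rectangular grid of unit resistors (not the double Beltrami horn surface), replaces the exact potential formula from the abstract with plain Gauss-Seidel iteration on Kirchhoff's current law, and then walks from the start node to the goal by always stepping to the neighbor with the lowest potential. All function names, the grid size, and the blocked cells are hypothetical and chosen only for illustration.

```python
def solve_potentials(rows, cols, start, goal, blocked=frozenset(), sweeps=2000):
    """Approximate node potentials on a unit-resistor grid (assumed topology).

    The start node is held at 1.0 and the goal at 0.0. Every other node is
    repeatedly set to the mean of its neighbors, which is Kirchhoff's
    current law when all resistors are equal (Gauss-Seidel iteration).
    """
    nodes = [(r, c) for r in range(rows) for c in range(cols)
             if (r, c) not in blocked]
    node_set = set(nodes)

    def neighbors(node):
        r, c = node
        candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
        return [m for m in candidates if m in node_set]

    v = {n: 0.5 for n in nodes}
    v[start], v[goal] = 1.0, 0.0
    for _ in range(sweeps):
        for n in nodes:
            if n == start or n == goal:
                continue  # boundary potentials stay fixed
            nb = neighbors(n)
            if nb:
                v[n] = sum(v[m] for m in nb) / len(nb)
    return v, neighbors


def plan_path(v, neighbors, start, goal):
    """Follow strictly decreasing potential from start until goal is reached."""
    path = [start]
    while path[-1] != goal:
        current = path[-1]
        nxt = min(neighbors(current), key=lambda m: v[m])
        if v[nxt] >= v[current]:
            raise RuntimeError("stuck: no neighbor with lower potential")
        path.append(nxt)
    return path


# Hypothetical 4x4 grid with two blocked cells acting as obstacles.
start, goal = (0, 0), (3, 3)
blocked = frozenset({(1, 1), (2, 2)})
v, neighbors = solve_potentials(4, 4, start, goal, blocked)
path = plan_path(v, neighbors, start, goal)
print(path)
```

Because the interior potentials are averages of their neighbors, a node reached by descending from the start always has some neighbor with a strictly lower value, so the walk cannot stall before the goal on a connected grid. An exact closed-form potential, as proposed in the paper, would remove the iterative solve entirely.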
Citations: 0
FeNeC: Enhancing continual learning via feature clustering with neighbor- or logit-based classification
IF 7.6 Zone 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Vol. 338 Pub Date: 2026-04-08 Epub Date: 2026-02-03 DOI: 10.1016/j.knosys.2026.115479
Kamil Książek, Hubert Jastrzębski, Krzysztof Pniaczek, Bartosz Trojan, Michał Karp, Jacek Tabor
The ability of deep learning models to learn continuously is essential for adapting to new data categories and evolving data distributions. In recent years, approaches leveraging frozen feature extractors after an initial learning phase have been extensively studied. Many of these methods estimate per-class covariance matrices and prototypes based on backbone-derived feature representations. Within this paradigm, we introduce FeNeC (Feature Neighborhood Classifier) and FeNeC-Log, its variant based on the log-likelihood function. Our approach significantly extends the concept of per-class prototypes by constructing multiple, fine-grained sub-prototypes for each class, thereby enhancing the representation of class distributions. Utilizing the Mahalanobis distance, our models classify samples either through a nearest neighbor assignment to these sub-prototypes or trainable logit values assigned to consecutive classes. Our proposition can be seen as a generalization that reduces to existing single-prototype approaches in a special case, while extending them with the ability for more flexible adaptation to data. We demonstrate that our FeNeC variants establish state-of-the-art results across several benchmarks, proving particularly effective on CIFAR-100 and the complex ImageNet-Subset, where our method outperforms the strong FeCAM baseline by over 1% in average incremental accuracy and 1.5% in last task accuracy.
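As a rough illustration of the nearest-sub-prototype rule described in the abstract above, the following Python sketch classifies a 2-D feature vector by its Mahalanobis distance to several hand-written centroids per class. It is not the authors' implementation: the helper names (`inv2x2`, `mahalanobis`, `classify`), the toy centroids, and the choice of one covariance matrix per class are all assumptions made for this example, and the clustering step that would produce the sub-prototypes is omitted.

```python
import math


def inv2x2(m):
    """Invert a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return ((d / det, -b / det), (-c / det, a / det))


def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance between a 2-D point and a mean, given an inverse covariance."""
    dx = (x[0] - mean[0], x[1] - mean[1])
    # Quadratic form dx^T * cov_inv * dx
    q = (dx[0] * (cov_inv[0][0] * dx[0] + cov_inv[0][1] * dx[1])
         + dx[1] * (cov_inv[1][0] * dx[0] + cov_inv[1][1] * dx[1]))
    return math.sqrt(q)


def classify(x, sub_prototypes, class_cov):
    """Return the label whose nearest sub-prototype is closest to x."""
    best_label, best_dist = None, math.inf
    for label, centroids in sub_prototypes.items():
        cov_inv = inv2x2(class_cov[label])
        for centroid in centroids:
            dist = mahalanobis(x, centroid, cov_inv)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label


# Hypothetical toy data: class "cat" has two sub-prototypes, "dog" has one.
sub_prototypes = {
    "cat": [(0.0, 0.0), (4.0, 4.0)],
    "dog": [(2.0, 0.0)],
}
class_cov = {
    "cat": ((1.0, 0.0), (0.0, 1.0)),
    "dog": ((4.0, 0.0), (0.0, 1.0)),
}

print(classify((3.6, 3.9), sub_prototypes, class_cov))
print(classify((1.8, 0.2), sub_prototypes, class_cov))
```

With a single centroid per class this rule reduces to an ordinary one-prototype Mahalanobis classifier, which matches the abstract's remark that the method generalizes single-prototype approaches.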
Citations: 0
A survey on anomaly segmentation in urban scene understanding with image data
IF 7.6 Zone 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Vol. 338 Pub Date: 2026-04-08 Epub Date: 2026-02-09 DOI: 10.1016/j.knosys.2026.115521
Yuxuan Zhang, Shuchang Wang, Zhenbo Shi, Wei Yang
Semantic segmentation has made significant advancements over the past decade. However, it typically relies on a closed-set taxonomy, which limits its ability to generalize to objects of unknown categories. This limitation poses security risks in real-world applications, such as autonomous vehicles. To address this issue, anomaly segmentation in urban scene understanding has gained considerable attention, aiming to identify and segment outliers effectively. Despite the rapid progress in anomaly segmentation in recent years, there is no comprehensive survey of the latest developments in this field. In this paper, we systematically summarize recent advancements and introduce a novel perspective that categorizes these approaches by their underlying motivations. We then analyze the performance of each approach on several public leaderboards, demonstrating that these categorization criteria reflect recent development trends. Additionally, we identify existing challenges and outline potential future research directions.
Citations: 0
Boundary-aware and multi-angle modeling-based object tracking in polarimetric images
IF 7.6 Zone 1 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Vol. 338 Pub Date: 2026-04-08 Epub Date: 2026-02-03 DOI: 10.1016/j.knosys.2026.115442
Qiaohui Wang, Fan Shi, Mianzhao Wang, Xinbo Geng, Meng Zhao
Object tracking is a fundamental task in computer vision with applications ranging from surveillance to autonomous driving. Although RGB-based tracking methods have seen significant advancements by leveraging color and texture features, they often struggle under challenging conditions such as low light, occlusions, and fast motion. Polarimetric imaging, which encodes surface properties, material characteristics, and geometric structures, offers unique advantages as a complementary modality. However, its potential remains underexplored due to the lack of large-scale datasets and specialized algorithms designed for polarization-specific features. To address this gap, we introduce POL, the first large-scale benchmark dataset for polarimetric vision that enables comprehensive evaluations under diverse conditions. Building on this dataset, we propose PMTT, a cross-modal transformer framework that integrates polarimetric and RGB data. The Detailed Feature Prompter (DFP) module extracts boundary and multi-angle features from polarimetric images, while the Spatial-Channel Attention (SCA) mechanism enhances feature recognition in complex environments. Extensive experiments confirm PMTT's superior performance and robustness, highlighting the transformative potential of polarimetric imaging for dynamic object tracking.
Citations: 0