
Latest Publications in Knowledge-Based Systems

Multiscale scattering forests: A domain-generalizing approach for fault diagnosis under data constraints
IF 7.6, CAS Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-01-21. DOI: 10.1016/j.knosys.2026.115389
Zhuyun Chen , Hongqi Lin , Youpeng Gao , Jingke He , Zehao Li , Weihua Li , Qiang Liu
Currently, deep learning-based intelligent fault diagnosis techniques are widely used in the manufacturing industry. However, due to various constraints, fault data for rotating machinery are often limited. Moreover, in real industrial environments, the operating conditions of rotating machinery vary with task requirements, leading to significant data variability across conditions. This variability presents a major challenge for few-shot fault diagnosis, especially in scenarios requiring domain generalization across diverse operating conditions. To address this challenge, this paper proposes multiscale scattering forests (MSF): a domain-generalizing approach for fault diagnosis under data constraints. First, a multiscale wavelet scattering predefined layer is designed to extract robust invariant features from input samples; the resulting scattering coefficients are concatenated and used as new samples, serving as a data augmentation of the originals. Then, a deep stacked ensemble forest with skip connections is designed to handle the transformed multiscale samples, allowing information from earlier layers to bypass later ones and improving the model's feature representation capabilities. Finally, a similarity-metric-based weight-learning strategy is developed to combine the diagnostic results of the individual forests, integrating the weighted models into an ensemble framework to enhance domain generalization performance under varying operating conditions. The MSF model is comprehensively evaluated on a computer numerical control (CNC) machine tool spindle bearing dataset collected in an industrial environment. Experimental results demonstrate that the proposed approach not only exhibits strong diagnostic and generalization performance in few-shot scenarios without the support of additional source domains but also outperforms other state-of-the-art few-shot fault diagnosis methods.
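A minimal sketch of the two building blocks described above (wavelet-scattering features, then a forest cascade with skip connections), assuming kymatio's Scattering1D as the scattering layer and scikit-learn forests. The layer count, J, and the use of in-sample probabilities are simplifications, not the paper's implementation.

```python
import numpy as np
from kymatio.numpy import Scattering1D
from sklearn.ensemble import RandomForestClassifier

def scattering_features(signals, J=6):
    """Flatten the scattering coefficients of each 1-D signal: (N, T) -> (N, F)."""
    scattering = Scattering1D(J=J, shape=(signals.shape[-1],))
    return scattering(signals).reshape(signals.shape[0], -1)

def cascade_forest(X_train, y_train, X_test, n_layers=3):
    """Deep-forest-style cascade: every layer sees the raw scattering features
    (the skip connection) concatenated with the previous layer's class
    probabilities; the final layer's argmax is the prediction."""
    aug_train, aug_test = X_train, X_test
    for layer in range(n_layers):
        rf = RandomForestClassifier(n_estimators=200, random_state=layer)
        rf.fit(aug_train, y_train)
        proba_test = rf.predict_proba(aug_test)
        if layer == n_layers - 1:
            return proba_test.argmax(axis=1)
        # Skip connection: re-attach the original features at every layer.
        aug_train = np.hstack([X_train, rf.predict_proba(aug_train)])
        aug_test = np.hstack([X_test, proba_test])
```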
Citations: 0
Omniscient bottom-up double-stream symmetric network for image captioning
IF 7.6, CAS Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-01-21. DOI: 10.1016/j.knosys.2026.115366
Jianchao Li, Wei Zhou, Kai Wang, Haifeng Hu
Transformer-based image captioning models have achieved promising performance through various effective learning schemes. We contend that a truly comprehensive learning schema, defined as omniscient learning, encompasses two components: 1) a hierarchical knowledge base with low redundancy as input, and 2) a bottom-up layer-wise network as architecture. Previous captioning models, however, primarily focus on network design and neglect the construction of the knowledge base. In this paper, our hierarchical knowledge base is constituted by personalized knowledge of real-time features and contextual knowledge of consensus. Simultaneously, we devise a bottom-up double-stream symmetric network (BuNet) to progressively learn layered features. Specifically, the hierarchical knowledge base includes single-image region and grid features from the local domain and contextual knowledge tokens from the broad domain. Correspondingly, BuNet is divided into a local-domain self-learning (LDS) stage and a broad-domain consensus-learning (BDC) stage. In addition, we explore noise-decoupling strategies to illustrate the extraction of contextual knowledge tokens. Furthermore, the knowledge disparity between region and grid features reveals that a purely “symmetric network” cannot effectively capture the additional spatial relationships present in the region stream. Consequently, we design a relative spatial encoding in the LDS stage of BuNet to learn regional spatial knowledge. We also employ a lightweight backbone to reduce computational complexity while providing a simple paradigm for omniscient learning. Our method is extensively tested on MS-COCO and Flickr30K, where it outperforms a number of existing captioning models.
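The abstract does not specify the form of the relative spatial encoding; the sketch below shows one common recipe for region features (pairwise log-ratio geometry derived from bounding boxes, in the style of relation-network attention) that such an encoding could take. The function name and output layout are illustrative assumptions.

```python
import torch

def relative_spatial_encoding(boxes: torch.Tensor) -> torch.Tensor:
    """boxes: (N, 4) as (x1, y1, x2, y2). Returns (N, N, 4) pairwise geometry."""
    cx = (boxes[:, 0] + boxes[:, 2]) / 2
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    w = (boxes[:, 2] - boxes[:, 0]).clamp(min=1e-3)
    h = (boxes[:, 3] - boxes[:, 1]).clamp(min=1e-3)
    # Pairwise offsets normalized by the source box size, log-scaled so the
    # encoding is robust to absolute image resolution.
    dx = torch.log((cx[None, :] - cx[:, None]).abs().clamp(min=1e-3) / w[:, None])
    dy = torch.log((cy[None, :] - cy[:, None]).abs().clamp(min=1e-3) / h[:, None])
    dw = torch.log(w[None, :] / w[:, None])
    dh = torch.log(h[None, :] / h[:, None])
    # Typically fed through a small MLP and added as an attention bias.
    return torch.stack([dx, dy, dw, dh], dim=-1)
```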
Citations: 0
UECNet: A unified framework for exposure correction utilizing region-level prompts
IF 7.6, CAS Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-01-20. DOI: 10.1016/j.knosys.2026.115365
Shucheng Xia , Kan Chang , Yuqing Li , Mingyang Ling , Xuxin Tai , Yehua Ling , Yujian Yuan , Zan Gao
In real-world scenarios, complex illumination often causes improper exposure in images. Most existing correction methods assume uniform exposure degradation across the entire image, leading to suboptimal performance when multiple exposure degradations coexist in a single image. To address this limitation, we propose UECNet, a Unified Exposure Correction Network guided by region-level prompts. Specifically, we first derive five degradation-specific text prompts through prompt tuning. These prompts are fed into our Exposure Prompts Generation (EPG) module, which generates spatially adaptive, region-level descriptors to characterize local exposure properties. To effectively integrate these region-specific descriptors into the exposure correction pipeline, we design a Prompt-guided Token Mixer (PTM) module. The PTM enables global interactive modeling between high-dimensional visual features and region-level prompts, thereby dynamically steering the correction process. UECNet is built by incorporating EPG and PTM into a U-shaped Transformer backbone. Furthermore, we introduce SICE-DE (SICE-based Diverse Exposure), a new benchmark dataset reorganized from the well-known SICE dataset, to facilitate effective training and comprehensive evaluation. SICE-DE covers six distinct exposure conditions, including challenging severe over/underexposure and non-uniform exposure. Extensive experiments demonstrate that the proposed UECNet consistently outperforms state-of-the-art methods on multiple exposure correction benchmarks. Our code and the SICE-DE dataset will be available at https://github.com/ShuchengXia/UECNet.
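As a rough illustration of the Prompt-guided Token Mixer idea (global interaction between visual tokens and region-level prompt descriptors), here is a minimal cross-attention sketch in PyTorch. The module name, dimensions, and residual layout are assumptions, not UECNet's released code.

```python
import torch
import torch.nn as nn

class PromptTokenMixer(nn.Module):
    """Visual tokens attend to region-level prompt descriptors (cross-attention)."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual_tokens, prompt_tokens):
        # visual_tokens: (B, N, C) flattened image features;
        # prompt_tokens: (B, P, C) exposure descriptors from the EPG stage.
        mixed, _ = self.attn(visual_tokens, prompt_tokens, prompt_tokens)
        return self.norm(visual_tokens + mixed)  # residual keeps image content

x = torch.randn(2, 1024, 256)  # e.g. a 32x32 feature map, flattened
p = torch.randn(2, 5, 256)     # five degradation-specific prompts
print(PromptTokenMixer()(x, p).shape)  # -> torch.Size([2, 1024, 256])
```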
Citations: 0
Causal inference for reliable chest X-ray report generation
IF 7.6, CAS Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-01-20. DOI: 10.1016/j.knosys.2026.115377
Haoxiang Liu , Sijun Bao , Shugeng Zhang , Jiancheng Wu , Chenhong Cao , Wei Gong
Automated report generation for chest X-ray images poses significant challenges due to the extreme imbalance between normal and abnormal regions: the majority of image areas are unremarkable, while only a minority exhibit clinically relevant abnormalities, leading to prevalent “normal” descriptions in reports. Traditional association-based generative models often learn biased mappings that favor normal template descriptions regardless of the actual image content, thereby compromising clinical reliability. To address this, we propose CausalRx, a causal inference framework that explicitly models the causal pathway from visual evidence to detected pathology and finally to the report text, accounting for potential spurious associations. Our approach introduces a causal diagram at the image-region level, where the presence or absence of pathology is estimated and intervened upon as a mediating variable during report generation. We employ a two-stage process: a detection model first infers the pathology status of abnormal regions, then a report-generation vision-language model is trained to produce the report conditioned primarily on the intervened pathology states rather than on direct image-to-text correlations. Moreover, counterfactual data augmentation is leveraged to ensure the generated statements are causally controlled. To further enhance generation performance, reinforcement learning is employed to optimize the causality-based model. Experimental results demonstrate that our causally grounded framework achieves substantial average improvements of 23.1% and 19.8% on the IU-Xray and MIMIC-CXR datasets, respectively, compared to the traditional association-based method.
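A schematic of the two-stage intervention described above: the generator is conditioned on the detected pathology state as a mediator, optionally flipped for counterfactual augmentation, rather than on raw image-text correlations. `detector` and `generator` are hypothetical stand-ins for the paper's detection model and vision-language model.

```python
def generate_report(image, detector, generator, counterfactual=False):
    """Two-stage pipeline: image -> pathology state (mediator) -> report."""
    findings = detector(image)                    # e.g. {"effusion": 0.91, ...}
    pathology = {k: p > 0.5 for k, p in findings.items()}
    if counterfactual:
        # Counterfactual augmentation: intervene on the mediator so the text
        # must track the pathology state instead of template priors.
        pathology = {k: not v for k, v in pathology.items()}
    # The generator sees the intervened state do(P=p) as an explicit condition.
    return generator(image, conditions=pathology)
```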
Citations: 0
REAL-SORT: RElation-aware for real-time multiple object tracking
IF 7.6, CAS Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-01-20. DOI: 10.1016/j.knosys.2026.115373
Xinling Zhang , Huijuan Zhao , Shuangjiang He , Li Yu
Recent advancements in multi-object tracking (MOT) have accelerated progress in autonomous driving and human-computer interaction. Tracking-by-detection approaches remain dominant due to their computational efficiency and streamlined architectures. However, this paradigm faces two critical challenges that trade tracking accuracy off against real-time efficiency: (i) spatial cues often exhibit inconsistent reliability across diverse scenarios, limiting their effectiveness; and (ii) false associations across consecutive frames frequently cause tracking failures, undermining long-term robustness. To address these issues, we propose the Relation-Aware Simple Online and Real-time Tracker (REAL-SORT), which effectively leverages both spatial and temporal relationships. Regarding spatial relations, we introduce two association strategies that incorporate occlusion-aware cross-scenario feature extraction and relative-position-based matching. On the temporal side, an ID Recovery Module (IRM) exploits multi-frame information to estimate the probability that a track's identity has been lost, enabling robust trajectory recovery. Extensive experiments on the DanceTrack, MOT17, and MOT20 benchmarks demonstrate that our method outperforms existing state-of-the-art trackers on the HOTA, IDF1, and AssA metrics, particularly excelling in challenging scenarios. Furthermore, REAL-SORT exhibits strong generalizability, consistently improving performance when integrated into various leading tracking frameworks.
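To make the association step concrete, here is a minimal tracking-by-detection matching sketch that combines IoU with a normalized relative-position (center-distance) penalty and solves the assignment with scipy's Hungarian solver. The cost weights are illustrative; this is the generic recipe that REAL-SORT builds on, not its actual code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """a: (N, 4), b: (M, 4) boxes as (x1, y1, x2, y2) -> (N, M) IoU matrix."""
    tl = np.maximum(a[:, None, :2], b[None, :, :2])
    br = np.minimum(a[:, None, 2:], b[None, :, 2:])
    inter = np.prod(np.clip(br - tl, 0, None), axis=-1)
    area_a = np.prod(a[:, 2:] - a[:, :2], axis=-1)
    area_b = np.prod(b[:, 2:] - b[:, :2], axis=-1)
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-9)

def associate(tracks, detections, w_pos=0.3, img_diag=1500.0):
    """Match predicted track boxes to detections via a combined cost."""
    centers_t = (tracks[:, :2] + tracks[:, 2:]) / 2
    centers_d = (detections[:, :2] + detections[:, 2:]) / 2
    dist = np.linalg.norm(centers_t[:, None] - centers_d[None, :], axis=-1)
    cost = (1 - iou(tracks, detections)) + w_pos * dist / img_diag
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))  # matched (track, detection) index pairs
```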
Citations: 0
Exploring prompt distributions and probability bias for long-tailed multi-label image recognition
IF 7.6, CAS Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-01-20. DOI: 10.1016/j.knosys.2026.115363
Liuyi Fan, Xinbo Ai
In long-tailed multi-label image recognition, balancing inter-class distribution and mitigating the long-tail effect are crucial for improving model performance. We study the impact of prompt distribution characteristics and introduce a hierarchical traction strategy guided by Class Separability Index scores to strengthen boundaries among head, medium, and tail categories. Furthermore, since semantic coherence in word embeddings tends to be disrupted during prompt embedding learning, we smooth the learnable prompts with the continuous semantics of the original text, thereby improving performance. To mitigate the probability bias between head and tail classes in conventional CLIP predictions, we propose a logits calibration module that attends to salient and secondary labels, enhancing the confidence of correct categories, particularly for head classes. The learnable prompts are jointly optimized with a classification loss and a hierarchical distribution loss to enhance the model’s generalization ability. Comparative studies on the COCO-LT and VOC-LT benchmark datasets demonstrate the superiority and effectiveness of the proposed approach in long-tailed multi-label image recognition.
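The abstract does not give the form of the logits calibration module; as a point of reference, the standard post-hoc logit-adjustment technique below addresses the same head/tail probability bias by subtracting a scaled log class-prior from the logits (tau is a tuning knob). This is an analogous, well-known method, not the paper's module.

```python
import numpy as np

def adjust_logits(logits, class_counts, tau=1.0):
    """logits: (B, C); class_counts: (C,) training-label frequencies."""
    prior = class_counts / class_counts.sum()
    return logits - tau * np.log(prior + 1e-12)  # boosts rare (tail) classes

logits = np.array([[2.0, 1.5, 0.3]])
counts = np.array([9000, 900, 100])            # head, medium, tail
print(adjust_logits(logits, counts).argmax())   # the tail class (index 2) wins
```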
Citations: 0
Isotropic3D: Image-to-3D generation based on a single CLIP embedding
IF 7.6, CAS Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-01-20. DOI: 10.1016/j.knosys.2026.115367
Pengkun Liu , Yikai Wang , Hang Xiao , Hongxiang Xue , Xinzhou Wang , Fuchun Sun
Encouraged by the growing availability of pre-trained 2D diffusion models, image-to-3D generation leveraging Score Distillation Sampling (SDS) is making remarkable progress. However, most existing approaches rely heavily on reference-view image supervision, which often disrupts the inductive priors of diffusion models and leads to distorted geometry or overly smooth back regions. To overcome these limitations, we propose Isotropic3D, a novel image-to-3D framework that takes only a single image CLIP embedding as input. Our method ensures azimuth-angle isotropy by relying exclusively on the SDS loss, avoiding overfitting to the reference image. Isotropic3D consists of two main components: an EMA-conditioned multi-view diffusion model (EMA-MVD) and a Neural Radiance Field (NeRF). The core of EMA-MVD lies in two-stage fine-tuning. First, we fine-tune a text-to-3D diffusion model by substituting its text encoder with an image encoder, which gives the model preliminary image-to-image capabilities. Second, we fine-tune with our Explicit Multi-view Attention (EMA), which combines noisy multi-view images with the noise-free reference image as an explicit condition. After fine-tuning, Isotropic3D, built upon SDS with NeRF, can generate multi-view-consistent images from a single CLIP embedding and reconstruct a 3D model with improved symmetry, well-proportioned geometry, richly colored textures, and reduced distortion. The project page is available at https://pkunliu.github.io/Isotropic3D.
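For readers unfamiliar with SDS, the sketch below shows the standard update the framework relies on: noise the rendered view, let the frozen diffusion model predict the noise, and backpropagate the weighted residual to the 3D parameters while skipping the U-Net Jacobian. `render` and `unet` are hypothetical stand-ins, and the weighting w(t) = 1 - alpha_bar_t is one common choice, not necessarily the paper's.

```python
import torch

def sds_step(render, unet, clip_embedding, alphas_cumprod):
    """One SDS update: gradients flow only through the rendered image."""
    x = render()                                   # (1, 3, H, W), requires grad
    t = torch.randint(20, 980, (1,))               # random diffusion timestep
    a_t = alphas_cumprod[t].view(1, 1, 1, 1)
    eps = torch.randn_like(x)
    x_t = a_t.sqrt() * x + (1 - a_t).sqrt() * eps  # forward diffusion q(x_t | x)
    with torch.no_grad():
        eps_pred = unet(x_t, t, clip_embedding)    # frozen score network
    w = 1 - a_t                                    # a common weighting choice
    # SDS uses w(t) * (eps_pred - eps) as the gradient w.r.t. x, skipping the
    # U-Net Jacobian; detaching the residual implements exactly that.
    (w * (eps_pred - eps).detach() * x).sum().backward()
```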
Citations: 0
AGPL-KEM: Attribute-guided prompt learning with knowledge experts mixture for few-shot remote sensing image classification
IF 7.6, CAS Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-01-20. DOI: 10.1016/j.knosys.2026.115375
Chunlei Wu , Congzheng Zhu , Qinfu Xu , Xu Liu , Yongzhen Zhang , Leiquan Wang , Jie Wu
Large-scale vision-language models (VLMs) have shown significant success in various computer vision tasks. However, adapting VLMs to remote sensing (RS) tasks remains challenging due to the distinct characteristics of RS imagery, such as spectral heterogeneity, fine-grained textures, and complex structural layouts. Existing methods attempt to encode diverse RS attributes into a unified latent space, but this implicit encoding strategy often leads to attribute conflation, undermining generalization under domain shifts. To address these limitations, we propose Attribute-Guided Prompt Learning with Knowledge Experts Mixture (AGPL-KEM), a prompt learning framework that explicitly disentangles RS semantics through structured domain knowledge. Specifically, AGPL-KEM introduces a Knowledge Experts Mixture module to partition the latent space into attribute-specific subspaces, thereby enhancing the model’s ability to capture and separate key RS attributes. To promote attribute-specific learning and reduce inter-expert redundancy, we design an Attribute-Guided Dual-Loss mechanism comprising an Attribute-Guided Semantic Alignment Loss for expert-attribute consistency and an Expert Semantic Orthogonality Loss that reduces semantic redundancy among experts through orthogonality constraints. Comprehensive experiments conducted on four remote sensing benchmark datasets (PatternNet, RSICD, RESISC45, and MLRSNet) demonstrate that AGPL-KEM achieves state-of-the-art performance, validating its effectiveness and robustness. Codes are available at https://github.com/4wlb/AGPL-KEM.
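The Expert Semantic Orthogonality Loss is described only as an orthogonality constraint among experts; a minimal plausible form is the off-diagonal Gram-matrix penalty below. The exact formulation in the paper may differ.

```python
import torch
import torch.nn.functional as F

def expert_orthogonality_loss(expert_embeddings: torch.Tensor) -> torch.Tensor:
    """expert_embeddings: (E, D), one vector per knowledge expert.
    Penalizes pairwise cosine similarity so experts span distinct subspaces."""
    z = F.normalize(expert_embeddings, dim=-1)
    gram = z @ z.t()                                    # (E, E) cosine matrix
    off_diag = gram - torch.eye(z.size(0), device=z.device)
    return (off_diag ** 2).sum() / (z.size(0) * (z.size(0) - 1))

print(expert_orthogonality_loss(torch.randn(4, 256)))   # scalar penalty
```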
Citations: 0
IGPC-MSOS: A knowledge-preserving transfer learning framework with dynamic mode-switching for handling concept drift in network intrusion detection systems
IF 7.6, CAS Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-01-20. DOI: 10.1016/j.knosys.2026.115361
Methaq A. Shyaa , Noor Farizah Ibrahim , Zurinahni Binti Zainol , Rosni Abdullah , Mohammed Anbar , Laith Alzubaidi
The rapid evolution of cyber threats poses significant challenges to Intrusion Detection Systems (IDS), particularly in dynamic environments affected by concept drift, where shifting attack behaviors degrade long-term detection performance. Existing adaptive IDS solutions often remain limited by fragmented drift-handling mechanisms, weak knowledge retention, and insufficient integration of complementary learning strategies, leaving exploitable blind spots. This paper introduces a unified adaptive IDS framework based on a mode-switching architecture that integrates the Online Sequential Extreme Learning Machine (OSELM), Feature-Adaptive OSELM (FA-OSELM), and Knowledge-Preserving OSELM (KP-OSELM). The proposed Incremental Genetic Programming Combiner with Mode-Switching Online Sequential (IGPC-MSOS) method dynamically selects the most effective operational mode according to detected drift patterns and real-time performance feedback. Experimental evaluations across five benchmark datasets demonstrate that IGPC-MSOS consistently achieves 96%–100% recall, delivers competitive or superior F1-scores (0.96–0.9995), and reduces inference latency compared with state-of-the-art approaches. These results confirm the strong adaptability, robustness, and real-time suitability of the proposed approach for intrusion detection in evolving and high-throughput network environments.
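All three modes share OSELM's chunk-wise recursive least-squares update, which folds new data in without revisiting old samples (the basis of the framework's knowledge retention). Below is a minimal numpy sketch following the standard OS-ELM formulation; the hidden-layer size and regularization are illustrative choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def hidden(X, W, b):
    """Random sigmoid hidden layer: (n, d) inputs -> (n, L) activations."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def oselm_init(X0, T0, n_hidden=50):
    """Batch initialization on the first chunk (X0, T0)."""
    W = rng.standard_normal((X0.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = hidden(X0, W, b)
    P = np.linalg.inv(H.T @ H + 1e-6 * np.eye(n_hidden))  # regularized init
    beta = P @ H.T @ T0
    return W, b, P, beta

def oselm_update(X, T, W, b, P, beta):
    """Fold one new chunk (X, T) into the model without revisiting old data."""
    H = hidden(X, W, b)
    K = np.linalg.inv(np.eye(X.shape[0]) + H @ P @ H.T)
    P = P - P @ H.T @ K @ H @ P                 # recursive covariance update
    beta = beta + P @ H.T @ (T - H @ beta)      # output-weight correction
    return P, beta
```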
Citations: 0
FreqMambaMark: Wavelet-Mamba-driven robust medical image watermarking
IF 7.6, CAS Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2026-01-20. DOI: 10.1016/j.knosys.2026.115344
Zhongxiang He , Yuling Chen , Yixian Yang , Zhi Ouyang , Yun Luo , Long Chen
Digital watermarking technology embeds authentication information into medical images to ensure data authenticity, integrity, and traceability. However, subtle textures and grayscale variations in medical images are crucial for diagnosis. Inappropriate watermark embedding may interfere with clinical interpretation, posing potential risks to diagnostic accuracy. Meanwhile, existing methods struggle to balance watermark imperceptibility and robustness against composite attacks.
To address these challenges, we propose FreqMambaMark, a robust watermarking framework for medical images based on the Mamba architecture in the wavelet domain. The framework decomposes image frequency bands using the Haar wavelet, employs an adaptive residual hiding strategy, and utilizes a convolutional neural network (CNN) for fine-grained watermark embedding, achieving high-fidelity results (PSNR: 43.8 dB, SSIM: 0.971). In watermark extraction, the long-range modeling of the Mamba state-space model enhances the recovery ability of low-frequency components, improving watermark robustness under complex distortions by 10.8%. Furthermore, we introduce a dynamic composite attack training paradigm and validate the framework’s generalization ability on the NIH Chest X-ray, Brain-Tumor-MRI, and COVIDx CT-3 datasets, providing an efficient solution for medical image security.
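As a rough illustration of the wavelet-domain embedding idea (not the paper's adaptive residual strategy or CNN embedder), the sketch below decomposes an image with PyWavelets' Haar DWT, additively hides bits in one detail sub-band, and inverts. For simplicity, extraction here assumes access to the original image; the strength alpha and choice of sub-band are illustrative.

```python
import numpy as np
import pywt

def embed(image: np.ndarray, bits: np.ndarray, alpha: float = 2.0):
    """image: (H, W) float grayscale; bits: flat array of {0, 1}."""
    LL, (LH, HL, HH) = pywt.dwt2(image, 'haar')
    flat = HH.flatten()
    flat[: bits.size] += alpha * (2.0 * bits - 1.0)   # +/- alpha per bit
    HH = flat.reshape(HH.shape)
    return pywt.idwt2((LL, (LH, HL, HH)), 'haar')

def extract(original: np.ndarray, watermarked: np.ndarray, n_bits: int):
    """Recover bits from the sign of the HH-band residual."""
    _, (_, _, HH0) = pywt.dwt2(original, 'haar')
    _, (_, _, HH1) = pywt.dwt2(watermarked, 'haar')
    diff = (HH1 - HH0).flatten()[:n_bits]
    return (diff > 0).astype(int)
```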
Citations: 0