
IEEE Transactions on Image Processing (a publication of the IEEE Signal Processing Society): Latest Publications

Video Frame Interpolation via Appearance-Based Intermediate Flow Estimation
IF 13.7 | Pub Date: 2026-02-26 | DOI: 10.1109/TIP.2026.3666772
Keyi Chen;Jingwei Xin;Nannan Wang;Jie Li;Xinbo Gao
Intermediate flow estimation is an important part of video frame interpolation (VFI). Most previous works use interpolation to derive the intermediate flow under an assumption of localized linear motion. However, this approach is not effective when dealing with extreme motion. In this work, we assume that the motion trajectory of an object is determined by the appearance characteristics of that object. Based on this assumption, we propose a new intermediate flow estimation method that obtains the motion features of intermediate frames from image appearance and inter-frame motion features. In addition, in order to fully extract the inter-frame features, we rethink how VFI differs from previous works that use the Swin Transformer, and compute the appearance and motion features within an adaptive neighborhood by cyclically shifting the window. Experimental results show that our method achieves state-of-the-art performance on different datasets for both fixed-time and arbitrary-time interpolation. Moreover, our proposed method outperforms models that require a four-frame input sequence when handling videos with extremely large motion. The source code is available at https://github.com/chen12304/IFE-VFI
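The abstract mentions computing appearance and motion features within an adaptive neighborhood by cyclically shifting the window, in the spirit of the Swin Transformer. The following is a minimal PyTorch sketch of cyclic window shifting with per-window self-attention, intended only as an illustration of that mechanism; the window size, feature shape, and plain dot-product attention are assumptions, and the masking used for shifted windows in the full Swin design is omitted.

```python
import torch

def window_partition(x, ws):
    # x: (B, H, W, C) -> (num_windows * B, ws * ws, C)
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

def shifted_window_self_attention(x, ws=8, shift=4):
    """Cyclically shift the feature map, run plain self-attention inside each
    non-overlapping window, then undo the shift. The attention mask that real
    Swin applies to shifted windows is omitted in this sketch."""
    B, H, W, C = x.shape
    shifted = torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))
    win = window_partition(shifted, ws)                                # (nW*B, ws*ws, C)
    attn = torch.softmax(win @ win.transpose(1, 2) / C ** 0.5, dim=-1)
    out = (attn @ win).view(B, H // ws, W // ws, ws, ws, C)
    out = out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
    return torch.roll(out, shifts=(shift, shift), dims=(1, 2))

# toy usage on fused features of two consecutive frames
feats = torch.randn(1, 64, 64, 32)
print(shifted_window_self_attention(feats).shape)  # torch.Size([1, 64, 64, 32])
```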
Citations: 0
DAMind: Zero-shot Visual Cross-Domain Alignment and Representation for EEG Decoding
IF 13.7 | Pub Date: 2026-02-26 | DOI: 10.1109/TIP.2026.3666730
Haodong Jing, Yongqiang Ma, Panqi Yang, Haoyu Li, Shuai Huang, Badong Chen, Nanning Zheng

To efficiently assist humans in various tasks, it is crucial to accurately decode and understand the rich information embedded in the brain's visual cognition. Existing brain-driven research often fails to overcome the challenge of small target data domains, and the lack of explicit semantic, spatial, and other information constraints on feature extractors prevents brain decoding models from learning uniform cross-domain representations, leading to degraded performance in unseen domains. To overcome these limitations, we propose DAMind, a multimodal EEG-based model for robust visual cross-domain alignment and decoding. Our approach integrates a VLM with brain-inspired cognitive mechanisms, leveraging its strong image-text representation abilities to learn both fine-grained primary visual features and high-level semantic concepts from neural signals, and provides effective visual fine-tuning through a visual guidance mechanism. DAMind introduces a stepwise EEG encoding process aligned with visual processing, and employs an instruction-based learning strategy for effective cross-domain zero-shot transfer. Its robust architecture efficiently achieves good generalization performance, enabling the mapping of EEG signals from multiple domains to a unified learning domain. We construct a comprehensive EEG decoding benchmark, EBench, on which DAMind achieves state-of-the-art results on several visual tasks and outperforms the baseline in the zero-shot setting.
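One common way to realize the kind of EEG-to-vision alignment described above is a CLIP-style contrastive objective between EEG embeddings and frozen image embeddings from a VLM. The sketch below is generic and hypothetical, not DAMind's architecture; the encoder shape, embedding dimension, and symmetric InfoNCE loss are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEEGEncoder(nn.Module):
    """Hypothetical EEG encoder: (channels x time) -> shared embedding space."""
    def __init__(self, n_channels=64, n_samples=256, dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_channels * n_samples, 1024), nn.GELU(),
            nn.Linear(1024, dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def clip_style_alignment_loss(eeg_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE: matching EEG/image pairs lie on the diagonal."""
    logits = eeg_emb @ img_emb.t() / temperature
    targets = torch.arange(len(logits), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

eeg = torch.randn(8, 64, 256)                        # a batch of EEG trials
img_emb = F.normalize(torch.randn(8, 512), dim=-1)   # e.g. frozen VLM image features
print(clip_style_alignment_loss(ToyEEGEncoder()(eeg), img_emb).item())
```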

Citations: 0
SMTrack: State-Aware Mamba for Efficient Temporal Modeling in Visual Tracking
IF 13.7 | Pub Date: 2026-02-23 | DOI: 10.1109/TIP.2026.3661393
Yinchao Ma;Dengqing Yang;Zhangyu He;Wenfei Yang;Tianzhu Zhang
Visual tracking aims to automatically estimate the state of a target object in a video sequence, which is especially challenging in dynamic scenarios. Thus, numerous methods have been proposed that introduce temporal cues to enhance tracking robustness. However, conventional CNN and Transformer architectures exhibit inherent limitations in modeling long-range temporal dependencies in visual tracking, often necessitating either complex customized modules or substantial computational costs to integrate temporal cues. Inspired by the success of the state space model, we propose a novel temporal modeling paradigm for visual tracking, termed State-aware Mamba Tracker (SMTrack), providing a neat pipeline for training and tracking without needing customized modules or substantial computational costs to build long-range temporal dependencies. It enjoys several merits. First, we propose a novel selective state-aware space model with state-wise parameters to capture more diverse temporal cues for robust tracking. Second, SMTrack facilitates long-range temporal interactions with linear computational complexity during training. Third, SMTrack enables each frame to interact with previously tracked frames via hidden state propagation and updating, which reduces the computational cost of handling temporal cues during tracking. Extensive experimental results demonstrate that SMTrack achieves promising performance with low computational costs.
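To make the hidden-state propagation concrete, here is a toy selective state-space recurrence that carries a hidden state across frames, so a new frame only interacts with the summary of previously tracked frames. It is a sequential, simplified stand-in, not SMTrack's actual Mamba block or the hardware-efficient parallel scan; all dimensions and the gating parameterization are assumptions.

```python
import torch
import torch.nn as nn

class ToySelectiveSSM(nn.Module):
    """Per-frame recurrence h_t = a_t * h_{t-1} + b_t * x_t, y_t = c_t * h_t with
    input-dependent ('selective') gates. A sequential toy, not Mamba's kernel."""
    def __init__(self, dim=256, state=16):
        super().__init__()
        self.to_gates = nn.Linear(dim, 3 * state)
        self.proj_in = nn.Linear(dim, state)
        self.proj_out = nn.Linear(state, dim)

    def forward(self, frames, h=None):
        # frames: (T, B, dim) per-frame target features
        outs = []
        for t in range(frames.shape[0]):
            a, b, c = self.to_gates(frames[t]).chunk(3, dim=-1)
            x = self.proj_in(frames[t])
            h = torch.zeros_like(x) if h is None else h
            h = torch.sigmoid(a) * h + torch.sigmoid(b) * x   # propagate and update state
            outs.append(self.proj_out(torch.tanh(c) * h))
        return torch.stack(outs), h                           # h is reusable for the next frame

model = ToySelectiveSSM()
y, hidden = model(torch.randn(5, 2, 256))                     # five already-tracked frames
y_new, hidden = model(torch.randn(1, 2, 256), hidden)         # a new frame reuses the state
```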
Citations: 0
Robust Source-Free Domain Adaptation From Non-Robust Source Models
IF 13.7 | Pub Date: 2026-02-20 | DOI: 10.1109/TIP.2026.3661392
Yao Xiao;Pengxu Wei;Guangrun Wang;Cong Liu;Liang Lin
A few recent works attempt to train an adversarially robust Unsupervised Domain Adaptation (UDA) model, transferring the robustness from a robust source model or other robust pre-trained models to an unlabeled target domain. However, it is usually impractical to assume the availability of robust source models or robust pre-training, and meanwhile, source data are not always accessible or efficient for adaptation training in many real-world scenarios. In this paper, we dive into a more practical and challenging problem of robust source-free domain adaptation: can we train a robust model on an unlabeled target domain given only a non-robust source model (without source data)? Empirically, we find that applying adversarial training (AT) to the self-supervised adaptation process leads to severe model degradation, as it tends to amplify the inevitable errors of UDA models. To tackle this issue, we propose a novel approach called Source-Free Alternating Optimization (SFAO), which employs a non-robust target model to provide better guidance for the AT of the desired robust target model. The two models are trained in an alternating manner to minimize the discrepancy between the clean source domain and the adversarial target domain. Moreover, we propose Softly-Constrained Adversarial Training (SCAT) to further mitigate the adverse effects of incorrect pseudo-labels in AT. Extensive experimental results demonstrate that the proposed method significantly improves the model performance on both clean and adversarial data. Source code is available at: https://github.com/Coxy7/robust-SFDA.
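For readers unfamiliar with the ingredients, the sketch below shows one simplified adaptation step in the spirit of the description above: a non-robust guide model supplies pseudo-labels on unlabeled target images, and the robust model is trained on PGD adversarial examples crafted against itself. This is a generic illustration, not the SFAO or SCAT procedure; the toy models, attack budget, and loss are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Standard L-inf PGD around x, using (pseudo-)labels y."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def robust_adaptation_step(robust_model, guide_model, x_target, optimizer):
    """One simplified step: the non-robust guide supplies pseudo-labels and the
    robust model trains on PGD examples crafted against itself."""
    with torch.no_grad():
        pseudo = guide_model(x_target).argmax(dim=1)
    x_adv = pgd_attack(robust_model, x_target, pseudo)
    loss = F.cross_entropy(robust_model(x_adv), pseudo)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

robust = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # toy stand-in models
guide = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
opt = torch.optim.SGD(robust.parameters(), lr=0.01)
print(robust_adaptation_step(robust, guide, torch.rand(4, 3, 32, 32), opt))
```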
Citations: 0
Partially Supervised Compositional Zero-Shot Learning by Class-Balanced Distribution Alignment
IF 13.7 | Pub Date: 2026-02-20 | DOI: 10.1109/TIP.2026.3664762
Aditya Panda;Dipti Prasad Mukherjee
The partially supervised Compositional Zero-Shot Learning (pCZSL) task recognizes new compositions of states and objects, where for every image in the training set either the state or the object annotation is available. In pCZSL, the features of a state vary depending on the object in the composition (e.g., the features of the state ripe differ between ripe banana and ripe apple). Understanding how features vary across object scales is also a key challenge. In the proposed architecture, a Swin Transformer-based Hierarchical Feature Extractor (HFE) captures the large range of semantic interactions between state and object features. The Discriminative Context Aggregation module utilizes features from the intermediate layers of the HFE to understand the features of objects at their corresponding scales. To leverage the partially labeled data in pCZSL, we pass strongly and weakly augmented versions of the input image to the proposed architecture. The predicted class probabilities for strongly and weakly augmented images are encouraged to be similar, minimizing a distribution alignment loss. This loss incorporates a class-specific re-weighting approach to alleviate the effect of data imbalance in pCZSL. Extensive experiments on three benchmark datasets demonstrate the superiority of the proposed approach.
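A minimal sketch of a class-balanced distribution-alignment loss of the general kind described above: predictions on the strongly augmented view are pulled toward the detached weak-view distribution, with per-class weights inversely proportional to class frequency. The 1/count weighting and the KL form are assumptions and may differ from the paper's formulation.

```python
import torch
import torch.nn.functional as F

def class_balanced_alignment_loss(logits_weak, logits_strong, class_counts, tau=1.0):
    """KL-style alignment of the strong-view prediction toward the detached
    weak-view distribution, weighted per class by inverse frequency."""
    p_weak = F.softmax(logits_weak.detach() / tau, dim=-1)
    log_p_strong = F.log_softmax(logits_strong, dim=-1)
    w = 1.0 / class_counts.float().clamp_min(1.0)
    w = w / w.sum() * class_counts.numel()                       # normalize to mean 1
    kl = p_weak * (p_weak.clamp_min(1e-8).log() - log_p_strong)  # per-class KL terms
    return (kl * w).sum(dim=-1).mean()

logits_w, logits_s = torch.randn(16, 50), torch.randn(16, 50)    # 50 compositions
counts = torch.randint(1, 500, (50,))                            # training-set frequencies
print(class_balanced_alignment_loss(logits_w, logits_s, counts).item())
```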
Citations: 0
Unsupervised Domain Adaptive Object Detection via Semantic Consistency and Compactness Learning
IF 13.7 | Pub Date: 2026-02-19 | DOI: 10.1109/TIP.2026.3663935
Yajing Liu;Zhen Zhang;Yiming Su;Chunhui Hao;Xiyao Liu;Jiandong Tian
Unsupervised domain adaptive object detection methods enhance model robustness in the target domain without requiring target-domain annotations. Despite notable progress, existing methods face two major challenges: 1) insufficient and inefficient learning of holistic feature consistency due to cumbersome pixel-level style matching and semantic discrepancy elimination between domains as well as the overlooking of their collaborative effect; and 2) unreliable learning of category feature compactness caused by poor-quality target-domain samples, inaccurate pseudo-labels and noisy cross-domain contrast paradigms. To address these challenges, we propose a novel Semantic Consistency and Compactness Learning (SCCL) network. For consistency learning, we introduce a Visual Adaptation-guided Semantic Alignment (VSA) module that achieves style matching through simple feature adaptation and incorporates a novel adversarial-free self-supervised method for feature disentanglement. The collaboration between these two aspects enables sufficient and efficient consistency learning. For reliable compactness learning, we develop a plug-and-play Instance Center-Contrastive (ICC) head that, for the first time, comprehensively addresses all three potential causes of unreliable learning through three integrated innovations, concerning sample pseudo-label quality enhancement, reliable sample storage and updating, and a robust sample contrast paradigm. Besides, the mutual reinforcement effect of VSA and ICC simultaneously enhances feature transferability and discriminability. Extensive experiments across four UDA object detection benchmarks with two baselines show that SCCL achieves superior adaptability and robustness. Code will be available at https://github.com/TooZE23/SCCL.
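As a rough illustration of an instance center-contrastive objective, the sketch below pulls each instance feature toward the running center of its pseudo-class and away from other centers, with a simple EMA center update. The memory design, pseudo-label filtering, and contrast paradigm in the actual ICC head are more involved; everything here is an assumption for illustration.

```python
import torch
import torch.nn.functional as F

def center_contrastive_loss(feats, labels, centers, temperature=0.1):
    """Pull each instance toward its class center and away from the others."""
    feats = F.normalize(feats, dim=-1)
    centers = F.normalize(centers, dim=-1)
    logits = feats @ centers.t() / temperature        # (N, num_classes)
    return F.cross_entropy(logits, labels)

def ema_update_centers(centers, feats, labels, momentum=0.9):
    """Exponential-moving-average update of class centers from a batch."""
    for c in labels.unique():
        centers[c] = momentum * centers[c] + (1 - momentum) * feats[labels == c].mean(0)
    return centers

feats = torch.randn(32, 128)                          # instance (RoI) features
labels = torch.randint(0, 8, (32,))                   # pseudo-labels from the detector
centers = torch.randn(8, 128)                         # per-class memory, maintained elsewhere
print(center_contrastive_loss(feats, labels, centers).item())
centers = ema_update_centers(centers, feats, labels)
```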
Citations: 0
Scale-Invariant Feature Matching Network for V-D-T Few-Shot Semantic Segmentation
IF 13.7 | Pub Date: 2026-02-19 | DOI: 10.1109/TIP.2026.3663882
Xiaofei Zhou;Jia Lin;Dongmei Chen;Deyang Liu;Jiyong Zhang;Runmin Cong
Multi-modal few-shot semantic segmentation (FSS) aims to perform dense prediction from multiple image modalities, including visible, depth, and thermal images, given only a few annotated samples. However, some approaches treat the three modalities equally, failing to account for their inherent differences. In addition, objects vary greatly in size, and cutting-edge matching paradigms fail to establish an effective support-query connection. Therefore, we propose a novel scale-invariant feature matching network (i.e., SFM-Net), which consists of an encoder, a feature matching block, a feature elevation block, and a decoder, to conduct visible-depth-thermal (V-D-T) few-shot semantic segmentation. Firstly, in the encoder, after extracting multi-level initial features, we fuse each level's RGB feature and thermal feature, yielding the support features and the query features. Secondly, in the feature matching block, a pixel-to-patch cross-attention (PTPCA) module is deployed to explore the correlation between each level's support feature and the query feature, where the pixel-to-patch pooling (PTP-pool) units are designed to build scale-invariant relationships, generating the coarse mask for the query image. Thirdly, in the feature elevation block, we employ the prior-related fusion (PF) module to integrate the depth image with the coarse mask via the cross-attention mechanism, yielding an enhanced coarse prediction result, which is further aggregated in a bottom-up way. Finally, in the decoder, we deploy a reverse attention (RA) unit to gradually explore the complementarity between object internal regions and spatial details, and generate the final segmentation results via conventional convolution layers. Extensive experiments are conducted on the VDT-2048-$5^{i}$ dataset, and the results show that our model outperforms the state-of-the-art methods by a large margin.
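The sketch below illustrates one plausible reading of pixel-to-patch cross-attention: every query pixel attends to support tokens produced by adaptive average pooling at several scales. It is an illustration only; the real PTPCA module also uses the support mask, learned projections, and multi-level features, all omitted here.

```python
import torch
import torch.nn.functional as F

def pixel_to_patch_cross_attention(query_feat, support_feat, pool_sizes=(1, 2, 4)):
    """Every query pixel attends to support tokens pooled at several scales."""
    B, C, H, W = query_feat.shape
    tokens = torch.cat(
        [F.adaptive_avg_pool2d(support_feat, s).flatten(2) for s in pool_sizes],
        dim=2)                                               # (B, C, num_tokens)
    q = query_feat.flatten(2).transpose(1, 2)                # (B, H*W, C)
    attn = torch.softmax(q @ tokens / C ** 0.5, dim=-1)      # (B, H*W, num_tokens)
    out = attn @ tokens.transpose(1, 2)                      # (B, H*W, C)
    return out.transpose(1, 2).view(B, C, H, W)

query = torch.randn(2, 256, 32, 32)                          # fused RGB-thermal query feature
support = torch.randn(2, 256, 32, 32)                        # fused RGB-thermal support feature
print(pixel_to_patch_cross_attention(query, support).shape)  # torch.Size([2, 256, 32, 32])
```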
Citations: 0
Temporal Visual Semantics-Induced Human Motion Understanding With Large Language Models
IF 13.7 | Pub Date: 2026-02-19 | DOI: 10.1109/TIP.2026.3663857
Zheng Xing;Weibing Zhao
Unsupervised human motion segmentation (HMS) can be effectively achieved using subspace clustering techniques. However, traditional methods overlook the role of temporal semantic exploration in HMS. This paper explores the use of temporal visual semantics (TVS) derived from human motion sequences, leveraging the image-to-text capabilities of a large language model (LLM) to enhance subspace clustering performance. The core idea is to extract textual motion information from consecutive frames via the LLM and incorporate this learned information into the subspace clustering framework. The primary challenge lies in learning TVS from human motion sequences using the LLM and incorporating this information into subspace clustering. To address this, we determine whether consecutive frames depict the same motion by querying the LLM and subsequently learn temporal neighboring information based on its response. We then develop a TVS-integrated subspace clustering approach, incorporating subspace embedding with a temporal regularizer that induces each frame to share similar subspace embeddings with its temporal neighbors. Additionally, segmentation is performed based on subspace embedding with a temporal constraint that induces the grouping of each frame with its temporal neighbors. We also introduce a feedback-enabled framework that continuously optimizes the subspace embedding based on the segmentation output. Experimental results demonstrate that the proposed method outperforms existing state-of-the-art approaches on four benchmark human motion datasets.
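A simplified version of subspace clustering with a temporal prior is sketched below: a self-expressive coefficient matrix is fit with a sparsity term plus a regularizer that ties together the codes of frame pairs the LLM judges to show the same motion. The objective and solver are assumptions and are not the paper's exact formulation.

```python
import torch

def subspace_clustering_with_temporal_prior(X, neighbor_pairs, lam=0.1, gamma=1.0,
                                             steps=500, lr=0.01):
    """Simplified self-expressive clustering:
    min_C ||X - XC||^2 + lam*||C||_1 + gamma * sum_(i,j) ||c_i - c_j||^2,
    where (i, j) are frame pairs judged (e.g. by an LLM) to show the same motion.
    X is (feature_dim, T)."""
    T = X.shape[1]
    C = torch.zeros(T, T, requires_grad=True)
    opt = torch.optim.Adam([C], lr=lr)
    idx_i = torch.tensor([i for i, _ in neighbor_pairs])
    idx_j = torch.tensor([j for _, j in neighbor_pairs])
    for _ in range(steps):
        C_off = C - torch.diag(torch.diagonal(C))   # zero diagonal avoids the trivial solution
        loss = ((X - X @ C_off) ** 2).sum() \
             + lam * C_off.abs().sum() \
             + gamma * ((C_off[:, idx_i] - C_off[:, idx_j]) ** 2).sum()
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        C_off = C - torch.diag(torch.diagonal(C))
        return 0.5 * (C_off.abs() + C_off.abs().t())  # affinity for spectral clustering

X = torch.randn(64, 30)                               # 30 frames of motion features
pairs = [(t, t + 1) for t in range(29)]               # e.g. LLM says frame t and t+1 share a motion
print(subspace_clustering_with_temporal_prior(X, pairs, steps=50).shape)  # torch.Size([30, 30])
```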
Citations: 0
HoloQA: Full Reference Video Quality Assessor of Rendered Human Avatars in Virtual Reality
IF 13.7 | Pub Date: 2026-02-18 | DOI: 10.1109/TIP.2026.3663930
Avinab Saha;Yu-Chih Chen;Christian Häne;Jean-Charles Bazin;Ioannis Katsavounidis;Alexandre Chapiro;Alan C. Bovik
We present HoloQA, a new state-of-the-art Full Reference Video Quality Assessment (VQA) model that was designed using principles of visual neuroscience, information theory, and self-supervised deep learning to accurately predict the quality of rendered digital human avatars in Virtual Reality (VR) and Augmented Reality (AR) systems. The growing adoption of VR/AR applications that aim to transmit digital human avatars over bandwidth-limited video networks has driven the need for VQA algorithms that better account for the kinds of distortions that reduce the quality of rendered and viewed avatars. As we will show, standard VQA models often fail to capture distortions unique to the rendering, transmission, and compression of videos containing human avatars. Towards solving this difficult problem, we adopt a multi-level Mixture-of-Experts approach. This involves computing distortion-aware perceptual features and high-level content-aware deep features that capture semantic attributes of human body avatars. The high-level features are computed using a self-supervised, pre-trained deep learning network. We show that HoloQA is able to achieve state-of-the-art performance on the recently introduced LIVE-Meta Rendered Human Avatar VQA database, demonstrating its efficacy in predicting the quality of rendered human avatars in VR. Furthermore, we demonstrate the competitive performance of HoloQA on other digital human avatar databases and on another synthetically generated video quality use case: cloud gaming. The code associated with this work will be made available on GitHub at https://github.com/avinabsaha/HologramQA
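As an illustration of the mixture-of-experts idea, the sketch below gates between a distortion-aware expert and a content-aware expert operating on different feature vectors and mixes their quality scores. Feature dimensions, the number of experts, and the gating form are assumptions, not HoloQA's actual design.

```python
import torch
import torch.nn as nn

class TwoExpertQualityHead(nn.Module):
    """Gate two experts (distortion-aware and content-aware features) and mix
    their per-video quality scores."""
    def __init__(self, dist_dim=128, sem_dim=768, hidden=256):
        super().__init__()
        self.expert_dist = nn.Sequential(nn.Linear(dist_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.expert_sem = nn.Sequential(nn.Linear(sem_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.gate = nn.Sequential(nn.Linear(dist_dim + sem_dim, 2), nn.Softmax(dim=-1))

    def forward(self, f_dist, f_sem):
        w = self.gate(torch.cat([f_dist, f_sem], dim=-1))                       # (B, 2) mixing weights
        scores = torch.cat([self.expert_dist(f_dist), self.expert_sem(f_sem)], dim=-1)
        return (w * scores).sum(dim=-1)                                         # predicted quality

head = TwoExpertQualityHead()
print(head(torch.randn(4, 128), torch.randn(4, 768)).shape)  # torch.Size([4])
```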
Citations: 0
Latent Fingerprint Quality Assessment for Criminal Investigations: A Benchmark Dataset and Method
IF 13.7 | Pub Date: 2026-02-18 | DOI: 10.1109/TIP.2026.3663898
Chao Huang;Jingxuan Zhang;Ye Zhang;Hao Wu;Peibei Cao;Zhihua Wang;Yang Yu;Xiaochun Cao
Fingerprint biometrics plays a crucial role in biometric identification, especially in applications such as criminal investigations. Although recent progress in recognition methodology has significantly enhanced automated fingerprint recognition, these systems still rely heavily on the quality of the input fingerprints. In criminal investigations, fingerprints are often of low quality due to their incidental deposition from natural oils and sweat, rather than being deliberately captured under controlled conditions. This degradation can significantly impact usability and identification accuracy, underscoring the need for effective Fingerprint Quality Assessment (FQA) methods. In this paper, we establish the Crime Scene Fingerprints quality assessment Dataset (CSFD-10k), the largest dataset of its kind, containing 11,500 fingerprint images from real criminal investigations. Of these, 10,000 samples are assigned Mean Opinion Scores (MOSs) for correlation testing, while the remaining 1,500 are labeled based on matching performance for generalizability testing. All labels are provided by frontline criminal police officers. Using this dataset, we propose a deep neural network-based Dual-Branch FQA (DB-FQA) framework that integrates image-level and edge-level features. The DB-FQA enhances ridge details by transforming raw grayscale fingerprints into edge maps using the Logical/Linear operator. A dual-branch network processes both the raw fingerprint and the edge map, and the Multi-scale Adaptive Cross feature Fusion (MACF) module fuses these features, guided by the edge map to highlight quality-related regions of interest. Extensive experiments demonstrate the robustness and superiority of our proposed method, offering substantial support for forensic fingerprint biometrics. The code and dataset are available at https://github.com/wzhsysu/FIQA.
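A toy version of the dual-branch idea is sketched below: one branch processes the raw grayscale fingerprint, the other an edge map, and the two feature vectors are concatenated before regression. A Sobel filter stands in for the Logical/Linear operator, and plain concatenation stands in for the MACF fusion module; both substitutions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def sobel_edge_map(gray):
    """Sobel gradient magnitude as a stand-in edge map (not the Logical/Linear operator)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, kx.transpose(2, 3), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

class DualBranchQuality(nn.Module):
    """One branch sees the raw fingerprint, the other its edge map; features are
    concatenated (a crude stand-in for MACF fusion) and regressed to a score."""
    def __init__(self):
        super().__init__()
        def branch():
            return nn.Sequential(nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.image_branch, self.edge_branch = branch(), branch()
        self.head = nn.Linear(64, 1)

    def forward(self, gray):
        f = torch.cat([self.image_branch(gray), self.edge_branch(sobel_edge_map(gray))], dim=1)
        return self.head(f).squeeze(-1)

model = DualBranchQuality()
print(model(torch.rand(2, 1, 128, 128)).shape)  # torch.Size([2])
```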
Citations: 0