International Journal of Computer Vision最新文献_第2页

EfficientDeRain+: Learning Uncertainty-Aware Filtering via RainMix Augmentation for High-Efficiency Deraining EfficientDeRain+：通过雨水混合增强学习不确定性感知过滤，实现高效去污

IF 19.5 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Computer Vision

Pub Date : 2024-11-04 DOI: 10.1007/s11263-024-02281-7

Qing Guo, Hua Qi, Jingyang Sun, Felix Juefei-Xu, Lei Ma, Di Lin, Wei Feng, Song Wang

Deraining is a significant and fundamental computer vision task, aiming to remove the rain streaks and accumulations in an image or video. Existing deraining methods usually make heuristic assumptions of the rain model, which compels them to employ complex optimization or iterative refinement for high recovery quality. However, this leads to time-consuming methods and affects the effectiveness of addressing rain patterns, deviating from the assumptions. This paper proposes a simple yet efficient deraining method by formulating deraining as a predictive filtering problem without complex rain model assumptions. Specifically, we identify spatially-variant predictive filtering (SPFilt) that adaptively predicts proper kernels via a deep network to filter different individual pixels. Since the filtering can be implemented via well-accelerated convolution, our method can be significantly efficient. We further propose the EfDeRain+ that contains three main contributions to address residual rain traces, multi-scale, and diverse rain patterns without harming efficiency. First, we propose the uncertainty-aware cascaded predictive filtering (UC-PFilt) that can identify the difficulties of reconstructing clean pixels via predicted kernels and remove the residual rain traces effectively. Second, we design the weight-sharing multi-scale dilated filtering (WS-MS-DFilt) to handle multi-scale rain streaks without harming the efficiency. Third, to eliminate the gap across diverse rain patterns, we propose a novel data augmentation method (i.e., RainMix) to train our deep models. By combining all contributions with sophisticated analysis on different variants, our final method outperforms baseline methods on six single-image deraining datasets and one video-deraining dataset in terms of both recovery quality and speed. In particular, EfDeRain+ can derain within about 6.3 ms on a (481times 321) image and is over 74 times faster than the top baseline method with even better recovery quality. We release code in https://github.com/tsingqguo/efficientderainplus.

去毛刺是一项重要而基本的计算机视觉任务，旨在去除图像或视频中的雨条纹和积雨。现有的去毛刺方法通常会对雨水模型做出启发式假设，这就迫使它们采用复杂的优化或迭代改进来获得较高的恢复质量。然而，这导致方法耗时，并影响了处理雨模式的效果，偏离了假设。本文提出了一种简单而高效的降雨预报方法，将降雨预报表述为一个预测性过滤问题，而无需复杂的降雨模型假设。具体来说，我们确定了空间变异预测过滤（SPFilt），通过深度网络自适应地预测适当的内核，以过滤不同的单个像素。由于可以通过加速卷积实现过滤，我们的方法可以显著提高效率。我们进一步提出了 EfDeRain+，它包含三个主要贡献，可在不影响效率的情况下解决残留雨迹、多尺度和多样化雨模式等问题。首先，我们提出了不确定性感知级联预测滤波（UC-PFilt），它能识别通过预测核重建干净像素的困难，并有效去除残留雨迹。其次，我们设计了分权多尺度扩张滤波（WS-MS-DFilt）来处理多尺度雨痕，而不会降低效率。第三，为了消除不同降雨模式之间的差距，我们提出了一种新颖的数据增强方法（即 RainMix）来训练我们的深度模型。通过将所有贡献与对不同变体的复杂分析相结合，我们的最终方法在六个单图像去污数据集和一个视频去污数据集上的恢复质量和速度均优于基准方法。特别是，EfDeRain+ 可以在大约 6.3 毫秒内对（481 次/321）图像进行去污，比最高基线方法快 74 倍以上，而且恢复质量更好。我们在 https://github.com/tsingqguo/efficientderainplus 中发布了代码。

{"title":"EfficientDeRain+: Learning Uncertainty-Aware Filtering via RainMix Augmentation for High-Efficiency Deraining","authors":"Qing Guo, Hua Qi, Jingyang Sun, Felix Juefei-Xu, Lei Ma, Di Lin, Wei Feng, Song Wang","doi":"10.1007/s11263-024-02281-7","DOIUrl":"https://doi.org/10.1007/s11263-024-02281-7","url":null,"abstract":"Deraining is a significant and fundamental computer vision task, aiming to remove the rain streaks and accumulations in an image or video. Existing deraining methods usually make heuristic assumptions of the rain model, which compels them to employ complex optimization or iterative refinement for high recovery quality. However, this leads to time-consuming methods and affects the effectiveness of addressing rain patterns, deviating from the assumptions. This paper proposes a simple yet efficient deraining method by formulating deraining as a predictive filtering problem without complex rain model assumptions. Specifically, we identify spatially-variant predictive filtering (SPFilt) that adaptively predicts proper kernels via a deep network to filter different individual pixels. Since the filtering can be implemented via well-accelerated convolution, our method can be significantly efficient. We further propose the EfDeRain+ that contains three main contributions to address residual rain traces, multi-scale, and diverse rain patterns without harming efficiency. First, we propose the uncertainty-aware cascaded predictive filtering (UC-PFilt) that can identify the difficulties of reconstructing clean pixels via predicted kernels and remove the residual rain traces effectively. Second, we design the weight-sharing multi-scale dilated filtering (WS-MS-DFilt) to handle multi-scale rain streaks without harming the efficiency. Third, to eliminate the gap across diverse rain patterns, we propose a novel data augmentation method (i.e., RainMix) to train our deep models. By combining all contributions with sophisticated analysis on different variants, our final method outperforms baseline methods on six single-image deraining datasets and one video-deraining dataset in terms of both recovery quality and speed. In particular, EfDeRain+ can derain within about 6.3 ms on a (481times 321) image and is over 74 times faster than the top baseline method with even better recovery quality. We release code in https://github.com/tsingqguo/efficientderainplus.","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"68 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142580522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Few Annotated Pixels and Point Cloud Based Weakly Supervised Semantic Segmentation of Driving Scenes 基于少量注释像素和点云的驾驶场景弱监督语义分割

IF 19.5 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Computer Vision

Pub Date : 2024-11-04 DOI: 10.1007/s11263-024-02275-5

Huimin Ma, Sheng Yi, Shijie Chen, Jiansheng Chen, Yu Wang

Previous weakly supervised semantic segmentation (WSSS) methods mainly begin with the segmentation seeds from the CAM method. Because of the high complexity of driving scene images, their framework performs not well on driving scene datasets. In this paper, we propose a new kind of WSSS annotations on the complex driving scene dataset, with only one or several labeled points per category. This annotation is more lightweight than image-level annotation and provides critical localization information for prototypes. We propose a framework to address the WSSS task under this annotation, which generates prototype feature vectors from labeled points and then produces 2D pseudo labels. Besides, we found the point cloud data is useful for distinguishing different objects. Our framework could extract rich semantic information from unlabeled point cloud data and generate instance masks, which does not require extra annotation resources. We combine the pseudo labels and the instance masks to modify the incorrect regions and thus obtain more accurate supervision for training the semantic segmentation network. We evaluated this framework on the KITTI dataset. Experiments show that the proposed method achieves state-of-the-art performance.

以往的弱监督语义分割（WSSS）方法主要从 CAM 方法的分割种子开始。由于驾驶场景图像的高复杂性，他们的框架在驾驶场景数据集上表现不佳。在本文中，我们针对复杂的驾驶场景数据集提出了一种新的 WSSS 注释，每个类别只有一个或几个标注点。这种注释比图像级注释更轻量级，能为原型提供关键的定位信息。我们提出了一个框架来解决这种标注下的 WSSS 任务，该框架可根据标注点生成原型特征向量，然后生成二维伪标签。此外，我们还发现点云数据有助于区分不同的物体。我们的框架可以从未标明的点云数据中提取丰富的语义信息并生成实例掩码，这不需要额外的标注资源。我们结合伪标签和实例掩码来修改不正确的区域，从而为训练语义分割网络获得更准确的监督。我们在 KITTI 数据集上对这一框架进行了评估。实验表明，所提出的方法达到了最先进的性能。

{"title":"Few Annotated Pixels and Point Cloud Based Weakly Supervised Semantic Segmentation of Driving Scenes","authors":"Huimin Ma, Sheng Yi, Shijie Chen, Jiansheng Chen, Yu Wang","doi":"10.1007/s11263-024-02275-5","DOIUrl":"https://doi.org/10.1007/s11263-024-02275-5","url":null,"abstract":"Previous weakly supervised semantic segmentation (WSSS) methods mainly begin with the segmentation seeds from the CAM method. Because of the high complexity of driving scene images, their framework performs not well on driving scene datasets. In this paper, we propose a new kind of WSSS annotations on the complex driving scene dataset, with only one or several labeled points per category. This annotation is more lightweight than image-level annotation and provides critical localization information for prototypes. We propose a framework to address the WSSS task under this annotation, which generates prototype feature vectors from labeled points and then produces 2D pseudo labels. Besides, we found the point cloud data is useful for distinguishing different objects. Our framework could extract rich semantic information from unlabeled point cloud data and generate instance masks, which does not require extra annotation resources. We combine the pseudo labels and the instance masks to modify the incorrect regions and thus obtain more accurate supervision for training the semantic segmentation network. We evaluated this framework on the KITTI dataset. Experiments show that the proposed method achieves state-of-the-art performance.","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"2022 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142580565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

APPTracker+: Displacement Uncertainty for Occlusion Handling in Low-Frame-Rate Multiple Object Tracking APPTracker+：在低帧速率多目标跟踪中处理遮挡的位移不确定性

IF 19.5 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Computer Vision

Pub Date : 2024-11-03 DOI: 10.1007/s11263-024-02237-x

Tao Zhou, Qi Ye, Wenhan Luo, Haizhou Ran, Zhiguo Shi, Jiming Chen

Multi-object tracking (MOT) in the scenario of low-frame-rate videos is a promising solution to better meet the computing, storage, and transmitting bandwidth resource constraints of edge devices. Tracking with a low frame rate poses particular challenges in the association stage as objects in two successive frames typically exhibit much quicker variations in locations, velocities, appearances, and visibilities than those in normal frame rates. In this paper, we observe severe performance degeneration of many existing association strategies caused by such variations. Though optical-flow-based methods like CenterTrack can handle the large displacement to some extent due to their large receptive field, the temporally local nature makes them fail to give reliable displacement estimations of objects that newly appear in the current frame (i.e., not visible in the previous frame). To overcome the local nature of optical-flow-based methods, we propose an online tracking method by extending the CenterTrack architecture with a new head, named APP, to recognize unreliable displacement estimations. Further, to capture the fine-grained and private unreliability of each displacement estimation, we extend the binary APP predictions to displacement uncertainties. To this end, we reformulate the displacement estimation task via Bayesian deep learning tools. With APP predictions, we propose to conduct association in a multi-stage manner where vision cues or historical motion cues are leveraged in the corresponding stage. By rethinking the commonly used bipartite matching algorithms, we equip the proposed multi-stage association policy with a hybrid matching strategy conditioned on displacement uncertainties. Our method shows robustness in preserving identities in low-frame-rate video sequences. Experimental results on public datasets in various low-frame-rate settings demonstrate the advantages of the proposed method.

低帧频视频中的多目标跟踪（MOT）是一种很有前景的解决方案，能更好地满足边缘设备在计算、存储和传输带宽资源方面的限制。由于连续两帧中的物体在位置、速度、外观和可见度上的变化通常比正常帧率下的物体要快得多，因此低帧率下的跟踪在关联阶段面临着特殊的挑战。在本文中，我们观察到许多现有的关联策略都因这种变化而导致性能严重下降。虽然基于光流的方法（如 CenterTrack）由于具有较大的感受野，可以在一定程度上处理较大的位移，但其时间局部性使其无法对当前帧中新出现的物体（即在前一帧中不可见的物体）进行可靠的位移估计。为了克服基于光流的方法的局部性，我们提出了一种在线跟踪方法，通过扩展 CenterTrack 架构，增加一个新的头部（名为 APP）来识别不可靠的位移估计。此外，为了捕捉每个位移估计的细粒度和私人不可靠度，我们将二进制 APP 预测扩展到位移不确定性。为此，我们通过贝叶斯深度学习工具重新制定了位移估计任务。通过 APP 预测，我们建议以多阶段方式进行关联，在相应阶段利用视觉线索或历史运动线索。通过重新思考常用的两端匹配算法，我们为所提出的多阶段关联策略配备了以位移不确定性为条件的混合匹配策略。我们的方法在低帧率视频序列中显示出保护身份的鲁棒性。在各种低帧率环境下的公共数据集上的实验结果证明了所提方法的优势。

{"title":"APPTracker+: Displacement Uncertainty for Occlusion Handling in Low-Frame-Rate Multiple Object Tracking","authors":"Tao Zhou, Qi Ye, Wenhan Luo, Haizhou Ran, Zhiguo Shi, Jiming Chen","doi":"10.1007/s11263-024-02237-x","DOIUrl":"https://doi.org/10.1007/s11263-024-02237-x","url":null,"abstract":"Multi-object tracking (MOT) in the scenario of low-frame-rate videos is a promising solution to better meet the computing, storage, and transmitting bandwidth resource constraints of edge devices. Tracking with a low frame rate poses particular challenges in the association stage as objects in two successive frames typically exhibit much quicker variations in locations, velocities, appearances, and visibilities than those in normal frame rates. In this paper, we observe severe performance degeneration of many existing association strategies caused by such variations. Though optical-flow-based methods like CenterTrack can handle the large displacement to some extent due to their large receptive field, the temporally local nature makes them fail to give reliable displacement estimations of objects that newly appear in the current frame (i.e., not visible in the previous frame). To overcome the local nature of optical-flow-based methods, we propose an online tracking method by extending the CenterTrack architecture with a new head, named APP, to recognize unreliable displacement estimations. Further, to capture the fine-grained and private unreliability of each displacement estimation, we extend the binary APP predictions to displacement uncertainties. To this end, we reformulate the displacement estimation task via Bayesian deep learning tools. With APP predictions, we propose to conduct association in a multi-stage manner where vision cues or historical motion cues are leveraged in the corresponding stage. By rethinking the commonly used bipartite matching algorithms, we equip the proposed multi-stage association policy with a hybrid matching strategy conditioned on displacement uncertainties. Our method shows robustness in preserving identities in low-frame-rate video sequences. Experimental results on public datasets in various low-frame-rate settings demonstrate the advantages of the proposed method.\u0000","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"7 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142566097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Anti-Fake Vaccine: Safeguarding Privacy Against Face Swapping via Visual-Semantic Dual Degradation 防伪疫苗：通过视觉语义双重降级保护隐私，防止人脸互换

IF 19.5 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Computer Vision

Pub Date : 2024-11-01 DOI: 10.1007/s11263-024-02259-5

Jingzhi Li, Changjiang Luo, Hua Zhang, Yang Cao, Xin Liao, Xiaochun Cao

Deepfake techniques pose a significant threat to personal privacy and social security. To mitigate these risks, various defensive techniques have been introduced, including passive methods through fake detection and proactive methods through adding invisible perturbations. Recent proactive methods mainly focus on face manipulation but perform poorly against face swapping, as face swapping involves the more complex process of identity information transfer. To address this issue, we develop a novel privacy-preserving framework, named Anti-Fake Vaccine, to protect the facial images against the malicious face swapping. This new proactive technique dynamically fuses visual corruption and content misdirection, significantly enhancing protection performance. Specifically, we first formulate constraints from two distinct perspectives: visual quality and identity semantics. The visual perceptual constraint targets image quality degradation in the visual space, while the identity similarity constraint induces erroneous alterations in the semantic space. We then introduce a multi-objective optimization solution to effectively balance the allocation of adversarial perturbations generated according to these constraints. To further improving performance, we develop an additive perturbation strategy to discover the shared adversarial perturbations across diverse face swapping models. Extensive experiments on the CelebA-HQ and FFHQ datasets demonstrate that our method exhibits superior generalization capabilities across diverse face swapping models, including commercial ones.

深度伪造技术对个人隐私和社会安全构成重大威胁。为了降低这些风险，人们引入了各种防御技术，包括通过假冒检测的被动方法和通过添加隐形扰动的主动方法。最近的主动方法主要针对人脸操纵，但在对付人脸互换方面表现不佳，因为人脸互换涉及更复杂的身份信息传递过程。为了解决这个问题，我们开发了一种新颖的隐私保护框架，名为 "反假冒疫苗"（Anti-Fake Vaccine），以保护面部图像免受恶意换脸的侵害。这种新的主动技术动态地融合了视觉破坏和内容误导，大大提高了保护性能。具体来说，我们首先从视觉质量和身份语义两个不同的角度制定了约束条件。视觉感知约束针对的是视觉空间中的图像质量下降，而身份相似性约束诱发的是语义空间中的错误更改。然后，我们引入了一种多目标优化解决方案，以有效平衡根据这些约束产生的对抗性扰动的分配。为了进一步提高性能，我们开发了一种加法扰动策略，以发现不同换脸模型中共享的对抗性扰动。在 CelebA-HQ 和 FFHQ 数据集上进行的大量实验表明，我们的方法在不同的人脸互换模型（包括商业模型）中表现出卓越的泛化能力。

{"title":"Anti-Fake Vaccine: Safeguarding Privacy Against Face Swapping via Visual-Semantic Dual Degradation","authors":"Jingzhi Li, Changjiang Luo, Hua Zhang, Yang Cao, Xin Liao, Xiaochun Cao","doi":"10.1007/s11263-024-02259-5","DOIUrl":"https://doi.org/10.1007/s11263-024-02259-5","url":null,"abstract":"Deepfake techniques pose a significant threat to personal privacy and social security. To mitigate these risks, various defensive techniques have been introduced, including passive methods through fake detection and proactive methods through adding invisible perturbations. Recent proactive methods mainly focus on face manipulation but perform poorly against face swapping, as face swapping involves the more complex process of identity information transfer. To address this issue, we develop a novel privacy-preserving framework, named Anti-Fake Vaccine, to protect the facial images against the malicious face swapping. This new proactive technique dynamically fuses visual corruption and content misdirection, significantly enhancing protection performance. Specifically, we first formulate constraints from two distinct perspectives: visual quality and identity semantics. The visual perceptual constraint targets image quality degradation in the visual space, while the identity similarity constraint induces erroneous alterations in the semantic space. We then introduce a multi-objective optimization solution to effectively balance the allocation of adversarial perturbations generated according to these constraints. To further improving performance, we develop an additive perturbation strategy to discover the shared adversarial perturbations across diverse face swapping models. Extensive experiments on the CelebA-HQ and FFHQ datasets demonstrate that our method exhibits superior generalization capabilities across diverse face swapping models, including commercial ones.\u0000","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"20 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142562160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Improving 3D Finger Traits Recognition via Generalizable Neural Rendering 通过通用神经渲染改进 3D 手指特征识别

IF 19.5 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Computer Vision

Pub Date : 2024-10-30 DOI: 10.1007/s11263-024-02248-8

Hongbin Xu, Junduan Huang, Yuer Ma, Zifeng Li, Wenxiong Kang

3D biometric techniques on finger traits have become a new trend and have demonstrated a powerful ability for recognition and anti-counterfeiting. Existing methods follow an explicit 3D pipeline that reconstructs the models first and then extracts features from 3D models. However, these explicit 3D methods suffer from the following problems: 1) Inevitable information dropping during 3D reconstruction; 2) Tight coupling between specific hardware and algorithm for 3D reconstruction. It leads us to a question: Is it indispensable to reconstruct 3D information explicitly in recognition tasks? Hence, we consider this problem in an implicit manner, leaving the nerve-wracking 3D reconstruction problem for learnable neural networks with the help of neural radiance fields (NeRFs). We propose FingerNeRF, a novel generalizable NeRF for 3D finger biometrics. To handle the shape-radiance ambiguity problem that may result in incorrect 3D geometry, we aim to involve extra geometric priors based on the correspondence of binary finger traits like fingerprints or finger veins. First, we propose a novel Trait Guided Transformer (TGT) module to enhance the feature correspondence with the guidance of finger traits. Second, we involve extra geometric constraints on the volume rendering loss with the proposed Depth Distillation Loss and Trait Guided Rendering Loss. To evaluate the performance of the proposed method on different modalities, we collect two new datasets: SCUT-Finger-3D with finger images and SCUT-FingerVein-3D with finger vein images. Moreover, we also utilize the UNSW-3D dataset with fingerprint images for evaluation. In experiments, our FingerNeRF can achieve 4.37% EER on SCUT-Finger-3D dataset, 8.12% EER on SCUT-FingerVein-3D dataset, and 2.90% EER on UNSW-3D dataset, showing the superiority of the proposed implicit method in 3D finger biometrics.

针对手指特征的三维生物识别技术已成为一种新趋势，并显示出强大的识别和防伪能力。现有方法采用显式三维管道，首先重建模型，然后从三维模型中提取特征。然而，这些显式三维方法存在以下问题：1）三维重建过程中不可避免的信息丢失；2）特定硬件与三维重建算法之间的紧密耦合。这就引出了一个问题：在识别任务中，明确重建三维信息是否必不可少？因此，我们以隐含的方式考虑这个问题，将令人头疼的三维重建问题留给借助神经辐射场（NeRF）的可学习神经网络。我们提出的 FingerNeRF 是一种用于三维手指生物识别的新型通用 NeRF。为了处理可能导致三维几何形状不正确的形状-辐射模糊问题，我们旨在根据指纹或指静脉等二进制手指特征的对应关系，引入额外的几何先验。首先，我们提出了一个新颖的特征引导变换器（TGT）模块，以手指特征为导向增强特征对应性。其次，我们提出了深度蒸馏损耗和特质引导渲染损耗，对体积渲染损耗进行了额外的几何约束。为了评估所提出的方法在不同模态上的性能，我们收集了两个新的数据集：SCUT-Finger-3D 包含手指图像，SCUT-FingerVein-3D 包含手指静脉图像。此外，我们还利用 UNSW-3D 数据集对指纹图像进行了评估。在实验中，我们的 FingerNeRF 在 SCUT-Finger-3D 数据集上的误差率为 4.37%，在 SCUT-FingerVein-3D 数据集上的误差率为 8.12%，在 UNSW-3D 数据集上的误差率为 2.90%，显示了所提出的隐式方法在三维手指生物识别中的优越性。

{"title":"Improving 3D Finger Traits Recognition via Generalizable Neural Rendering","authors":"Hongbin Xu, Junduan Huang, Yuer Ma, Zifeng Li, Wenxiong Kang","doi":"10.1007/s11263-024-02248-8","DOIUrl":"https://doi.org/10.1007/s11263-024-02248-8","url":null,"abstract":"3D biometric techniques on finger traits have become a new trend and have demonstrated a powerful ability for recognition and anti-counterfeiting. Existing methods follow an explicit 3D pipeline that reconstructs the models first and then extracts features from 3D models. However, these explicit 3D methods suffer from the following problems: 1) Inevitable information dropping during 3D reconstruction; 2) Tight coupling between specific hardware and algorithm for 3D reconstruction. It leads us to a question: Is it indispensable to reconstruct 3D information explicitly in recognition tasks? Hence, we consider this problem in an implicit manner, leaving the nerve-wracking 3D reconstruction problem for learnable neural networks with the help of neural radiance fields (NeRFs). We propose FingerNeRF, a novel generalizable NeRF for 3D finger biometrics. To handle the shape-radiance ambiguity problem that may result in incorrect 3D geometry, we aim to involve extra geometric priors based on the correspondence of binary finger traits like fingerprints or finger veins. First, we propose a novel Trait Guided Transformer (TGT) module to enhance the feature correspondence with the guidance of finger traits. Second, we involve extra geometric constraints on the volume rendering loss with the proposed Depth Distillation Loss and Trait Guided Rendering Loss. To evaluate the performance of the proposed method on different modalities, we collect two new datasets: SCUT-Finger-3D with finger images and SCUT-FingerVein-3D with finger vein images. Moreover, we also utilize the UNSW-3D dataset with fingerprint images for evaluation. In experiments, our FingerNeRF can achieve 4.37% EER on SCUT-Finger-3D dataset, 8.12% EER on SCUT-FingerVein-3D dataset, and 2.90% EER on UNSW-3D dataset, showing the superiority of the proposed implicit method in 3D finger biometrics.","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"66 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142541703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Basis Restricted Elastic Shape Analysis on the Space of Unregistered Surfaces 未注册表面空间上的基点受限弹性形状分析

IF 19.5 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Computer Vision

Pub Date : 2024-10-30 DOI: 10.1007/s11263-024-02269-3

Emmanuel Hartman, Emery Pierson, Martin Bauer, Mohamed Daoudi, Nicolas Charon

This paper introduces a new framework for surface analysis derived from the general setting of elastic Riemannian metrics on shape spaces. Traditionally, those metrics are defined over the infinite dimensional manifold of immersed surfaces and satisfy specific invariance properties enabling the comparison of surfaces modulo shape preserving transformations such as reparametrizations. The specificity of our approach is to restrict the space of allowable transformations to predefined finite dimensional bases of deformation fields. These are estimated in a data-driven way so as to emulate specific types of surface transformations. This allows us to simplify the representation of the corresponding shape space to a finite dimensional latent space. However, in sharp contrast with methods involving e.g. mesh autoencoders, the latent space is equipped with a non-Euclidean Riemannian metric inherited from the family of elastic metrics. We demonstrate how this model can be effectively implemented to perform a variety of tasks on surface meshes which, importantly, does not assume these to be pre-registered or to even have a consistent mesh structure. We specifically validate our approach on human body shape and pose data as well as human face and hand scans for problems such as shape registration, interpolation, motion transfer or random pose generation.

本文从形状空间上的弹性黎曼度量的一般设置出发，介绍了一种新的曲面分析框架。传统上，这些度量定义在浸没曲面的无限维流形上，并满足特定的不变性属性，从而可以比较曲面在形状保持变换（如重拟三维变换）下的模态。我们方法的特殊性在于将允许变换的空间限制在预定义的变形场有限维基上。这些基础是以数据驱动的方式估算的，以便模拟特定类型的表面变换。这样，我们就能将相应形状空间的表示简化为有限维潜在空间。然而，与网格自动编码器等方法形成鲜明对比的是，潜空间配备了从弹性度量系列继承而来的非欧几里得黎曼度量。我们展示了如何有效地实施这一模型，以便在曲面网格上执行各种任务，重要的是，我们并不假定这些网格是预先注册的，甚至不假定它们具有一致的网格结构。我们特别在人体形状和姿势数据以及人脸和手部扫描上验证了我们的方法，以解决形状注册、插值、运动转移或随机姿势生成等问题。

{"title":"Basis Restricted Elastic Shape Analysis on the Space of Unregistered Surfaces","authors":"Emmanuel Hartman, Emery Pierson, Martin Bauer, Mohamed Daoudi, Nicolas Charon","doi":"10.1007/s11263-024-02269-3","DOIUrl":"https://doi.org/10.1007/s11263-024-02269-3","url":null,"abstract":"This paper introduces a new framework for surface analysis derived from the general setting of elastic Riemannian metrics on shape spaces. Traditionally, those metrics are defined over the infinite dimensional manifold of immersed surfaces and satisfy specific invariance properties enabling the comparison of surfaces modulo shape preserving transformations such as reparametrizations. The specificity of our approach is to restrict the space of allowable transformations to predefined finite dimensional bases of deformation fields. These are estimated in a data-driven way so as to emulate specific types of surface transformations. This allows us to simplify the representation of the corresponding shape space to a finite dimensional latent space. However, in sharp contrast with methods involving e.g. mesh autoencoders, the latent space is equipped with a non-Euclidean Riemannian metric inherited from the family of elastic metrics. We demonstrate how this model can be effectively implemented to perform a variety of tasks on surface meshes which, importantly, does not assume these to be pre-registered or to even have a consistent mesh structure. We specifically validate our approach on human body shape and pose data as well as human face and hand scans for problems such as shape registration, interpolation, motion transfer or random pose generation.","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"19 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142556048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A Memory-Assisted Knowledge Transferring Framework with Curriculum Anticipation for Weakly Supervised Online Activity Detection 用于弱监督在线活动检测的具有课程预测功能的记忆辅助知识转移框架

IF 19.5 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Computer Vision

Pub Date : 2024-10-28 DOI: 10.1007/s11263-024-02279-1

Tianshan Liu, Kin-Man Lam, Bing-Kun Bao

As a crucial topic of high-level video understanding, weakly supervised online activity detection (WS-OAD) involves identifying the ongoing behaviors moment-to-moment in streaming videos, trained with solely cheap video-level annotations. It is essentially a challenging task, which requires addressing the entangled issues of the weakly supervised settings and online constraints. In this paper, we tackle the WS-OAD task from the knowledge-distillation (KD) perspective, which trains an online student detector to distill dual-level knowledge from a weakly supervised offline teacher model. To guarantee the completeness of knowledge transfer, we improve the vanilla KD framework from two aspects. First, we introduce an external memory bank to maintain the long-term activity prototypes, which serves as a bridge to align the activity semantics learned from the offline teacher and online student models. Second, to compensate the missing contexts of unseen near future, we leverage a curriculum learning paradigm to gradually train the online student detector to anticipate the future activity semantics. By dynamically scheduling the provided auxiliary future states, the online detector progressively distills contextual information from the offline model in an easy-to-hard course. Extensive experimental results on three public data sets demonstrate the superiority of our proposed method over the competing methods.

作为高级视频理解的一个重要课题，弱监督在线活动检测（WS-OAD）是指仅利用廉价的视频级注释进行训练，识别流媒体视频中每时每刻正在发生的行为。从本质上讲，这是一项具有挑战性的任务，需要解决弱监督设置和在线约束之间的纠缠不清的问题。在本文中，我们从知识提炼（KD）的角度来解决 WS-OAD 任务，即训练一个在线学生检测器，从弱监督离线教师模型中提炼出双层知识。为了保证知识转移的完整性，我们从两个方面改进了虚无的 KD 框架。首先，我们引入了一个外部记忆库来维护长期的活动原型，作为一座桥梁，将从离线教师模型和在线学生模型中学到的活动语义统一起来。其次，为了弥补近期未见语境的缺失，我们利用课程学习范式来逐步训练在线学生检测器，以预测未来的活动语义。通过动态调度所提供的辅助未来状态，在线检测器在由易到难的过程中逐步从离线模型中提炼出上下文信息。在三个公共数据集上的广泛实验结果表明，我们提出的方法优于其他竞争方法。

{"title":"A Memory-Assisted Knowledge Transferring Framework with Curriculum Anticipation for Weakly Supervised Online Activity Detection","authors":"Tianshan Liu, Kin-Man Lam, Bing-Kun Bao","doi":"10.1007/s11263-024-02279-1","DOIUrl":"https://doi.org/10.1007/s11263-024-02279-1","url":null,"abstract":"As a crucial topic of high-level video understanding, weakly supervised online activity detection (WS-OAD) involves identifying the ongoing behaviors moment-to-moment in streaming videos, trained with solely cheap video-level annotations. It is essentially a challenging task, which requires addressing the entangled issues of the weakly supervised settings and online constraints. In this paper, we tackle the WS-OAD task from the knowledge-distillation (KD) perspective, which trains an online student detector to distill dual-level knowledge from a weakly supervised offline teacher model. To guarantee the completeness of knowledge transfer, we improve the vanilla KD framework from two aspects. First, we introduce an external memory bank to maintain the long-term activity prototypes, which serves as a bridge to align the activity semantics learned from the offline teacher and online student models. Second, to compensate the missing contexts of unseen near future, we leverage a curriculum learning paradigm to gradually train the online student detector to anticipate the future activity semantics. By dynamically scheduling the provided auxiliary future states, the online detector progressively distills contextual information from the offline model in an easy-to-hard course. Extensive experimental results on three public data sets demonstrate the superiority of our proposed method over the competing methods.","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"75 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142536604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Dynamic Attention Vision-Language Transformer Network for Person Re-identification 用于人员再识别的动态注意力视觉语言转换器网络

IF 19.5 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Computer Vision

Pub Date : 2024-10-26 DOI: 10.1007/s11263-024-02277-3

Guifang Zhang, Shijun Tan, Zhe Ji, Yuming Fang

Multimodal based person re-identification (ReID) has garnered increasing attention in recent years. However, the integration of visual and textual information encounters significant challenges. Biases in feature integration are frequently observed in existing methods, resulting in suboptimal performance and restricted generalization across a spectrum of ReID tasks. At the same time, since there is a domain gap between the datasets used by the pretraining model and the ReID datasets, it has a certain impact on the performance. In response to these challenges, we proposed a dynamic attention vision-language transformer network for the ReID task. In this network, a novel image-text dynamic attention module (ITDA) is designed to promote unbiased feature integration by dynamically assigning the importance of image and text representations. Additionally, an adapter module is adopted to address the domain gap between pretraining datasets and ReID datasets. Our network can capture complex connections between visual and textual information and achieve satisfactory performance. We conducted numerous experiments on ReID benchmarks to demonstrate the efficacy of our proposed method. The experimental results show that our method achieves state-of-the-art performance, surpassing existing integration strategies. These findings underscore the critical role of unbiased feature dynamic integration in enhancing the capabilities of multimodal based ReID models.

近年来，基于多模态的人员再识别（ReID）技术受到越来越多的关注。然而，视觉和文本信息的整合遇到了重大挑战。现有方法在特征整合方面经常出现偏差，导致在一系列 ReID 任务中表现不佳，通用性受限。同时，由于预训练模型所使用的数据集与 ReID 数据集之间存在领域差距，这对性能有一定的影响。为了应对这些挑战，我们为 ReID 任务提出了一种动态注意力视觉语言转换器网络。在这个网络中，我们设计了一个新颖的图像-文本动态注意力模块（ITDA），通过动态分配图像和文本表征的重要性来促进无偏见的特征整合。此外，还采用了一个适配器模块来解决预训练数据集和 ReID 数据集之间的领域差距。我们的网络能够捕捉视觉和文本信息之间的复杂联系，并取得了令人满意的性能。我们在 ReID 基准上进行了大量实验，以证明我们提出的方法的有效性。实验结果表明，我们的方法达到了最先进的性能，超越了现有的整合策略。这些发现强调了无偏差特征动态整合在增强基于多模态的 ReID 模型能力方面的关键作用。

{"title":"Dynamic Attention Vision-Language Transformer Network for Person Re-identification","authors":"Guifang Zhang, Shijun Tan, Zhe Ji, Yuming Fang","doi":"10.1007/s11263-024-02277-3","DOIUrl":"https://doi.org/10.1007/s11263-024-02277-3","url":null,"abstract":"Multimodal based person re-identification (ReID) has garnered increasing attention in recent years. However, the integration of visual and textual information encounters significant challenges. Biases in feature integration are frequently observed in existing methods, resulting in suboptimal performance and restricted generalization across a spectrum of ReID tasks. At the same time, since there is a domain gap between the datasets used by the pretraining model and the ReID datasets, it has a certain impact on the performance. In response to these challenges, we proposed a dynamic attention vision-language transformer network for the ReID task. In this network, a novel image-text dynamic attention module (ITDA) is designed to promote unbiased feature integration by dynamically assigning the importance of image and text representations. Additionally, an adapter module is adopted to address the domain gap between pretraining datasets and ReID datasets. Our network can capture complex connections between visual and textual information and achieve satisfactory performance. We conducted numerous experiments on ReID benchmarks to demonstrate the efficacy of our proposed method. The experimental results show that our method achieves state-of-the-art performance, surpassing existing integration strategies. These findings underscore the critical role of unbiased feature dynamic integration in enhancing the capabilities of multimodal based ReID models.","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"96 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142490658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

StyleAdapter: A Unified Stylized Image Generation Model 样式适配器统一的风格化图像生成模型

IF 19.5 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Computer Vision

Pub Date : 2024-10-25 DOI: 10.1007/s11263-024-02253-x

Zhouxia Wang, Xintao Wang, Liangbin Xie, Zhongang Qi, Ying Shan, Wenping Wang, Ping Luo

This work focuses on generating high-quality images with specific style of reference images and content of provided textual descriptions. Current leading algorithms, i.e., DreamBooth and LoRA, require fine-tuning for each style, leading to time-consuming and computationally expensive processes. In this work, we propose StyleAdapter, a unified stylized image generation model capable of producing a variety of stylized images that match both the content of a given prompt and the style of reference images, without the need for per-style fine-tuning. It introduces a two-path cross-attention (TPCA) module to separately process style information and textual prompt, which cooperate with a semantic suppressing vision model (SSVM) to suppress the semantic content of style images. In this way, it can ensure that the prompt maintains control over the content of the generated images, while also mitigating the negative impact of semantic information in style references. This results in the content of the generated image adhering to the prompt, and its style aligning with the style references. Besides, our StyleAdapter can be integrated with existing controllable synthesis methods, such as T2I-adapter and ControlNet, to attain a more controllable and stable generation process. Extensive experiments demonstrate the superiority of our method over previous works.

这项工作的重点是生成具有特定参考图像风格和所提供文本描述内容的高质量图像。目前的主流算法，即 DreamBooth 和 LoRA，需要对每种风格进行微调，导致过程耗时且计算成本高昂。在这项工作中，我们提出了样式适配器（StyleAdapter），这是一种统一的样式化图像生成模型，能够生成与给定提示内容和参考图像样式相匹配的各种样式化图像，而无需对每种样式进行微调。它引入了双路径交叉注意（TPCA）模块，分别处理风格信息和文本提示，并与语义抑制视觉模型（SSVM）合作，抑制风格图像的语义内容。这样，既能确保提示信息保持对生成图像内容的控制，又能减轻样式参考中语义信息的负面影响。这样，生成图像的内容就会与提示保持一致，其样式也会与样式参考保持一致。此外，我们的 StyleAdapter 可以与现有的可控合成方法（如 T2I-adapter 和 ControlNet）集成，以实现更可控、更稳定的生成过程。广泛的实验证明了我们的方法优于之前的作品。

{"title":"StyleAdapter: A Unified Stylized Image Generation Model","authors":"Zhouxia Wang, Xintao Wang, Liangbin Xie, Zhongang Qi, Ying Shan, Wenping Wang, Ping Luo","doi":"10.1007/s11263-024-02253-x","DOIUrl":"https://doi.org/10.1007/s11263-024-02253-x","url":null,"abstract":"This work focuses on generating high-quality images with specific style of reference images and content of provided textual descriptions. Current leading algorithms, i.e., DreamBooth and LoRA, require fine-tuning for each style, leading to time-consuming and computationally expensive processes. In this work, we propose StyleAdapter, a unified stylized image generation model capable of producing a variety of stylized images that match both the content of a given prompt and the style of reference images, without the need for per-style fine-tuning. It introduces a two-path cross-attention (TPCA) module to separately process style information and textual prompt, which cooperate with a semantic suppressing vision model (SSVM) to suppress the semantic content of style images. In this way, it can ensure that the prompt maintains control over the content of the generated images, while also mitigating the negative impact of semantic information in style references. This results in the content of the generated image adhering to the prompt, and its style aligning with the style references. Besides, our StyleAdapter can be integrated with existing controllable synthesis methods, such as T2I-adapter and ControlNet, to attain a more controllable and stable generation process. Extensive experiments demonstrate the superiority of our method over previous works.","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"60 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142489489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Sample Correlation for Fingerprinting Deep Face Recognition 指纹深度人脸识别的样本相关性

IF 19.5 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

International Journal of Computer Vision

Pub Date : 2024-10-25 DOI: 10.1007/s11263-024-02254-w

Jiyang Guan, Jian Liang, Yanbo Wang, Ran He

Face recognition has witnessed remarkable advancements in recent years, thanks to the development of deep learning techniques. However, an off-the-shelf face recognition model as a commercial service could be stolen by model stealing attacks, posing great threats to the rights of the model owner. Model fingerprinting, as a model stealing detection method, aims to verify whether a suspect model is stolen from the victim model, gaining more and more attention nowadays. Previous methods always utilize transferable adversarial examples as the model fingerprint, but this method is known to be sensitive to adversarial defense and transfer learning techniques. To address this issue, we consider the pairwise relationship between samples instead and propose a novel yet simple model stealing detection method based on SAmple Correlation (SAC). Specifically, we present SAC-JC that selects JPEG compressed samples as model inputs and calculates the correlation matrix among their model outputs. Extensive results validate that SAC successfully defends against various model stealing attacks in deep face recognition, encompassing face verification and face emotion recognition, exhibiting the highest performance in terms of AUC, p-value and F1 score. Furthermore, we extend our evaluation of SAC-JC to object recognition datasets including Tiny-ImageNet and CIFAR10, which also demonstrates the superior performance of SAC-JC to previous methods. The code will be available at https://github.com/guanjiyang/SAC_JC.

近年来，得益于深度学习技术的发展，人脸识别技术取得了显著进步。然而，作为一种商业服务，现成的人脸识别模型可能会被模型窃取攻击窃取，对模型所有者的权益造成极大威胁。模型指纹识别作为一种模型窃取检测方法，旨在验证可疑模型是否是从受害模型中窃取的，如今正受到越来越多的关注。以往的方法总是利用可转移的对抗实例作为模型指纹，但众所周知，这种方法对对抗防御和转移学习技术很敏感。为了解决这个问题，我们考虑了样本之间的成对关系，提出了一种新颖而简单的基于 "简单相关性"（SAmple Correlation，SAC）的模型窃取检测方法。具体来说，我们提出的 SAC-JC 可以选择 JPEG 压缩样本作为模型输入，并计算其模型输出之间的相关矩阵。广泛的结果验证了 SAC 成功抵御了深度人脸识别（包括人脸验证和人脸情感识别）中的各种模型窃取攻击，在 AUC、P 值和 F1 分数方面表现出了最高的性能。此外，我们还将SAC-JC的评估扩展到了物体识别数据集，包括Tiny-ImageNet和CIFAR10，这也证明了SAC-JC的性能优于之前的方法。代码可在 https://github.com/guanjiyang/SAC_JC 上获取。

{"title":"Sample Correlation for Fingerprinting Deep Face Recognition","authors":"Jiyang Guan, Jian Liang, Yanbo Wang, Ran He","doi":"10.1007/s11263-024-02254-w","DOIUrl":"https://doi.org/10.1007/s11263-024-02254-w","url":null,"abstract":"Face recognition has witnessed remarkable advancements in recent years, thanks to the development of deep learning techniques. However, an off-the-shelf face recognition model as a commercial service could be stolen by model stealing attacks, posing great threats to the rights of the model owner. Model fingerprinting, as a model stealing detection method, aims to verify whether a suspect model is stolen from the victim model, gaining more and more attention nowadays. Previous methods always utilize transferable adversarial examples as the model fingerprint, but this method is known to be sensitive to adversarial defense and transfer learning techniques. To address this issue, we consider the pairwise relationship between samples instead and propose a novel yet simple model stealing detection method based on SAmple Correlation (SAC). Specifically, we present SAC-JC that selects JPEG compressed samples as model inputs and calculates the correlation matrix among their model outputs. Extensive results validate that SAC successfully defends against various model stealing attacks in deep face recognition, encompassing face verification and face emotion recognition, exhibiting the highest performance in terms of AUC, p-value and F1 score. Furthermore, we extend our evaluation of SAC-JC to object recognition datasets including Tiny-ImageNet and CIFAR10, which also demonstrates the superior performance of SAC-JC to previous methods. The code will be available at https://github.com/guanjiyang/SAC_JC.","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"75 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2024-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142490657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0