
International Journal of Computer Vision: Latest Publications

Rethinking Contemporary Deep Learning Techniques for Error Correction in Biometric Data
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-11-06. DOI: 10.1007/s11263-024-02280-8
YenLung Lai, XingBo Dong, Zhe Jin, Wei Jia, Massimo Tistarelli, XueJun Li

In the realm of cryptography, the implementation of error correction in biometric data offers many benefits, including secure data storage and key derivation. Deep learning-based decoders have emerged as a catalyst for improved error correction when decoding noisy biometric data. Although these decoders exhibit competence in approximating precise solutions, we expose the potential inadequacy of their security assurances through a minimum entropy analysis. This limitation curtails their applicability in secure biometric contexts, as the inherent complexities of their non-linear neural network architectures pose challenges in modeling the solution distribution precisely. To address this limitation, we introduce U-Sketch, a universal approach for error correction in biometrics, which converts arbitrary input random biometric source distributions into independent and identically distributed (i.i.d.) data while maintaining the pairwise distance of the data post-transformation. This method ensures interpretability within the decoder, facilitating transparent entropy analysis and a substantiated security claim. Moreover, U-Sketch employs Maximum Likelihood Decoding, which provides optimal error tolerance and a precise security guarantee.
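
The security claim above rests on Maximum Likelihood Decoding. As a rough illustration of that decoding principle (not the paper's U-Sketch construction), the sketch below performs ML decoding over a tiny enumerable binary codebook; under a binary symmetric channel assumption, ML decoding reduces to picking the codeword with minimum Hamming distance. The repetition codebook is a hypothetical example.

```python
# A minimal sketch of maximum likelihood decoding over a small binary codebook,
# assuming a binary symmetric channel (so ML decoding reduces to minimum Hamming
# distance). Illustration of the decoding principle only, not U-Sketch itself.
import numpy as np

def ml_decode(received, codebook):
    """Return the codeword closest in Hamming distance to the received word."""
    dists = np.count_nonzero(codebook != received, axis=1)
    return codebook[np.argmin(dists)], dists.min()

# Toy 3-repetition code for 2-bit messages (hypothetical example codebook).
codebook = np.array([np.repeat(list(map(int, f"{m:02b}")), 3) for m in range(4)])
noisy = codebook[2].copy()
noisy[0] ^= 1                       # flip one bit to simulate biometric noise
decoded, dist = ml_decode(noisy, codebook)
print(decoded, dist)                # recovers codeword 2 at Hamming distance 1
```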

Citations: 0
Day2Dark: Pseudo-Supervised Activity Recognition Beyond Silent Daylight
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-11-06. DOI: 10.1007/s11263-024-02273-7
Yunhua Zhang, Hazel Doughty, Cees G. M. Snoek

This paper strives to recognize activities in the dark, as well as in the day. We first establish that state-of-the-art activity recognizers are effective during the day, but not trustworthy in the dark. The main causes are the limited availability of labeled dark videos to learn from, as well as the distribution shift towards lower color contrast at test time. To compensate for the lack of labeled dark videos, we introduce a pseudo-supervised learning scheme, which utilizes easy-to-obtain, unlabeled, and task-irrelevant dark videos to improve an activity recognizer in low light. As the lower color contrast results in visual information loss, we further propose to incorporate the complementary activity information within audio, which is invariant to illumination. Since the usefulness of audio and visual features differs depending on the amount of illumination, we introduce our ‘darkness-adaptive’ audio-visual recognizer. Experiments on EPIC-Kitchens, Kinetics-Sound, and Charades demonstrate our proposals are superior to image enhancement, domain adaptation and alternative audio-visual fusion methods, and can even improve robustness to local darkness caused by occlusions. Project page: https://xiaobai1217.github.io/Day2Dark/.
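
To make the idea of a 'darkness-adaptive' audio-visual recognizer concrete, the sketch below gates audio and visual logits by a crude frame-brightness estimate. The paper learns this adaptation end to end, so the sigmoid gate, pivot, and sharpness values here are illustrative assumptions only.

```python
# A minimal sketch of illumination-conditioned audio-visual fusion: the darker the
# clip, the more weight the audio logits receive. The brightness-based gate below
# is a hand-rolled stand-in for the learned darkness-adaptive mechanism.
import numpy as np

def fuse_logits(frames, visual_logits, audio_logits, pivot=0.35, sharpness=10.0):
    """frames: (T, H, W, 3) in [0, 1]; logits: (num_classes,)."""
    brightness = frames.mean()                           # crude illumination estimate
    audio_weight = 1.0 / (1.0 + np.exp(sharpness * (brightness - pivot)))
    return (1.0 - audio_weight) * visual_logits + audio_weight * audio_logits

rng = np.random.default_rng(0)
dark_clip = rng.uniform(0.0, 0.1, size=(8, 32, 32, 3))  # low-light frames
fused = fuse_logits(dark_clip, rng.normal(size=5), rng.normal(size=5))
print(fused)                                             # audio dominates in the dark
```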

Citations: 0
Achieving Procedure-Aware Instructional Video Correlation Learning Under Weak Supervision from a Collaborative Perspective
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-11-04. DOI: 10.1007/s11263-024-02272-8
Tianyao He, Huabin Liu, Zelin Ni, Yuxi Li, Xiao Ma, Cheng Zhong, Yang Zhang, Yingxue Wang, Weiyao Lin

Video Correlation Learning (VCL) delineates a high-level research domain that centers on analyzing the semantic and temporal correspondences between videos through a comparative paradigm. Recently, instructional video-related tasks have drawn increasing attention due to their promising potential. Compared with general videos, instructional videos possess more complex procedure information, making correlation learning quite challenging. To obtain procedural knowledge, current methods rely heavily on fine-grained step-level annotations, which are costly and non-scalable. To improve VCL on instructional videos, we introduce a weakly supervised framework named Collaborative Procedure Alignment (CPA). To be specific, our framework comprises two core components: the collaborative step mining (CSM) module and the frame-to-step alignment (FSA) module. Without requiring step-level annotations, the CSM module can properly conduct temporal step segmentation and pseudo-step learning by exploring the inner procedure correspondences between paired videos. Subsequently, the FSA module efficiently yields the probability of aligning one video’s frame-level features with another video’s pseudo-step labels, which can act as a reliable correlation degree for paired videos. The two modules are inherently interconnected and can mutually enhance each other to extract the step-level knowledge and measure the video correlation distances accurately. Our framework provides an effective tool for instructional video correlation learning. We instantiate our framework on four representative tasks, including sequence verification, few-shot action recognition, temporal action segmentation, and action quality assessment. Furthermore, we extend our framework to more innovative functions to further exhibit its potential. Extensive and in-depth experiments validate CPA’s strong correlation learning capability on instructional videos. The implementation can be found at https://github.com/hotelll/Collaborative_Procedure_Alignment.
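
As a rough picture of the frame-to-step alignment idea, the sketch below soft-assigns one video's frame features to another video's pseudo-step prototypes and reads the mean assignment confidence as a correlation score. Shapes, the temperature, and the scoring rule are assumptions for illustration, not the exact FSA module.

```python
# A minimal sketch of frame-to-step alignment: frame features of one video are
# soft-assigned to another video's pseudo-step prototypes; the mean confidence of
# the best assignment per frame serves as a toy correlation score.
import numpy as np

def frame_to_step_alignment(frame_feats, step_protos, temperature=0.1):
    """frame_feats: (T, D); step_protos: (S, D). Returns (T, S) probabilities and a score."""
    f = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    s = step_protos / np.linalg.norm(step_protos, axis=1, keepdims=True)
    sim = f @ s.T / temperature                          # sharpened cosine similarities
    sim -= sim.max(axis=1, keepdims=True)                # numerical stability
    probs = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)
    return probs, probs.max(axis=1).mean()               # confidence of best step per frame

rng = np.random.default_rng(1)
probs, score = frame_to_step_alignment(rng.normal(size=(20, 64)), rng.normal(size=(4, 64)))
print(probs.shape, round(float(score), 3))
```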

Citations: 0
EfficientDeRain+: Learning Uncertainty-Aware Filtering via RainMix Augmentation for High-Efficiency Deraining
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-11-04. DOI: 10.1007/s11263-024-02281-7
Qing Guo, Hua Qi, Jingyang Sun, Felix Juefei-Xu, Lei Ma, Di Lin, Wei Feng, Song Wang

Deraining is a significant and fundamental computer vision task, aiming to remove the rain streaks and accumulations in an image or video. Existing deraining methods usually make heuristic assumptions about the rain model, which compels them to employ complex optimization or iterative refinement for high recovery quality. However, this makes the methods time-consuming and less effective on rain patterns that deviate from those assumptions. This paper proposes a simple yet efficient deraining method by formulating deraining as a predictive filtering problem without complex rain model assumptions. Specifically, we introduce spatially-variant predictive filtering (SPFilt), which adaptively predicts proper kernels via a deep network to filter individual pixels. Since the filtering can be implemented via well-accelerated convolution, our method is highly efficient. We further propose EfDeRain+, which contains three main contributions to address residual rain traces, multi-scale rain streaks, and diverse rain patterns without harming efficiency. First, we propose uncertainty-aware cascaded predictive filtering (UC-PFilt), which can identify the difficulties of reconstructing clean pixels via predicted kernels and remove the residual rain traces effectively. Second, we design weight-sharing multi-scale dilated filtering (WS-MS-DFilt) to handle multi-scale rain streaks without harming the efficiency. Third, to eliminate the gap across diverse rain patterns, we propose a novel data augmentation method (i.e., RainMix) to train our deep models. By combining all contributions with sophisticated analysis of different variants, our final method outperforms baseline methods on six single-image deraining datasets and one video-deraining dataset in terms of both recovery quality and speed. In particular, EfDeRain+ can derain within about 6.3 ms on a 481×321 image and is over 74 times faster than the top baseline method with even better recovery quality. We release the code at https://github.com/tsingqguo/efficientderainplus.
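
To illustrate spatially-variant predictive filtering, the sketch below applies a separate k x k kernel to every pixel of an image; the random kernels stand in for the ones a deep network would predict in EfficientDeRain+.

```python
# A minimal sketch of spatially-variant predictive filtering: every pixel is
# filtered with its own k x k kernel. The per-pixel kernels are random
# placeholders here; in the paper they would be predicted by a deep network.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def spatially_variant_filter(image, kernels):
    """image: (H, W); kernels: (H, W, k, k), one kernel per pixel."""
    k = kernels.shape[-1]
    pad = k // 2
    padded = np.pad(image, pad, mode="reflect")
    patches = sliding_window_view(padded, (k, k))        # (H, W, k, k) local patches
    return np.einsum("hwij,hwij->hw", patches, kernels)

rng = np.random.default_rng(2)
img = rng.uniform(size=(32, 48))
kernels = rng.uniform(size=(32, 48, 3, 3))
kernels /= kernels.sum(axis=(-2, -1), keepdims=True)     # normalize each kernel
print(spatially_variant_filter(img, kernels).shape)      # (32, 48)
```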

Citations: 0
Few Annotated Pixels and Point Cloud Based Weakly Supervised Semantic Segmentation of Driving Scenes
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-11-04. DOI: 10.1007/s11263-024-02275-5
Huimin Ma, Sheng Yi, Shijie Chen, Jiansheng Chen, Yu Wang

Previous weakly supervised semantic segmentation (WSSS) methods mainly begin with segmentation seeds from the CAM method. Because of the high complexity of driving scene images, these frameworks do not perform well on driving scene datasets. In this paper, we propose a new kind of WSSS annotation for complex driving scene datasets, with only one or a few labeled points per category. This annotation is more lightweight than image-level annotation and provides critical localization information for prototypes. We propose a framework to address the WSSS task under this annotation, which generates prototype feature vectors from the labeled points and then produces 2D pseudo labels. In addition, we find that point cloud data is useful for distinguishing different objects. Our framework can extract rich semantic information from unlabeled point cloud data and generate instance masks without requiring extra annotation resources. We combine the pseudo labels and the instance masks to correct the erroneous regions and thus obtain more accurate supervision for training the semantic segmentation network. We evaluate this framework on the KITTI dataset. Experiments show that the proposed method achieves state-of-the-art performance.
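
The prototype-to-pseudo-label step can be pictured as in the sketch below: per-class prototypes are averaged from features at the few labeled points, and every pixel is assigned its cosine-nearest prototype. The random feature map and point coordinates are hypothetical placeholders, not the paper's learned features.

```python
# A minimal sketch of point-supervised pseudo labeling: class prototypes are the
# mean features at the labeled pixels; each pixel takes the class of its most
# similar prototype (cosine similarity).
import numpy as np

def pseudo_labels_from_points(feat_map, points):
    """feat_map: (H, W, D); points: dict class_id -> list of (y, x) labeled pixels."""
    protos, classes = [], []
    for cls, coords in points.items():
        feats = np.stack([feat_map[y, x] for y, x in coords])
        protos.append(feats.mean(axis=0))
        classes.append(cls)
    protos = np.stack(protos)                                # (C, D)
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)
    flat = feat_map.reshape(-1, feat_map.shape[-1])
    flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    best = (flat @ protos.T).argmax(axis=1)                  # cosine-nearest prototype
    return np.array(classes)[best].reshape(feat_map.shape[:2])

rng = np.random.default_rng(3)
feat = rng.normal(size=(24, 24, 16))
labels = pseudo_labels_from_points(feat, {0: [(2, 3)], 1: [(10, 12), (20, 5)]})
print(labels.shape, np.unique(labels))
```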

Citations: 0
APPTracker+: Displacement Uncertainty for Occlusion Handling in Low-Frame-Rate Multiple Object Tracking
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-11-03. DOI: 10.1007/s11263-024-02237-x
Tao Zhou, Qi Ye, Wenhan Luo, Haizhou Ran, Zhiguo Shi, Jiming Chen

Multi-object tracking (MOT) in the scenario of low-frame-rate videos is a promising solution to better meet the computing, storage, and transmitting bandwidth resource constraints of edge devices. Tracking with a low frame rate poses particular challenges in the association stage, as objects in two successive frames typically exhibit much quicker variations in locations, velocities, appearances, and visibilities than at normal frame rates. In this paper, we observe severe performance degradation of many existing association strategies caused by such variations. Though optical-flow-based methods like CenterTrack can handle the large displacement to some extent due to their large receptive field, the temporally local nature makes them fail to give reliable displacement estimations of objects that newly appear in the current frame (i.e., not visible in the previous frame). To overcome the local nature of optical-flow-based methods, we propose an online tracking method by extending the CenterTrack architecture with a new head, named APP, to recognize unreliable displacement estimations. Further, to capture the fine-grained and private unreliability of each displacement estimation, we extend the binary APP predictions to displacement uncertainties. To this end, we reformulate the displacement estimation task via Bayesian deep learning tools. With APP predictions, we propose to conduct association in a multi-stage manner where vision cues or historical motion cues are leveraged in the corresponding stage. By rethinking the commonly used bipartite matching algorithms, we equip the proposed multi-stage association policy with a hybrid matching strategy conditioned on displacement uncertainties. Our method shows robustness in preserving identities in low-frame-rate video sequences. Experimental results on public datasets in various low-frame-rate settings demonstrate the advantages of the proposed method.
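
As a simplified picture of association conditioned on displacement uncertainty, the sketch below blends motion and appearance costs per track according to its uncertainty before bipartite matching. The single-stage blending rule is an assumption standing in for the paper's multi-stage policy.

```python
# A minimal sketch of uncertainty-conditioned association: motion costs are used
# when a track's displacement estimate is confident, appearance costs take over
# when it is not, and Hungarian matching resolves the assignment.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(prev_pos, pred_disp, disp_uncert, det_pos, app_cost):
    """prev_pos, pred_disp: (N, 2); disp_uncert: (N,) in [0, 1]; det_pos: (M, 2); app_cost: (N, M)."""
    predicted = prev_pos + pred_disp                     # where each track should be now
    motion_cost = np.linalg.norm(predicted[:, None] - det_pos[None], axis=-1)
    w = disp_uncert[:, None]                             # high uncertainty -> trust appearance
    cost = (1.0 - w) * motion_cost + w * app_cost
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))

rng = np.random.default_rng(4)
prev = rng.uniform(0, 100, size=(3, 2))
matches = associate(prev, rng.normal(0, 5, size=(3, 2)), np.array([0.1, 0.9, 0.3]),
                    rng.uniform(0, 100, size=(4, 2)), rng.uniform(size=(3, 4)))
print(matches)
```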

Citations: 0
Anti-Fake Vaccine: Safeguarding Privacy Against Face Swapping via Visual-Semantic Dual Degradation
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-11-01. DOI: 10.1007/s11263-024-02259-5
Jingzhi Li, Changjiang Luo, Hua Zhang, Yang Cao, Xin Liao, Xiaochun Cao

Deepfake techniques pose a significant threat to personal privacy and social security. To mitigate these risks, various defensive techniques have been introduced, including passive methods through fake detection and proactive methods through adding invisible perturbations. Recent proactive methods mainly focus on face manipulation but perform poorly against face swapping, as face swapping involves the more complex process of identity information transfer. To address this issue, we develop a novel privacy-preserving framework, named Anti-Fake Vaccine, to protect facial images against malicious face swapping. This new proactive technique dynamically fuses visual corruption and content misdirection, significantly enhancing protection performance. Specifically, we first formulate constraints from two distinct perspectives: visual quality and identity semantics. The visual perceptual constraint targets image quality degradation in the visual space, while the identity similarity constraint induces erroneous alterations in the semantic space. We then introduce a multi-objective optimization solution to effectively balance the allocation of adversarial perturbations generated according to these constraints. To further improve performance, we develop an additive perturbation strategy to discover the shared adversarial perturbations across diverse face swapping models. Extensive experiments on the CelebA-HQ and FFHQ datasets demonstrate that our method exhibits superior generalization capabilities across diverse face swapping models, including commercial ones.
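
The balancing of the two constraints can be pictured with the toy sketch below, which takes PGD-style steps on an epsilon-bounded perturbation along a gradient-magnitude-balanced combination of two objectives. The linear 'identity' and 'visual' feature maps and the balancing heuristic are illustrative assumptions, not the paper's multi-objective solver or real face-swapping models.

```python
# A toy sketch of balancing two adversarial objectives (visual degradation and
# identity misdirection) within an epsilon-bounded perturbation. Both "feature
# extractors" are random linear maps standing in for real networks.
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(size=64)                     # flattened toy face image
A = rng.normal(size=(16, 64))                # toy identity-embedding map
B = rng.normal(size=(16, 64))                # toy visual-feature map
eps, step = 0.03, 0.005
delta = rng.uniform(-eps, eps, size=x.shape) # random start inside the epsilon ball

for _ in range(10):
    # Gradients of the feature displacements ||A(x+d) - Ax||^2 and ||B(x+d) - Bx||^2.
    g_id = 2.0 * A.T @ (A @ (x + delta) - A @ x)
    g_vis = 2.0 * B.T @ (B @ (x + delta) - B @ x)
    # Balance the two objectives by normalizing each gradient's magnitude.
    g = g_id / (np.linalg.norm(g_id) + 1e-8) + g_vis / (np.linalg.norm(g_vis) + 1e-8)
    delta = np.clip(delta + step * np.sign(g), -eps, eps)   # PGD-style bounded ascent

print(np.linalg.norm(A @ (x + delta) - A @ x))  # identity features pushed away
```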

Citations: 0
Improving 3D Finger Traits Recognition via Generalizable Neural Rendering
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-10-30. DOI: 10.1007/s11263-024-02248-8
Hongbin Xu, Junduan Huang, Yuer Ma, Zifeng Li, Wenxiong Kang

3D biometric techniques on finger traits have become a new trend and have demonstrated a powerful ability for recognition and anti-counterfeiting. Existing methods follow an explicit 3D pipeline that reconstructs the models first and then extracts features from the 3D models. However, these explicit 3D methods suffer from the following problems: 1) inevitable information loss during 3D reconstruction; 2) tight coupling between specific hardware and the 3D reconstruction algorithm. This leads us to a question: is it indispensable to reconstruct 3D information explicitly in recognition tasks? Hence, we consider this problem in an implicit manner, leaving the nerve-wracking 3D reconstruction problem to learnable neural networks with the help of neural radiance fields (NeRFs). We propose FingerNeRF, a novel generalizable NeRF for 3D finger biometrics. To handle the shape-radiance ambiguity problem that may result in incorrect 3D geometry, we aim to incorporate extra geometric priors based on the correspondence of binary finger traits such as fingerprints or finger veins. First, we propose a novel Trait Guided Transformer (TGT) module to enhance the feature correspondence with the guidance of finger traits. Second, we impose extra geometric constraints on the volume rendering loss via the proposed Depth Distillation Loss and Trait Guided Rendering Loss. To evaluate the performance of the proposed method on different modalities, we collect two new datasets: SCUT-Finger-3D with finger images and SCUT-FingerVein-3D with finger vein images. Moreover, we also utilize the UNSW-3D dataset with fingerprint images for evaluation. In experiments, our FingerNeRF achieves 4.37% EER on the SCUT-Finger-3D dataset, 8.12% EER on the SCUT-FingerVein-3D dataset, and 2.90% EER on the UNSW-3D dataset, showing the superiority of the proposed implicit method in 3D finger biometrics.
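
For context, the sketch below shows the standard NeRF-style volume rendering that such generalizable NeRFs build on: densities and colors sampled along a ray are alpha-composited into a pixel color and an expected depth (the quantity a depth distillation loss could supervise). The sampled values are random placeholders.

```python
# A minimal sketch of NeRF-style volume rendering along a single ray: per-sample
# densities become opacities, opacities become compositing weights, and the
# weights yield a rendered color and an expected depth.
import numpy as np

def render_ray(densities, colors, t_vals):
    """densities: (N,), colors: (N, 3), t_vals: (N,) sample depths along one ray."""
    deltas = np.diff(t_vals, append=t_vals[-1] + 1e10)   # distances between samples
    alphas = 1.0 - np.exp(-densities * deltas)           # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1] + 1e-10]))
    weights = trans * alphas                             # contribution of each sample
    rgb = (weights[:, None] * colors).sum(axis=0)
    depth = (weights * t_vals).sum()
    return rgb, depth

rng = np.random.default_rng(6)
rgb, depth = render_ray(rng.uniform(0, 3, size=64), rng.uniform(size=(64, 3)),
                        np.linspace(2.0, 6.0, 64))
print(rgb, round(float(depth), 3))
```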

Citations: 0
Basis Restricted Elastic Shape Analysis on the Space of Unregistered Surfaces
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-10-30. DOI: 10.1007/s11263-024-02269-3
Emmanuel Hartman, Emery Pierson, Martin Bauer, Mohamed Daoudi, Nicolas Charon

This paper introduces a new framework for surface analysis derived from the general setting of elastic Riemannian metrics on shape spaces. Traditionally, those metrics are defined over the infinite dimensional manifold of immersed surfaces and satisfy specific invariance properties enabling the comparison of surfaces modulo shape preserving transformations such as reparametrizations. The specificity of our approach is to restrict the space of allowable transformations to predefined finite dimensional bases of deformation fields. These are estimated in a data-driven way so as to emulate specific types of surface transformations. This allows us to simplify the representation of the corresponding shape space to a finite dimensional latent space. However, in sharp contrast with methods involving e.g. mesh autoencoders, the latent space is equipped with a non-Euclidean Riemannian metric inherited from the family of elastic metrics. We demonstrate how this model can be effectively implemented to perform a variety of tasks on surface meshes which, importantly, does not assume these to be pre-registered or to even have a consistent mesh structure. We specifically validate our approach on human body shape and pose data as well as human face and hand scans for problems such as shape registration, interpolation, motion transfer or random pose generation.
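
The basis restriction can be pictured as in the sketch below: a shape is the template plus a linear combination of K deformation fields, so each surface is encoded by K latent coefficients. The random basis stands in for the data-driven one estimated in the paper.

```python
# A minimal sketch of restricting deformations to a finite basis: a surface with V
# vertices is the base shape plus a weighted sum of K deformation fields, so a
# shape is represented by K latent coefficients.
import numpy as np

def deform(base_vertices, basis_fields, coeffs):
    """base_vertices: (V, 3); basis_fields: (K, V, 3); coeffs: (K,)."""
    return base_vertices + np.tensordot(coeffs, basis_fields, axes=1)

rng = np.random.default_rng(7)
template = rng.normal(size=(500, 3))             # toy mesh vertex positions
basis = rng.normal(size=(8, 500, 3)) * 0.01      # 8 placeholder deformation fields
latent = rng.normal(size=8)                      # finite-dimensional shape code
print(deform(template, basis, latent).shape)     # (500, 3)
```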

Citations: 0
A Memory-Assisted Knowledge Transferring Framework with Curriculum Anticipation for Weakly Supervised Online Activity Detection
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-10-28. DOI: 10.1007/s11263-024-02279-1
Tianshan Liu, Kin-Man Lam, Bing-Kun Bao

As a crucial topic of high-level video understanding, weakly supervised online activity detection (WS-OAD) involves identifying ongoing behaviors moment by moment in streaming videos, using models trained with only cheap video-level annotations. It is an essentially challenging task, which requires addressing the entangled issues of weakly supervised settings and online constraints. In this paper, we tackle the WS-OAD task from the knowledge-distillation (KD) perspective, which trains an online student detector to distill dual-level knowledge from a weakly supervised offline teacher model. To guarantee the completeness of knowledge transfer, we improve the vanilla KD framework from two aspects. First, we introduce an external memory bank to maintain the long-term activity prototypes, which serves as a bridge to align the activity semantics learned from the offline teacher and online student models. Second, to compensate for the missing context of the unseen near future, we leverage a curriculum learning paradigm to gradually train the online student detector to anticipate the future activity semantics. By dynamically scheduling the provided auxiliary future states, the online detector progressively distills contextual information from the offline model in an easy-to-hard curriculum. Extensive experimental results on three public datasets demonstrate the superiority of our proposed method over the competing methods.
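
As a rough sketch of an external memory bank of activity prototypes, the snippet below keeps one slot per class, updates it by exponential moving average, and retrieves the cosine-nearest prototype for a query feature. The EMA rule and retrieval are illustrative assumptions, not the paper's exact design.

```python
# A minimal sketch of a prototype memory bank: one slot per activity class,
# updated as an exponential moving average of incoming features and queried by
# cosine similarity.
import numpy as np

class PrototypeMemory:
    def __init__(self, num_classes, dim, momentum=0.9, seed=0):
        self.momentum = momentum
        self.bank = np.random.default_rng(seed).normal(size=(num_classes, dim))

    def update(self, class_id, feature):
        """Blend a new feature into the stored prototype for this class."""
        self.bank[class_id] = (self.momentum * self.bank[class_id]
                               + (1.0 - self.momentum) * feature)

    def query(self, feature):
        """Return the class whose prototype is most cosine-similar to the feature."""
        bank = self.bank / np.linalg.norm(self.bank, axis=1, keepdims=True)
        f = feature / np.linalg.norm(feature)
        return int((bank @ f).argmax())

memory = PrototypeMemory(num_classes=10, dim=32)
rng = np.random.default_rng(8)
memory.update(3, rng.normal(size=32))
print(memory.query(memory.bank[3]))              # retrieves class 3
```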

Citations: 0