
Latest Publications: 2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)

Oriented Splits Network to Distill Background for Vehicle Re-Identification
A. Munir, N. Martinel, C. Micheloni
Vehicle re-identification (re-id) is a challenging task due to the high intra-class and low inter-class variations in the visual data acquired from monitoring camera networks. Unique and discriminative feature representations are needed to overcome several sources of variation, including color, illumination, orientation, background and occlusion. The varying orientations of the vehicles in the images prevent the learned models from capturing the multiple parts of a vehicle and the relationships between them. Combining global and partial features is one solution to improve the discriminative learning of deep models. Leveraging such solutions, we propose an Oriented Splits Network (OSN) for end-to-end learning of multiple part features along with global features, forming a strong descriptor for vehicle re-identification. To capture the orientation variability of the vehicles, the proposed network partitions the images into several oriented stripes to obtain a local descriptor for each part/region. This scheme is then exploited by a camera-based feature distillation (CBD) training strategy to remove background features: these are filtered out from the oriented vehicle representations, which yields a much stronger, unique representation of the vehicles. We perform experiments on two benchmark vehicle re-id datasets, which show that the proposed solution outperforms the state of the art by a clear margin.
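The core idea of combining a global descriptor with per-stripe local descriptors can be illustrated in a few lines of PyTorch. The sketch below is a simplified, hypothetical re-implementation of stripe-based part pooling on top of a standard backbone (ResNet-50, with horizontal splits standing in for the oriented splits); it is not the authors' OSN architecture and omits the camera-based feature distillation step.

```python
# Minimal sketch: global + per-stripe descriptors from a CNN feature map.
# Assumes a recent torchvision (weights=None API); not the authors' code.
import torch
import torch.nn as nn
import torchvision

class StripeDescriptor(nn.Module):
    def __init__(self, num_stripes=4, embed_dim=256):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        # Keep everything up to the last conv block (drop avgpool and fc).
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.global_head = nn.Linear(2048, embed_dim)
        self.part_heads = nn.ModuleList(
            [nn.Linear(2048, embed_dim) for _ in range(num_stripes)])
        self.num_stripes = num_stripes

    def forward(self, x):
        fmap = self.features(x)                      # B x 2048 x H x W
        global_desc = self.global_head(fmap.mean(dim=(2, 3)))
        # Split the feature map into horizontal stripes and pool each part.
        stripes = torch.chunk(fmap, self.num_stripes, dim=2)
        parts = [h(s.mean(dim=(2, 3))) for h, s in zip(self.part_heads, stripes)]
        # Concatenate global and part descriptors into one re-id descriptor.
        return torch.cat([global_desc] + parts, dim=1)

print(StripeDescriptor()(torch.randn(2, 3, 256, 256)).shape)  # (2, 1280)
```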
{"title":"Oriented Splits Network to Distill Background for Vehicle Re-Identification","authors":"A. Munir, N. Martinel, C. Micheloni","doi":"10.1109/AVSS52988.2021.9663832","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663832","url":null,"abstract":"Vehicle re-identification (re-id) is a challenging task due to the presence of high intra-class and low inter-class variations in the visual data acquired from monitoring camera networks. Unique and discriminative feature representations are needed to overcome the existence of several variations including color, illumination, orientation, background and occlusion. The orientations of the vehicles in the images make the learned models unable to learn multiple parts of the vehicle and relationship between them. The combination of global and partial features is one of the solutions to improve the discriminative learning of deep learning models. Leveraging on such solutions, we propose an Oriented Splits Network (OSN) for an end to end learning of multiple features along with global features to form a strong descriptor for vehicle re-identification. To capture the orientation variability of the vehicles, the proposed network introduces a partition of the images into several oriented stripes to obtain local descriptors for each part/region. Such a scheme is therefore exploited by a camera based feature distillation (CBD) training strategy to remove the background features. These are filtered out from oriented vehicles representations which yield to a much stronger unique representation of the vehicles. We perform experiments on two benchmark vehicle re-id datasets to verify the performance of the proposed approach which show that the proposed solution achieves better result with respect to the state of the art with margin.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125907407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
From Multimodal to Unimodal Attention in Transformers using Knowledge Distillation
Dhruv Agarwal, Tanay Agrawal, Laura M. Ferrari, Franccois Bremond
Multimodal deep learning has garnered much interest, and transformers have triggered novel approaches thanks to the cross-attention mechanism. Here we propose an approach to deal with two key existing challenges: the high computational resources demanded and the issue of missing modalities. We introduce for the first time the concept of knowledge distillation in transformers to use only one modality at inference time. We report a full study analyzing multiple student-teacher configurations, the levels at which distillation is applied, and different methodologies. With the best configuration, we improved the state-of-the-art accuracy by 3%, reduced the number of parameters by 2.5 times, and reduced the inference time by 22%. Such a performance-computation tradeoff can be exploited in many applications, and we aim at opening a new research area where the deployment of complex models with limited resources is demanded.
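As a rough illustration of how a multimodal teacher can supervise a unimodal student, the snippet below combines a soft-target (Hinton-style) distillation term on the logits, a feature-matching term at a chosen layer, and the usual supervised loss. The weights, temperature and matched layer are assumptions for illustration, not the configurations studied in the paper.

```python
# Hedged sketch of teacher-to-student distillation losses (PyTorch).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_feat, teacher_feat, labels,
                      T=2.0, alpha=0.5, beta=0.1):
    # Soft-target distillation on the class logits.
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * (T * T)
    # Feature matching between a teacher layer and the student layer.
    feat = F.mse_loss(student_feat, teacher_feat)
    # Standard supervised loss on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + beta * feat + (1.0 - alpha) * ce

loss = distillation_loss(torch.randn(4, 10), torch.randn(4, 10),
                         torch.randn(4, 64), torch.randn(4, 64),
                         torch.randint(0, 10, (4,)))
```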
{"title":"From Multimodal to Unimodal Attention in Transformers using Knowledge Distillation","authors":"Dhruv Agarwal, Tanay Agrawal, Laura M. Ferrari, Franccois Bremond","doi":"10.1109/AVSS52988.2021.9663793","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663793","url":null,"abstract":"Multimodal Deep Learning has garnered much interest, and transformers have triggered novel approaches, thanks to the cross-attention mechanism. Here we propose an approach to deal with two key existing challenges: the high computational resource demanded and the issue of missing modalities. We introduce for the first time the concept of knowledge distillation in transformers to use only one modality at inference time. We report a full study analyzing multiple student-teacher configurations, levels at which distillation is applied, and different methodologies. With the best configuration, we improved the state-of-the-art accuracy by 3%, we reduced the number of parameters by 2.5 times and the inference time by 22%. Such performance-computation tradeoff can be exploited in many applications and we aim at opening a new research area where the deployment of complex models with limited resources is demanded","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"28 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131860250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Learning Temporal 3D Human Pose Estimation with Pseudo-Labels
Arij Bouazizi, U. Kressel, Vasileios Belagiannis
We present a simple, yet effective, approach for self-supervised 3D human pose estimation. Unlike prior work, we exploit temporal information alongside multi-view self-supervision. During training, we rely on triangulating 2D body pose estimates from a multi-view camera system. A temporal convolutional neural network is trained with the generated 3D ground truth and a geometric multi-view consistency loss, imposing geometrical constraints on the predicted 3D body skeleton. During inference, our model receives a sequence of 2D body pose estimates from a single view and predicts the 3D body pose for each of them. An extensive evaluation shows that our method achieves state-of-the-art performance on the Human3.6M and MPI-INF-3DHP benchmarks. Our code and models are publicly available at https://github.com/vru2020/TM_HPE/.
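The pseudo-label generation step described here boils down to classic DLT triangulation of a joint seen from calibrated views. The NumPy sketch below shows the two-view case with made-up projection matrices; it is a simplification of the multi-view setup, not the released code.

```python
# Hedged sketch: 3D pseudo-label for one joint via DLT triangulation (NumPy).
import numpy as np

def triangulate_joint(P1, P2, uv1, uv2):
    """P1, P2: 3x4 camera projection matrices; uv1, uv2: 2D joint (u, v)."""
    A = np.stack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    # The 3D point is the right null vector of A (least-squares via SVD).
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

# Toy example: two cameras offset along x; the true joint is at (1, 0.4, 4).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])
print(triangulate_joint(P1, P2, np.array([0.25, 0.1]), np.array([0.125, 0.1])))
```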
{"title":"Learning Temporal 3D Human Pose Estimation with Pseudo-Labels","authors":"Arij Bouazizi, U. Kressel, Vasileios Belagiannis","doi":"10.1109/AVSS52988.2021.9663755","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663755","url":null,"abstract":"We present a simple, yet effective, approach for self-supervised 3D human pose estimation. Unlike the prior work, we explore the temporal information next to the multi-view self-supervision. During training, we rely on triangulating 2D body pose estimates of a multiple-view camera system. A temporal convolutional neural network is trained with the generated 3D ground-truth and the geometric multi-view consistency loss, imposing geometrical constraints on the predicted 3D body skeleton. During inference, our model receives a sequence of 2D body pose estimates from a single-view to predict the 3D body pose for each of them. An extensive evaluation shows that our method achieves state-of-the-art performance in the Human3.6M and MPI-INF-3DHP benchmarks. Our code and models are publicly available at https://github.com/vru2020/TM_HPE/.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122130933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
FLAME: Facial Landmark Heatmap Activated Multimodal Gaze Estimation
Neelabh Sinha, Michal Balazia, F. Brémond
3D gaze estimation is the task of predicting the line of sight of a person in 3D space. Person-independent models lack precision due to anatomical differences between subjects, whereas person-specific calibrated techniques add strict constraints on scalability. To overcome these issues, we propose a novel technique, Facial Landmark Heatmap Activated Multimodal Gaze Estimation (FLAME), which combines eye anatomical information using eye landmark heatmaps to obtain precise gaze estimation without any person-specific calibration. Our evaluation demonstrates competitive performance, with about a 10% improvement on the benchmark datasets ColumbiaGaze and EYEDIAP. We also conduct an ablation study to validate our method.
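One simple way to combine an eye crop with landmark information is to render the landmarks as Gaussian heatmaps and concatenate them with the image channels before a small regressor. The sketch below illustrates that fusion-by-concatenation idea; the layer sizes, number of landmarks and fusion strategy are assumptions, not the FLAME architecture.

```python
# Hedged sketch: landmark heatmaps fused with an eye crop for gaze regression.
import torch
import torch.nn as nn

def landmark_heatmaps(landmarks, size=64, sigma=2.0):
    """landmarks: (N, 2) pixel coords -> (N, size, size) Gaussian heatmaps."""
    ys, xs = torch.meshgrid(torch.arange(size), torch.arange(size), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).float()            # size x size x 2
    d2 = ((grid[None] - landmarks[:, None, None, :]) ** 2).sum(-1)
    return torch.exp(-d2 / (2 * sigma ** 2))

class GazeNet(nn.Module):
    def __init__(self, num_landmarks=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + num_landmarks, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 2),                # gaze direction as (yaw, pitch)
        )

    def forward(self, eye_img, heatmaps):
        # Concatenate image channels with heatmap channels before the CNN.
        return self.net(torch.cat([eye_img, heatmaps], dim=1))

hm = landmark_heatmaps(torch.rand(6, 2) * 64)              # 6 eye landmarks
gaze = GazeNet()(torch.rand(1, 3, 64, 64), hm[None])        # -> shape (1, 2)
```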
{"title":"FLAME: Facial Landmark Heatmap Activated Multimodal Gaze Estimation","authors":"Neelabh Sinha, Michal Balazia, F. Brémond","doi":"10.1109/AVSS52988.2021.9663816","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663816","url":null,"abstract":"3D gaze estimation is about predicting the line of sight of a person in 3D space. Person-independent models for the same lack precision due to anatomical differences of subjects, whereas person-specific calibrated techniques add strict constraints on scalability. To overcome these issues, we propose a novel technique, Facial Landmark Heatmap Activated Multimodal Gaze Estimation (FLAME), as a way of combining eye anatomical information using eye land-mark heatmaps to obtain precise gaze estimation without any person-specific calibration. Our evaluation demonstrates a competitive performance of about 10% improvement on benchmark datasets ColumbiaGaze and EYEDIAP. We also conduct an ablation study to validate our method.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116319537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
ZSpeedL - Evaluating the Performance of Zero-Shot Learning Methods using Low-Power Devices
Cristiano Patr'icio, J. Neves
The recognition of unseen objects from a semantic representation or textual description, usually denoted as zero-shot learning, lends itself more readily to real-world scenarios than traditional object recognition. Nevertheless, no work has evaluated the feasibility of deploying zero-shot learning approaches in these scenarios, particularly when using low-power devices. In this paper, we provide the first benchmark on the inference time of zero-shot learning, comprising an evaluation of state-of-the-art approaches regarding their speed/accuracy trade-off. An analysis of the processing time of the different phases of the ZSL inference stage reveals that visual feature extraction is the major bottleneck in this paradigm, but we show that lightweight networks can dramatically reduce the overall inference time without reducing the accuracy obtained by the de facto ResNet101 architecture. This benchmark also evaluates how different ZSL approaches perform on low-power devices, and how the visual feature extraction phase can be optimized on this hardware. To foster the research and deployment of ZSL systems capable of operating in real-world scenarios, we release the evaluation framework used in this benchmark (https://github.com/CristianoPatricio/zsl-methods).
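Measuring where the time goes in the ZSL pipeline amounts to timing each stage in isolation. The snippet below times only the visual-feature-extraction stage for a heavy and a lightweight backbone on CPU; it is a generic timing loop under assumed model choices (a recent torchvision), not the released evaluation framework.

```python
# Hedged sketch: comparing inference time of heavy vs. lightweight backbones.
import time
import torch
import torchvision

def mean_inference_ms(model, input_size=(1, 3, 224, 224), warmup=5, runs=20):
    model.eval()
    x = torch.randn(*input_size)
    with torch.no_grad():
        for _ in range(warmup):            # warm-up passes to stabilize timings
            model(x)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
    return (time.perf_counter() - start) / runs * 1000.0

heavy = torchvision.models.resnet101(weights=None)
light = torchvision.models.mobilenet_v3_small(weights=None)
print(f"ResNet101: {mean_inference_ms(heavy):.1f} ms per image")
print(f"MobileNetV3-Small: {mean_inference_ms(light):.1f} ms per image")
```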
{"title":"ZSpeedL - Evaluating the Performance of Zero-Shot Learning Methods using Low-Power Devices","authors":"Cristiano Patr'icio, J. Neves","doi":"10.1109/AVSS52988.2021.9663762","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663762","url":null,"abstract":"The recognition of unseen objects from a semantic representation or textual description, usually denoted as zero-shot learning, is more prone to be used in real-world scenarios when compared to traditional object recognition. Nevertheless, no work has evaluated the feasibility of deploying zero-shot learning approaches in these scenarios, particularly when using low-power devices. In this paper, we provide the first benchmark on the inference time of zero-shot learning, comprising an evaluation of state-of-the-art approaches regarding their speed/accuracy trade-off. An analysis to the processing time of the different phases of the ZSL inference stage reveals that visual feature extraction is the major bottleneck in this paradigm, but, we show that lightweight networks can dramatically reduce the overall inference time without reducing the accuracy obtained by the de facto ResNet101 architecture. Also, this benchmark evaluates how different ZSL approaches perform in low-power devices, and how the visual feature extraction phase could be optimized in this hardware. To foster the research and deployment of ZSL systems capable of operating in real-world scenarios, we release the evaluation framework used in this benchmark(https://github.com/CristianoPatricio/zsl-methods).","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117263157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
CPNet: Cross-Parallel Network for Efficient Anomaly Detection
Youngsaeng Jin, Jonghwan Hong, D. Han, Hanseok Ko
Anomaly detection in video streams is a challenging problem because of the scarcity of abnormal events and the difficulty of accurately annotating them. To alleviate these issues, unsupervised learning-based prediction methods have previously been applied. These approaches train the model with only normal events and predict a future frame from a sequence of preceding frames using encoder-decoder architectures, so that they produce small prediction errors on normal events but large errors on abnormal events. Such architectures, however, come with a computational burden, whereas some anomaly detection tasks require low computational cost without sacrificing performance. In this paper, a Cross-Parallel Network (CPNet) for efficient anomaly detection is proposed to minimize computation without performance drops. It consists of N smaller parallel U-Nets, each designed to handle a single input frame, which makes the calculations significantly more efficient. Additionally, an inter-network shift module is incorporated to capture temporal relationships among sequential frames and enable more accurate future predictions. The quantitative results show that our model requires less computational cost than the baseline U-Net while delivering equivalent performance in anomaly detection.
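Frame-prediction methods of this kind typically score anomalies by how poorly the current frame is predicted, for example via PSNR between the predicted and observed frame. The sketch below shows that scoring step with a placeholder predictor; it is not the CPNet architecture itself.

```python
# Hedged sketch: prediction-error (PSNR) anomaly scoring for video frames.
import torch
import torch.nn.functional as F

def psnr(pred, target, eps=1e-8):
    mse = F.mse_loss(pred, target)
    return 10.0 * torch.log10(1.0 / (mse + eps))   # frames assumed in [0, 1]

def anomaly_score(predictor, past_frames, current_frame):
    with torch.no_grad():
        pred = predictor(past_frames)
    # Normal frames are predicted well (high PSNR); anomalies are not,
    # so the negative PSNR serves as the anomaly score.
    return -psnr(pred.clamp(0, 1), current_frame)

# Placeholder predictor: simply repeats the last observed frame.
last_frame_predictor = lambda frames: frames[:, -1]
score = anomaly_score(last_frame_predictor,
                      torch.rand(1, 4, 3, 64, 64), torch.rand(1, 3, 64, 64))
```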
{"title":"CPNet: Cross-Parallel Network for Efficient Anomaly Detection","authors":"Youngsaeng Jin, Jonghwan Hong, D. Han, Hanseok Ko","doi":"10.1109/AVSS52988.2021.9663798","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663798","url":null,"abstract":"Anomaly detection in video streams is a challenging problem because of the scarcity of abnormal events and the difficulty of accurately annotating them. To alleviate these issues, unsupervised learning-based prediction methods have been previously applied. These approaches train the model with only normal events and predict a future frame from a sequence of preceding frames by use of encoder-decoder architectures so that they result in small prediction errors on normal events but large errors on abnormal events. The architecture, however, comes with the computational burden as some anomaly detection tasks require low computational cost without sacrificing performance. In this paper, Cross-Parallel Network (CPNet) for efficient anomaly detection is proposed here to minimize computations without performance drops. It consists of N smaller parallel U-Net, each of which is designed to handle a single input frame, to make the calculations significantly more efficient. Additionally, an inter-network shift module is incorporated to capture temporal relationships among sequential frames to enable more accurate future predictions. The quantitative results show that our model requires less computational cost than the baseline U-Net while delivering equivalent performance in anomaly detection.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130991693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Fine-grained anomaly detection via multi-task self-supervision
Loic Jezequel, Ngoc-Son Vu, Jean Beaudet, A. Histace
Detecting anomalies using deep learning has become a major challenge over the last years and is becoming increasingly promising in several fields. The introduction of self-supervised learning has greatly helped many methods, including anomaly detection, where simple geometric transformation recognition tasks are used. However, these methods do not perform well on fine-grained problems since they lack finer features. By combining both high-scale shape features and low-scale fine features in a multi-task framework, our method greatly improves fine-grained anomaly detection. It outperforms the state of the art with up to a 31% relative error reduction, measured with AUROC, on various anomaly detection problems including one-vs-all, out-of-distribution detection and face presentation attack detection.
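A multi-task self-supervised setup of this flavor can be sketched as one shared encoder with several pretext heads, for example one recognizing a coarse geometric transformation and one solving a finer, patch-level task. The tasks and the tiny encoder below are illustrative assumptions, not the losses used in the paper.

```python
# Hedged sketch: shared encoder with two self-supervised pretext heads.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskSSL(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.rotation_head = nn.Linear(64, 4)   # coarse task: 0/90/180/270 deg
        self.quadrant_head = nn.Linear(64, 4)   # finer task: crop's quadrant

    def forward(self, x):
        z = self.encoder(x)
        return self.rotation_head(z), self.quadrant_head(z)

model = MultiTaskSSL()
rot_logits, quad_logits = model(torch.rand(8, 3, 64, 64))
# Joint pretext loss trained only on normal samples.
loss = F.cross_entropy(rot_logits, torch.randint(0, 4, (8,))) + \
       F.cross_entropy(quad_logits, torch.randint(0, 4, (8,)))
```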
{"title":"Fine-grained anomaly detection via multi-task self-supervision","authors":"Loic Jezequel, Ngoc-Son Vu, Jean Beaudet, A. Histace","doi":"10.1109/AVSS52988.2021.9663783","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663783","url":null,"abstract":"Detecting anomalies using deep learning has become a major challenge over the last years, and is becoming increasingly promising in several fields. The introduction of self-supervised learning has greatly helped many methods including anomaly detection where simple geometric transformation recognition tasks are used. However these methods do not perform well on fine-grained problems since they lack finer features. By combining both high-scale shape features and low-scale fine features in a multi-task framework, our method greatly improves fine-grained anomaly detection. It outperforms state-of-the-art with up to 31% relative error reduction measured with AUROC on various anomaly detection problems including one-vs-all, out-of-distribution detection and face presentation attack detection.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121821770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
MultAV: Multiplicative Adversarial Videos
Shao-Yuan Lo, Vishal M. Patel
The majority of adversarial machine learning research focuses on additive attacks, which add an adversarial perturbation to input data. On the other hand, unlike image recognition problems, only a handful of attack approaches have been explored in the video domain. In this paper, we propose a novel attack method against video recognition models, Multiplicative Adversarial Videos (MultAV), which imposes perturbation on video data by multiplication. MultAV has different noise distributions from its additive counterparts and thus challenges defense methods tailored to resisting additive adversarial attacks. Moreover, it can be generalized not only to $\ell_{p}$-norm attacks with a new adversary constraint called the ratio bound, but also to different types of physically realizable attacks. Experimental results show that a model adversarially trained against additive attacks is less robust to MultAV.
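A multiplicative perturbation with a ratio bound can be prototyped as a PGD-like loop that optimizes a per-pixel factor m and clamps it to [1/r, r] instead of clipping an additive delta. The update rule below is a generic sketch in that spirit, not the exact MultAV algorithm.

```python
# Hedged sketch: iterative multiplicative attack with a ratio-bound constraint.
import torch

def multiplicative_attack(model, x, y, ratio_bound=1.1, steps=10, step_size=0.01):
    m = torch.ones_like(x, requires_grad=True)        # multiplicative factor
    for _ in range(steps):
        loss = torch.nn.functional.cross_entropy(model(x * m), y)
        grad, = torch.autograd.grad(loss, m)
        with torch.no_grad():
            m += step_size * grad.sign()              # ascend the loss
            m.clamp_(1.0 / ratio_bound, ratio_bound)  # enforce the ratio bound
    return (x * m).detach()

# Usage (hypothetical): x_adv = multiplicative_attack(video_model, clip, labels)
```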
{"title":"MultAV: Multiplicative Adversarial Videos","authors":"Shao-Yuan Lo, Vishal M. Patel","doi":"10.1109/AVSS52988.2021.9663769","DOIUrl":"https://doi.org/10.1109/AVSS52988.2021.9663769","url":null,"abstract":"The majority of adversarial machine learning research focuses on additive attacks, which add adversarial perturbation to input data. On the other hand, unlike image recognition problems, only a handful of attack approaches have been explored in the video domain. In this paper, we propose a novel attack method against video recognition models, Multiplicative Adversarial Videos (MultAV), which imposes perturbation on video data by multiplication. MultAV has different noise distributions to the additive counterparts and thus challenges the defense methods tailored to resisting additive adversarial attacks. Moreover, it can be generalized to not only $ell_{p}$-norm attacks with a new adversary constraint called ratio bound, but also different types of physically realizable attacks. Experimental results show that the model adversarially trained against additive attack is less robust to MultAV.","PeriodicalId":246327,"journal":{"name":"2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129785804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8