
2021 IEEE Winter Conference on Applications of Computer Vision Workshops (WACVW): Latest Publications

Facial Expression Neutralization With StoicNet
Pub Date : 2021-01-01 DOI: 10.1109/WACVW52041.2021.00026
W. Carver, Ifeoma Nwogu
Expression neutralization is the process of synthetically altering an image of a face so as to remove any facial expression from it without changing the face's identity. Facial expression neutralization could have a variety of applications, particularly in facial recognition, action unit analysis, or even improving the quality of identification pictures for various types of documents. Our proposed model, StoicNet, combines the robust encoding capacity of variational autoencoders, the generative power of generative adversarial networks, and the enhancing capabilities of super-resolution networks with a learned encoding transformation to achieve compelling expression neutralization while preserving the identity of the input face. Objective experiments demonstrate that StoicNet successfully generates realistic, identity-preserved faces with neutral expressions, regardless of the emotion or expression intensity of the input face.
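A minimal sketch of the encode-transform-decode idea described in the abstract: a variational encoder maps the face to a latent code, a learned transformation pushes that code toward a neutral-expression encoding, and a decoder/generator reconstructs the face. All layer sizes and module names are illustrative assumptions, not the authors' actual StoicNet implementation (the GAN discriminator and super-resolution stage are omitted).

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Convolutional encoder producing a variational latent code."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 16 -> 8
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(128 * 8 * 8, latent_dim)
        self.fc_logvar = nn.Linear(128 * 8 * 8, latent_dim)

    def forward(self, x):
        h = self.conv(x)
        return self.fc_mu(h), self.fc_logvar(h)

class NeutralizingTransform(nn.Module):
    """Learned mapping from an expressive latent code to a neutral one (hypothetical name)."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, latent_dim), nn.ReLU(),
            nn.Linear(latent_dim, latent_dim),
        )

    def forward(self, z):
        return self.net(z)

class Decoder(nn.Module):
    """Upsampling decoder reconstructing a face from the transformed code."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 128 * 8 * 8)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 128, 8, 8)
        return self.deconv(h)

def reparameterize(mu, logvar):
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)

# Forward pass on a dummy 64x64 face image.
enc, trans, dec = Encoder(), NeutralizingTransform(), Decoder()
x = torch.rand(1, 3, 64, 64)
mu, logvar = enc(x)
z_neutral = trans(reparameterize(mu, logvar))
x_neutral = dec(z_neutral)
print(x_neutral.shape)  # torch.Size([1, 3, 64, 64])
```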
Citations: 2
Using Semantic Information to Improve Generalization of Reinforcement Learning Policies for Autonomous Driving
Pub Date : 2021-01-01 DOI: 10.1109/WACVW52041.2021.00020
Florence Carton, David Filliat, Jaonary Rabarisoa, Q. Pham
The problem of generalization of reinforcement learning policies to new environments is seldom addressed but essential in practical applications. We focus on this problem in an autonomous driving context using the CARLA simulator and first show that semantic information is the key to a good generalization for this task. We then explore and compare different ways to exploit semantic information at training time in order to improve generalization in an unseen environment without fine-tuning, showing that using semantic segmentation as an auxiliary task is the most efficient approach.
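A minimal sketch of the auxiliary-task idea: a shared visual encoder feeds both a driving policy head and a semantic segmentation head, and the segmentation cross-entropy is added to the policy loss during training. The layer sizes, class count, and loss weight are assumptions, and the actual RL objective is replaced by a placeholder regression loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
        )

    def forward(self, x):
        return self.features(x)

class PolicyHead(nn.Module):
    """Predicts steering and throttle from pooled features."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(64, 2)

    def forward(self, feats):
        pooled = F.adaptive_avg_pool2d(feats, 1).flatten(1)
        return self.fc(pooled)

class SegmentationHead(nn.Module):
    """Auxiliary head predicting per-pixel class logits at input resolution."""
    def __init__(self, num_classes=13):
        super().__init__()
        self.classifier = nn.Conv2d(64, num_classes, 1)

    def forward(self, feats, out_size):
        logits = self.classifier(feats)
        return F.interpolate(logits, size=out_size, mode="bilinear", align_corners=False)

encoder, policy, seg = SharedEncoder(), PolicyHead(), SegmentationHead()
images = torch.rand(4, 3, 128, 128)
seg_labels = torch.randint(0, 13, (4, 128, 128))
target_actions = torch.rand(4, 2)  # stand-in target; a real setup would use the RL objective

feats = encoder(images)
policy_loss = F.mse_loss(policy(feats), target_actions)         # placeholder for the policy loss
aux_loss = F.cross_entropy(seg(feats, (128, 128)), seg_labels)   # auxiliary segmentation task
total_loss = policy_loss + 0.5 * aux_loss                        # 0.5 is an assumed weighting
total_loss.backward()
```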
Citations: 2
Reliability of GAN Generated Data to Train and Validate Perception Systems for Autonomous Vehicles
Pub Date : 2021-01-01 DOI: 10.1109/WACVW52041.2021.00023
Weihuang Xu, Nasim Souly, P. Brahma
Autonomous systems deployed in the real world have to deal with potential problem-causing situations that they have never seen during their training phases. Due to the long-tail nature of events, collecting a large amount of data for such corner cases is a difficult task. While simulation is one plausible solution, recent developments in the field of Generative Adversarial Networks (GANs) make them a promising tool to generate and augment realistic data without exhibiting a domain shift from actual real data. In this manuscript, we empirically analyze and propose novel solutions for the trust that we can place in GAN-generated data for training and validation of vision-based perception modules such as object detection and scenario classification.
Citations: 6
DriveGuard: Robustification of Automated Driving Systems with Deep Spatio-Temporal Convolutional Autoencoder
Pub Date : 2021-01-01 DOI: 10.1109/WACVW52041.2021.00016
A. Papachristodoulou, C. Kyrkou, T. Theocharides
Autonomous vehicles increasingly rely on cameras to provide the input for perception and scene understanding, and the ability of these models to classify their environment and objects under adverse conditions and image noise is crucial. When the input is deteriorated, either unintentionally or through targeted attacks, the reliability of the autonomous vehicle is compromised. In order to mitigate such phenomena, we propose DriveGuard, a lightweight spatio-temporal autoencoder, as a solution to robustify the image segmentation process for autonomous vehicles. By first processing camera images with DriveGuard, we offer a more universal solution than having to re-train each perception model with noisy input. We explore the space of different autoencoder architectures and evaluate them on a diverse dataset created with real and synthetic images, demonstrating that by exploiting spatio-temporal information combined with a multi-component loss we significantly increase robustness against adverse image effects, reaching within 5-6% of the original model's performance on clean images.
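A minimal sketch of a lightweight spatio-temporal convolutional autoencoder that restores a short clip of camera frames before segmentation, in the spirit of the approach above. The 3D-convolution layout and channel counts are assumptions, not the DriveGuard architecture.

```python
import torch
import torch.nn as nn

class SpatioTemporalAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: 3D convolutions over (time, height, width), downsampling space only.
        self.encoder = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=(3, 3, 3), stride=(1, 2, 2), padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=(3, 3, 3), stride=(1, 2, 2), padding=1), nn.ReLU(),
        )
        # Decoder: transposed 3D convolutions restoring spatial resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(32, 16, kernel_size=(3, 4, 4), stride=(1, 2, 2), padding=1), nn.ReLU(),
            nn.ConvTranspose3d(16, 3, kernel_size=(3, 4, 4), stride=(1, 2, 2), padding=1), nn.Sigmoid(),
        )

    def forward(self, clip):
        # clip: (batch, channels, time, height, width)
        return self.decoder(self.encoder(clip))

model = SpatioTemporalAutoencoder()
noisy_clip = torch.rand(2, 3, 4, 64, 64)   # 4 consecutive noisy frames per sample
restored = model(noisy_clip)
print(restored.shape)                       # torch.Size([2, 3, 4, 64, 64])
# A multi-component objective could combine a pixel reconstruction term with other losses.
recon_loss = nn.functional.mse_loss(restored, torch.rand_like(noisy_clip))
```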
Citations: 1
Explainable Fingerprint ROI Segmentation Using Monte Carlo Dropout
Pub Date : 2021-01-01 DOI: 10.1109/WACVW52041.2021.00011
Indu Joshi, R. Kothari, Ayush Utkarsh, V. Kurmi, A. Dantcheva, Sumantra Dutta Roy, P. Kalra
A fingerprint Region of Interest (ROI) segmentation module is one of the most crucial components in the fingerprint pre-processing pipeline. It separates the foreground fingerprint from the background region, so that feature extraction and matching are restricted to the ROI instead of the entire fingerprint image. However, state-of-the-art segmentation algorithms act like a black box and do not indicate model confidence. In this direction, we propose an explainable fingerprint ROI segmentation model that indicates the pixels on which the model is uncertain. Towards this, we benchmark four state-of-the-art semantic segmentation models on fingerprint ROI segmentation. Furthermore, we demonstrate the effectiveness of model uncertainty as an attention mechanism to improve the segmentation performance of the best-performing model. Experiments on publicly available Fingerprint Verification Challenge (FVC) databases showcase the effectiveness of the proposed model.
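A minimal sketch of Monte Carlo dropout for per-pixel uncertainty in a segmentation network: dropout stays active at inference, the model is run several times, and the variance across runs marks pixels the model is uncertain about. The tiny network, dropout rate, and number of passes are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Dropout2d(p=0.2),                           # kept active at test time for MC dropout
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),  # foreground probability per pixel
        )

    def forward(self, x):
        return self.body(x)

def mc_dropout_predict(model, image, passes=20):
    """Return mean foreground probability and per-pixel uncertainty (variance)."""
    model.train()  # keeps dropout stochastic; in practice only the dropout layers need train mode
    with torch.no_grad():
        samples = torch.stack([model(image) for _ in range(passes)], dim=0)
    return samples.mean(dim=0), samples.var(dim=0)

model = TinySegNet()
fingerprint = torch.rand(1, 1, 128, 128)   # stand-in grayscale fingerprint image
mean_mask, uncertainty = mc_dropout_predict(model, fingerprint)
print(mean_mask.shape, uncertainty.shape)  # both torch.Size([1, 1, 128, 128])
```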
Citations: 11
2020 Sequestered Data Evaluation for Known Activities in Extended Video: Summary and Results
Pub Date : 2021-01-01 DOI: 10.1109/WACVW52041.2021.00010
A. Godil, Yooyoung Lee, J. Fiscus, Andrew Delgado, Eliot Godard, Baptiste Chocot, Lukas L. Diduch, Jim Golden, Jesse Zhang
This paper presents a summary and results for the ActEV’20 SDL (Activities in Extended Video Sequestered Data Leaderboard) challenge that was held under the CVPR’20 ActivityNet workshop [38]. The primary goal of the challenge was to provide an impetus for advancing research and capabilities in the field of human activity detection in untrimmed multi-camera videos. Advancements in activity detection will help with a wide range of public safety applications. The challenge was administered by the National Institute of Standards and Technology (NIST), where anyone could submit a system that was run on sequestered data, with the resulting score posted to a public leaderboard. Ten teams submitted their systems for the ActEV’20 SDL competition on the Multiview Extended Video with Activities (MEVA) test set with 37 target activities. The performance metric for the leaderboard ranking is the partial, normalized Area Under the Detection Error Tradeoff (DET) curve (nAUDC). The top rank on activity detection was achieved by UCF at 37%, followed by CMU at 39% and OPPO at 41%.
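A rough illustration of computing a partial, normalized area under a DET-style curve (probability of missed detection versus false alarm rate) by trapezoidal integration up to a false-alarm cutoff. The cutoff value and operating points below are invented for illustration; the official NIST scoring tool defines the exact nAUDC protocol.

```python
import numpy as np

def partial_normalized_audc(false_alarm, p_miss, fa_cutoff=0.2):
    """Area under the p_miss(false_alarm) curve on [0, fa_cutoff], divided by fa_cutoff."""
    fa = np.asarray(false_alarm, dtype=float)
    pm = np.asarray(p_miss, dtype=float)
    order = np.argsort(fa)
    fa, pm = fa[order], pm[order]
    # Clip the curve at the cutoff, interpolating p_miss at the cutoff point.
    pm_at_cutoff = np.interp(fa_cutoff, fa, pm)
    keep = fa <= fa_cutoff
    fa = np.append(fa[keep], fa_cutoff)
    pm = np.append(pm[keep], pm_at_cutoff)
    return np.trapz(pm, fa) / fa_cutoff

# Hypothetical operating points from sweeping a detection threshold.
fa_points = [0.0, 0.05, 0.10, 0.15, 0.20, 0.30]
pm_points = [0.90, 0.60, 0.45, 0.40, 0.37, 0.30]
print(round(partial_normalized_audc(fa_points, pm_points), 3))
```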
Citations: 0
Geeks and guests: Estimating player’s level of experience from board game behaviors
Pub Date : 2021-01-01 DOI: 10.1109/WACVW52041.2021.00007
Feyisayo Olalere, Metehan Doyran, R. Poppe, A. A. Salah
Board games have become promising tools for observing and studying social behaviors in multi-person settings. While traditional methods such as self-report questionnaires are used to analyze game-induced behaviors, there is a growing need to automate such analyses. In this paper, we focus on estimating the levels of board game experience by analyzing a player’s confidence and anxiety from visual cues. We use a board game setting to induce relevant interactions, and investigate facial expressions during critical game events. For our analysis, we annotated the critical game events in a multiplayer cooperative board game, using the publicly available MUMBAI board game corpus. Using off-the-shelf tools, we encoded facial behavior in dyadic interactions and built classifiers to predict each player’s level of experience. Our results show that considering the experience level of both parties involved in the interaction simultaneously improves the prediction results.
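A minimal sketch of the dyadic idea above: features describing both players in an interaction are concatenated and fed to one classifier that predicts the target player's experience level. The feature dimensionality, classifier choice, and random data are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_events, feat_dim = 200, 17                            # e.g. per-player facial behavior features

target_feats = rng.normal(size=(n_events, feat_dim))    # player being classified
partner_feats = rng.normal(size=(n_events, feat_dim))   # the other player in the dyad
labels = rng.integers(0, 2, size=n_events)              # 0 = novice ("guest"), 1 = experienced ("geek")

# Dyadic model: both parties' behavior is considered simultaneously.
X_dyadic = np.concatenate([target_feats, partner_feats], axis=1)
clf = LogisticRegression(max_iter=1000).fit(X_dyadic, labels)
print("train accuracy:", clf.score(X_dyadic, labels))
```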
Citations: 2
Weakly Supervised Multi-Object Tracking and Segmentation
Pub Date : 2021-01-01 DOI: 10.1109/WACVW52041.2021.00018
Idoia Ruiz, L. Porzi, S. R. Bulò, P. Kontschieder, J. Serrat
We introduce the problem of weakly supervised Multi-Object Tracking and Segmentation, i.e. joint weakly supervised instance segmentation and multi-object tracking, in which we do not provide any kind of mask annotation. To address it, we design a novel synergistic training strategy by taking advantage of multi-task learning, i.e. classification and tracking tasks guide the training of the unsupervised instance segmentation. For that purpose, we extract weak foreground localization information, provided by Grad-CAM heatmaps, to generate a partial ground truth to learn from. Additionally, RGB image level information is employed to refine the mask prediction at the edges of the objects. We evaluate our method on KITTI MOTS, the most representative benchmark for this task, reducing the performance gap on the MOTSP metric between the fully supervised and weakly supervised approach to just 12% and 12.7% for cars and pedestrians, respectively.
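A minimal sketch of the weak-localization step mentioned above: a classification network is run on an image, the gradient of the predicted class score with respect to the last convolutional feature map gives channel weights (Grad-CAM), and the thresholded heatmap serves as a partial foreground pseudo-mask. The tiny classifier and the 0.5 threshold are illustrative assumptions; the full tracking pipeline is not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyClassifier(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        fmap = self.features(x)                                  # last conv feature map
        logits = self.classifier(F.adaptive_avg_pool2d(fmap, 1).flatten(1))
        return logits, fmap

def grad_cam_pseudo_mask(model, image, threshold=0.5):
    logits, fmap = model(image)
    fmap.retain_grad()
    pred = logits.argmax(dim=1).item()
    logits[0, pred].backward()                                   # gradient of the predicted class score
    weights = fmap.grad.mean(dim=(2, 3), keepdim=True)           # global-average-pool the gradients
    cam = F.relu((weights * fmap).sum(dim=1, keepdim=True))      # weighted combination + ReLU
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)     # normalize to [0, 1]
    return (cam > threshold).float()                             # partial foreground pseudo-mask

model = TinyClassifier()
image = torch.rand(1, 3, 64, 64)
mask = grad_cam_pseudo_mask(model, image)
print(mask.shape, mask.sum().item())
```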
Citations: 7
An Explainable Attention-Guided Iris Presentation Attack Detector
Pub Date : 2021-01-01 DOI: 10.1109/WACVW52041.2021.00015
Cunjian Chen, A. Ross
Convolutional Neural Networks (CNNs) are being increasingly used to address the problem of iris presentation attack detection. In this work, we propose an explainable attention-guided iris presentation attack detector (AG-PAD) to augment CNNs with attention mechanisms and to provide visual explanations of model predictions. Two types of attention modules are independently placed on top of the last convolutional layer of the backbone network. Specifically, the channel attention module is used to model the inter-channel relationship between features, while the position attention module is used to model inter-spatial relationship between features. An element-wise sum is employed to fuse these two attention modules. Further, a novel hierarchical attention mechanism is introduced. Experiments involving both a JHU-APL proprietary dataset and the benchmark LivDet-Iris-2017 dataset suggest that the proposed method achieves promising detection results while explaining occurrences of salient regions for discriminative feature learning. To the best of our knowledge, this is the first work that exploits the use of attention mechanisms in iris presentation attack detection.
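A minimal sketch of the dual-attention layout described above: a position (spatial) attention module and a channel attention module are applied to the backbone's last convolutional feature map and fused with an element-wise sum. This follows the common dual-attention formulation; the layer sizes and the 1x1-convolution query/key/value projections are assumptions rather than the paper's exact modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PositionAttention(nn.Module):
    """Models inter-spatial relationships between features."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)             # (B, HW, C//8)
        k = self.key(x).flatten(2)                                # (B, C//8, HW)
        attn = F.softmax(torch.bmm(q, k), dim=-1)                 # (B, HW, HW)
        v = self.value(x).flatten(2)                              # (B, C, HW)
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x

class ChannelAttention(nn.Module):
    """Models inter-channel relationships between features."""
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        flat = x.flatten(2)                                                # (B, C, HW)
        attn = F.softmax(torch.bmm(flat, flat.transpose(1, 2)), dim=-1)    # (B, C, C)
        out = torch.bmm(attn, flat).view(b, c, h, w)
        return self.gamma * out + x

# Element-wise sum fuses the two attention branches on the last conv feature map.
features = torch.rand(2, 64, 16, 16)            # stand-in for the backbone output
fused = PositionAttention(64)(features) + ChannelAttention()(features)
print(fused.shape)                               # torch.Size([2, 64, 16, 16])
```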
Citations: 20
Multi-Scale Voxel Class Balanced ASPP for LIDAR Pointcloud Semantic Segmentation
Pub Date : 2021-01-01 DOI: 10.1109/WACVW52041.2021.00017
K. Kumar, S. Al-Stouhi
This paper explores efficient techniques to improve PolarNet model performance for real-time semantic segmentation of LiDAR point clouds. The core framework consists of an encoder network, an Atrous Spatial Pyramid Pooling (ASPP)/Dense Atrous Spatial Pyramid Pooling (DenseASPP) block, and a decoder network. The encoder extracts multi-scale voxel information in a top-down manner, while the decoder fuses multiple feature maps from various scales in a bottom-up manner. Between the encoder and decoder blocks, an ASPP/DenseASPP block is inserted to enlarge receptive fields in a very dense manner. In contrast to the PolarNet model, we use weighted cross entropy in conjunction with the Lovasz-softmax loss to improve segmentation accuracy. This paper also accelerates the training of the PolarNet model by incorporating learning-rate schedulers in conjunction with the Adam optimizer for faster convergence in fewer epochs without degrading accuracy. Extensive experiments conducted on the challenging SemanticKITTI dataset show that our high-resolution-grid model obtains a competitive state-of-the-art result of 60.6 mIoU @ 21 fps, whereas our low-resolution-grid model obtains 54.01 mIoU @ 35 fps, thereby balancing the accuracy/speed trade-off.
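A minimal sketch of the training recipe described above: per-class weights derived from inverse label frequency feed a weighted cross-entropy loss, optimized with Adam under a per-step learning-rate scheduler. The stand-in model, class count, label counts, and scheduler settings are assumptions; the Lovasz-softmax term would be added from its reference implementation and is omitted here.

```python
import torch
import torch.nn as nn

num_classes = 5
model = nn.Conv2d(4, num_classes, kernel_size=1)           # stand-in for the voxel/BEV segmentation network

# Inverse-frequency class weights (from assumed label counts) to counter class imbalance.
label_counts = torch.tensor([5_000_000., 400_000., 120_000., 60_000., 20_000.])
class_weights = 1.0 / label_counts
class_weights = class_weights / class_weights.sum() * num_classes
criterion = nn.CrossEntropyLoss(weight=class_weights)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=2e-3, total_steps=100)

for step in range(100):
    bev_features = torch.rand(2, 4, 32, 32)                 # stand-in for polar BEV features
    labels = torch.randint(0, num_classes, (2, 32, 32))
    loss = criterion(model(bev_features), labels)
    # the full objective would also add the Lovasz-softmax term here
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                                         # per-step learning-rate schedule
```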
Citations: 4