Latest Publications: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Football Action Recognition Using Hierarchical LSTM
Takamasa Tsunoda, Y. Komori, M. Matsugu, T. Harada
We present a hierarchical recurrent network for understanding team sports activity in image and location sequences. In the hierarchical model, we integrate the proposed multiple person-centered features over a temporal sequence based on the LSTM's outputs. To achieve this scheme, we introduce the Keeping state in the LSTM as an externally controllable state, and extend the hierarchical LSTMs to include a mechanism for the integration. Experimental results demonstrate the effectiveness of the proposed framework involving hierarchical LSTMs and person-centered features, showing improvement over the reference model. Specifically, by incorporating the person-centered features with meta-information (e.g., location data) in our proposed late-fusion framework, we also demonstrate increased discriminability of action categories and enhanced robustness against fluctuations in the number of observed players.
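The abstract ships no code; the sketch below is our own minimal PyTorch illustration of how a hierarchical LSTM with an externally controllable Keeping state might look. The reading of the Keeping state as a gate that freezes the lower LSTM's cell state, the max-pooling integration of person-centered features, and all dimensions are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class KeepingLSTMCell(nn.Module):
    """LSTMCell whose state update can be suspended by an external 'keep' flag."""
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.cell = nn.LSTMCell(input_dim, hidden_dim)

    def forward(self, x, state, keep):
        h, c = self.cell(x, state)
        h_prev, c_prev = state
        # keep: (batch, 1) in {0, 1}; 1 means hold the previous state unchanged.
        h = keep * h_prev + (1 - keep) * h
        c = keep * c_prev + (1 - keep) * c
        return h, c

class HierarchicalLSTM(nn.Module):
    def __init__(self, feat_dim=128, hidden_dim=64, n_actions=10):
        super().__init__()
        self.person_lstm = KeepingLSTMCell(feat_dim, hidden_dim)
        self.event_lstm = nn.LSTMCell(hidden_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, n_actions)

    def forward(self, feats, keep):
        # feats: (time, persons, batch, feat_dim); keep: (time, batch, 1)
        T, P, B, D = feats.shape
        H = self.person_lstm.cell.hidden_size
        ph = [(feats.new_zeros(B, H), feats.new_zeros(B, H)) for _ in range(P)]
        eh = (feats.new_zeros(B, H), feats.new_zeros(B, H))
        for t in range(T):
            # Lower level: one LSTM per tracked person (weights shared here).
            outs = []
            for p in range(P):
                ph[p] = self.person_lstm(feats[t, p], ph[p], keep[t])
                outs.append(ph[p][0])
            # Integrate person-centered outputs; max-pooling keeps the input
            # dimension fixed even when the number of observed players varies.
            pooled = torch.stack(outs, 0).max(0).values
            eh = self.event_lstm(pooled, eh)
        return self.classifier(eh[0])

model = HierarchicalLSTM()
x = torch.randn(16, 5, 2, 128)     # 16 frames, 5 players, batch of 2
keep = torch.zeros(16, 2, 1)       # external Keeping signal, all "update"
logits = model(x, keep)            # (2, n_actions)
```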
Citations: 52
Assisting Users in a World Full of Cameras: A Privacy-Aware Infrastructure for Computer Vision Applications
Anupam Das, Martin Degeling, Xiaoyou Wang, Junjue Wang, N. Sadeh, M. Satyanarayanan
Computer-vision-based technologies have seen widespread adoption in recent years. This use is not limited to the rapid adoption of facial recognition technology but extends to facial expression recognition, scene recognition, and more. These developments raise privacy concerns and call for novel solutions to ensure adequate user awareness and, ideally, control over the resulting collection and use of potentially sensitive data. While cameras have become ubiquitous, most of the time users are not even aware of their presence. In this paper we introduce a novel distributed privacy infrastructure for the Internet of Things and discuss in particular how it can help enhance users' awareness of, and control over, the collection and use of video data about them. The infrastructure, which has undergone early deployment and evaluation on two campuses, supports the automated discovery of IoT resources and the selective notification of users, including about the presence of computer vision applications that collect data about them. In particular, we describe an implementation of functionality that helps users discover nearby cameras and choose whether or not they want their faces to be denatured in the video streams.
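As a concrete, if simplified, picture of the two capabilities described above (resource discovery and user-controlled face denaturing), here is a small Python sketch of a camera registry with per-user opt-out preferences. All class names, the coordinate scheme, and the 100 m notification radius are our own assumptions, not part of the deployed system.

```python
from dataclasses import dataclass
from math import dist

@dataclass
class CameraResource:
    cam_id: str
    location: tuple          # (x, y) in metres, campus-local coordinates
    purpose: str             # e.g. "facial expression recognition"

class PrivacyRegistry:
    def __init__(self):
        self.cameras = []
        self.optouts = set()     # user ids who want their faces denatured

    def register(self, cam):
        self.cameras.append(cam)

    def discover_nearby(self, user_pos, radius_m=100.0):
        """Selective notification: which vision resources can see this user?"""
        return [c for c in self.cameras if dist(user_pos, c.location) <= radius_m]

    def set_denature(self, user_id, wants_denature):
        (self.optouts.add if wants_denature else self.optouts.discard)(user_id)

registry = PrivacyRegistry()
registry.register(CameraResource("lobby-1", (0.0, 5.0), "scene recognition"))
print([c.cam_id for c in registry.discover_nearby((3.0, 4.0))])
registry.set_denature("alice", True)
```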
Citations: 60
Accurate and Efficient 3D Human Pose Estimation Algorithm Using Single Depth Images for Pose Analysis in Golf
Soonchan Park, Ju Yong Chang, Hyuk Jeong, Jae-Ho Lee, Jiyoung Park
Human pose analysis has been known to be an effective means of evaluating an athlete's performance. Marker-less 3D human pose estimation is one of the most practical methods for acquiring human pose, but it lacks the accuracy required for precise performance analysis in sports. In this paper, we propose a human pose estimation algorithm that utilizes multiple types of random forests to enhance results for sports analysis. Random regression forest voting to localize the joints of the athlete's anatomy is followed by random verification forests that evaluate and optimize the votes to improve the accuracy of the clustering that determines the final positions of the anatomic joints. Experimental results show that the proposed algorithm enhances not only the accuracy but also the efficiency of human pose estimation. We also conduct a field study to investigate the feasibility of the algorithm for sports applications with the developed golf swing analysis system.
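To make the voting-plus-verification pipeline concrete, the following scikit-learn sketch localizes a single joint from synthetic per-pixel features: a regression forest casts offset votes, and a verification forest weights them before aggregation. The features, labels, and the weighted-mean aggregation (standing in for the paper's clustering step) are our simplifications, not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

rng = np.random.default_rng(0)

# Toy training data: per-pixel depth features and their 2D offsets to a joint.
X_train = rng.normal(size=(2000, 16))      # 16-D depth-difference features
offsets = X_train[:, :2] * 0.5             # synthetic ground-truth offsets
reg = RandomForestRegressor(n_estimators=50).fit(X_train, offsets)

# Verification forest: learns to separate reliable votes from unreliable ones
# (here labelled by the regressor's own training error, median-split).
err = np.linalg.norm(reg.predict(X_train) - offsets, axis=1)
ver = RandomForestClassifier(n_estimators=50).fit(
    X_train, (err < np.median(err)).astype(int))

# Test time: every pixel casts a vote; verified vote weights sharpen the
# final aggregation (a weighted mean standing in for the paper's clustering).
X_test = rng.normal(size=(200, 16))
pix_pos = rng.uniform(0, 100, size=(200, 2))   # pixel positions in the image
votes = pix_pos + reg.predict(X_test)          # each pixel votes for the joint
w = ver.predict_proba(X_test)[:, 1]            # vote confidence in [0, 1]
joint = (votes * w[:, None]).sum(0) / w.sum()
print("estimated joint position:", joint)
```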
Citations: 16
Automated Layout Synthesis and Visualization from Images of Interior or Exterior Spaces
Tomer Weiss, Masaki Nakada, Demetri Terzopoulos
Recent work in computer graphics has explored the synthesis of indoor spaces with furniture, accessories, and other layout items. In this work, we bridge the gap between the physical and virtual worlds: given an input image of an interior or exterior space and a general user specification of the desired furnishings and layout constraints, our method automatically furnishes the scene with a realistic arrangement and displays it to the user by augmenting the original image. Our method can handle varying layouts and target arrangements at interactive rates, which affords the user a sense of collaboration with the design program and enables the rapid visual assessment of various layout designs, a process that would typically be time-consuming if done manually. Our method is suitable for smartphones and other camera-enabled mobile devices.
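The paper's optimizer is not reproduced here, but a toy simulated-annealing sketch conveys how constraint-driven layout synthesis of this kind can be set up: a cost function encodes layout constraints, and random moves are accepted by the Metropolis rule. The room, items, and cost terms below are invented for illustration and are not the paper's energy function.

```python
import numpy as np

rng = np.random.default_rng(1)
ROOM = np.array([10.0, 8.0])                 # room size in metres
pos = rng.uniform(1, 7, size=(4, 2))         # 4 furniture items, (x, y)

def cost(p):
    c = 0.0
    # Constraint 1: items should not crowd each other (min 1.5 m apart).
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            c += max(0.0, 1.5 - np.linalg.norm(p[i] - p[j]))
    # Constraint 2: the first item (e.g. a sofa) should hug a wall.
    c += min(p[0, 0], p[0, 1], ROOM[0] - p[0, 0], ROOM[1] - p[0, 1])
    return c

temp = 1.0
for step in range(5000):
    cand = pos.copy()
    i = rng.integers(len(pos))
    cand[i] = np.clip(cand[i] + rng.normal(0, 0.3, 2), 0.5, ROOM - 0.5)
    d = cost(cand) - cost(pos)
    if d < 0 or rng.random() < np.exp(-d / temp):   # Metropolis acceptance
        pos = cand
    temp *= 0.999                                   # cool down

print("final layout:\n", pos, "\ncost:", cost(pos))
```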
Citations: 4
Simple Black-Box Adversarial Attacks on Deep Neural Networks
Nina Narodytska, S. Kasiviswanathan
Deep neural networks are powerful and popular learning models that achieve state-of-the-art pattern recognition performance on many computer vision, speech, and language processing tasks. However, these networks have also been shown to be susceptible to crafted adversarial perturbations that force misclassification of the inputs. Adversarial examples enable adversaries to subvert the expected system behavior, leading to undesired consequences, and could pose a security risk when these systems are deployed in the real world. In this work, we focus on deep convolutional neural networks and demonstrate that adversaries can easily craft adversarial examples even without any internal knowledge of the target network. Our attacks treat the network as an oracle (black box) and only assume that the output of the network can be observed on the probed inputs. They utilize a novel local-search-based technique to construct a numerical approximation to the network gradient, which is then carefully used to select a small set of pixels in an image to perturb. We demonstrate how this underlying idea can be adapted to achieve several strong notions of misclassification. The simplicity and effectiveness of our proposed schemes mean that they could serve as a litmus test for designing robust networks.
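A minimal sketch of the greedy local-search idea follows: the attacker repeatedly probes candidate pixel perturbations and keeps whichever most reduces the oracle's confidence in the true class, using only output probabilities. The perturbation scheme is simplified from the paper's, and `predict_probs` is a stand-in oracle the reader must supply.

```python
import numpy as np

def local_search_attack(image, true_label, predict_probs,
                        n_rounds=10, n_candidates=50, eps=0.5, rng=None):
    rng = rng or np.random.default_rng(0)
    adv = image.copy()
    for _ in range(n_rounds):
        h, w = adv.shape[:2]
        ys = rng.integers(h, size=n_candidates)
        xs = rng.integers(w, size=n_candidates)
        # Probe each candidate pixel; keep the one that hurts the true class most.
        best_score, best_probe = np.inf, None
        for y, x in zip(ys, xs):
            probe = adv.copy()
            probe[y, x] = np.clip(probe[y, x] + eps, 0.0, 1.0)  # bounded bump
            p = predict_probs(probe)                            # one oracle query
            if p[true_label] < best_score:
                best_score, best_probe = p[true_label], probe
        adv = best_probe
        if np.argmax(predict_probs(adv)) != true_label:         # misclassified
            return adv
    return adv

# Usage with a dummy oracle (replace with your model's softmax output):
dummy = lambda img: np.array([0.6 - 0.4 * img.mean(), 0.4 + 0.4 * img.mean()])
adv = local_search_attack(np.zeros((8, 8)), true_label=0, predict_probs=dummy)
print(dummy(adv))
```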
Citations: 253
Iris Super-Resolution Using Iterative Neighbor Embedding
F. Alonso-Fernandez, R. Farrugia, J. Bigün
Iris recognition research is heading towards enabling more relaxed acquisition conditions. This affects the quality and resolution of acquired images, severely degrading the accuracy of recognition systems if not tackled appropriately. In this paper, we evaluate a super-resolution algorithm that reconstructs iris images based on iterative neighbor embedding of local image patches, which tries to represent input low-resolution patches while preserving the geometry of the original high-resolution space. To this end, the geometries of the low- and high-resolution manifolds are jointly considered during the reconstruction process. We validate the system with a database of 1,872 near-infrared iris images, and adopt a fusion of two iris comparators to improve recognition performance. The presented approach is substantially superior to bilinear/bicubic interpolation at very low resolutions, and it also outperforms a previous PCA-based iris reconstruction approach that considers only the geometry of the low-resolution manifold during reconstruction.
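For intuition, here is a compact NumPy sketch of one neighbor-embedding reconstruction step: LLE-style weights computed among nearest low-resolution patches are transferred to their paired high-resolution patches. The dictionaries are random placeholders, and the paper's joint treatment of both manifolds and its iterative refinement loop are omitted for brevity.

```python
import numpy as np

def ne_reconstruct(lr_patch, lr_dict, hr_dict, k=5, reg=1e-3):
    # 1. k nearest neighbours of the input patch in the LR dictionary.
    d2 = ((lr_dict - lr_patch) ** 2).sum(1)
    nn = np.argsort(d2)[:k]
    # 2. LLE weights: minimize ||lr_patch - w @ lr_dict[nn]||^2 with sum(w) = 1.
    Z = lr_dict[nn] - lr_patch                 # centred neighbours
    G = Z @ Z.T + reg * np.eye(k)              # regularized local Gram matrix
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()
    # 3. Transfer the same local geometry to the HR manifold.
    return w @ hr_dict[nn]

rng = np.random.default_rng(0)
lr_dict = rng.normal(size=(500, 9))            # 500 3x3 LR training patches
hr_dict = rng.normal(size=(500, 36))           # their paired 6x6 HR patches
hr_patch = ne_reconstruct(rng.normal(size=9), lr_dict, hr_dict)
print(hr_patch.shape)                          # (36,)
```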
Citations: 11
Automatic Curation of Golf Highlights Using Multimodal Excitement Features
Michele Merler, D. Joshi, Q. Nguyen, Stephen Hammer, John Kent, John R. Smith, R. Feris
The production of sports highlight packages summarizing a game's most exciting moments is an essential task for broadcast media, yet it requires labor-intensive video editing. We propose a novel approach for auto-curating sports highlights, and use it to create a real-world system that aids the editing of golf highlight reels. Our method fuses information from the players' reactions (action recognition such as high-fives and fist pumps), spectators (crowd cheering), and the commentator (tone of voice and word analysis) to determine the most interesting moments of a game. We accurately identify the start and end frames of key shot highlights with additional metadata, such as the player's name and the hole number, allowing personalized content summarization and retrieval. In addition, we introduce new techniques for learning our classifiers with reduced manual training-data annotation by exploiting the correlation of different modalities. Our work has been demonstrated at a major golf tournament, successfully extracting highlights from live video streams over four consecutive days.
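The late-fusion step can be pictured with the short sketch below: per-frame excitement scores from the player-reaction, crowd, and commentator channels are combined with fixed weights, and contiguous above-threshold runs become highlight segments with start and end frames. The weights and threshold are invented for illustration, not taken from the paper.

```python
import numpy as np

def fuse_and_segment(action, crowd, commentator,
                     weights=(0.4, 0.3, 0.3), thresh=0.6):
    score = (weights[0] * action + weights[1] * crowd
             + weights[2] * commentator)
    hot = score > thresh
    # Find start/end frame indices of each contiguous excited run.
    edges = np.flatnonzero(np.diff(np.r_[0, hot.astype(int), 0]))
    return list(zip(edges[::2], edges[1::2] - 1)), score

rng = np.random.default_rng(2)
T = 100
action, crowd, comm = rng.random(T), rng.random(T), rng.random(T)
segments, score = fuse_and_segment(action, crowd, comm)
print("highlight segments (start, end):", segments)
```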
Citations: 18
Joint 3D Human Motion Capture and Physical Analysis from Monocular Videos
Petrissa Zell, Bastian Wandt, B. Rosenhahn
Motion analysis is often restricted to a laboratory setup with multiple cameras and force sensors, which requires expensive equipment and knowledgeable operators and therefore lacks simplicity and flexibility. We propose an algorithm combining monocular 3D pose estimation with physics-based modeling to introduce a statistical framework for fast and robust 3D motion analysis from 2D video data. We use a factorization approach to learn 3D motion coefficients and join them with physical parameters that describe the dynamics of a mass-spring model. Our approach requires neither additional force measurements nor torque optimization, and it uses only a single camera while allowing unobservable torques in the human body to be estimated. We show that our algorithm improves monocular 3D reconstruction by enforcing plausible human motion and resolving the ambiguity between camera and object motion. The performance is evaluated on different motions and multiple test data sets, as well as on challenging outdoor sequences.
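To illustrate the physics side only, the sketch below simulates a one-dimensional mass-spring model and recovers its stiffness from a noisy trajectory by least squares on the discretized equation of motion m*a = -k*x. This stands in for the paper's coupling of learned motion coefficients with physical parameters; all values are synthetic assumptions.

```python
import numpy as np

def simulate(k, m=70.0, dt=0.01, steps=300, x0=0.1, v0=0.0):
    x, v, xs = x0, v0, []
    for _ in range(steps):
        a = -k / m * x              # undamped spring force
        v += a * dt                 # semi-implicit Euler integration
        x += v * dt
        xs.append(x)
    return np.array(xs)

m, k_true = 70.0, 900.0
x = simulate(k_true) + np.random.default_rng(0).normal(0, 1e-5, 300)

# Finite-difference acceleration, then solve m*a ≈ -k*x for k.
a = np.gradient(np.gradient(x, 0.01), 0.01)
k_est = -m * (a @ x) / (x @ x)      # one-parameter least squares
print(f"true k = {k_true}, estimated k = {k_est:.1f}")
```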
Citations: 21
Two-Stream Neural Networks for Tampered Face Detection
Peng Zhou, Xintong Han, Vlad I. Morariu, L. Davis
We propose a two-stream network for face tampering detection. We train GoogLeNet to detect tampering artifacts in a face classification stream, and train a patch-based triplet network to leverage features capturing local noise residuals and camera characteristics as a second stream. In addition, we use two different online face-swapping applications to create a new dataset consisting of 2010 tampered images, each of which contains a tampered face. We evaluate the proposed two-stream network on our newly collected dataset. Experimental results demonstrate the effectiveness of our method.
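A minimal PyTorch sketch of the two-stream setup is given below: torchvision's GoogLeNet serves as the face-classification stream, and a small patch CNN trained with a triplet loss stands in for the noise-residual stream. The crude high-pass "residual" (image minus a blurred copy), all sizes, and the random data are our assumptions; no pretrained weights are used.

```python
import torch
import torch.nn as nn
import torchvision

# Stream 1: real-vs-tampered face classification.
face_stream = torchvision.models.googlenet(num_classes=2, aux_logits=False,
                                           init_weights=True)

# Stream 2: triplet embedding of high-pass patch residuals.
class PatchNet(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim))
    def forward(self, x):
        # Crude noise residual: image minus a blurred copy (our stand-in).
        x = x - nn.functional.avg_pool2d(x, 3, 1, 1)
        return self.net(x)

patch_stream = PatchNet()
triplet_loss = nn.TripletMarginLoss(margin=1.0)

faces = torch.randn(4, 3, 224, 224)          # fake batch of face crops
anchor, pos, neg = (torch.randn(4, 3, 64, 64) for _ in range(3))

ce = nn.CrossEntropyLoss()(face_stream(faces), torch.tensor([0, 1, 0, 1]))
tl = triplet_loss(patch_stream(anchor), patch_stream(pos), patch_stream(neg))
(ce + tl).backward()                         # joint training step sketch
print(float(ce), float(tl))
```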
Citations: 410
Deep Spatial-Temporal Fusion Network for Video-Based Person Re-identification
Lin Chen, Hua Yang, Ji Zhu, Qin Zhou, Shuang Wu, Zhiyong Gao
In this paper, we propose a novel deep end-to-end network to automatically learn spatial-temporal fusion features for video-based person re-identification. Specifically, the proposed network combines a CNN and an RNN to jointly learn both the spatial and the temporal features of input image sequences. The network is optimized by utilizing Siamese and softmax losses simultaneously to pull instances of the same person closer and push instances of different persons apart. Our network is trained on full-body and part-body image sequences respectively to learn complementary representations from holistic and local perspectives. By combining them, we obtain more discriminative features that are beneficial for person re-identification. Experiments conducted on the PRID-2011, iLIDS-VID, and MARS datasets show that the proposed method performs favorably against existing approaches.
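The following condensed PyTorch sketch shows one way to wire up the CNN-plus-RNN pipeline with joint Siamese (contrastive) and softmax objectives. The backbone, the temporal average pooling, the contrastive formulation, and all sizes are our simplifications for illustration, not the paper's exact network.

```python
import torch
import torch.nn as nn

class SeqEmbed(nn.Module):
    def __init__(self, emb=128, n_ids=100):
        super().__init__()
        self.cnn = nn.Sequential(                    # per-frame spatial features
            nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.rnn = nn.GRU(64, emb, batch_first=True) # temporal fusion
        self.id_head = nn.Linear(emb, n_ids)         # softmax branch

    def forward(self, seq):                          # seq: (B, T, 3, H, W)
        B, T = seq.shape[:2]
        f = self.cnn(seq.flatten(0, 1)).view(B, T, -1)
        emb = self.rnn(f)[0].mean(1)                 # average over time
        return emb, self.id_head(emb)

def siamese_loss(e1, e2, same, margin=2.0):
    d = nn.functional.pairwise_distance(e1, e2)
    # Pull same-identity pairs together, push different ones beyond the margin.
    return (same * d.pow(2) + (1 - same) * (margin - d).clamp(min=0).pow(2)).mean()

model = SeqEmbed()
s1, s2 = torch.randn(2, 8, 3, 64, 32), torch.randn(2, 8, 3, 64, 32)
(e1, logits1), (e2, _) = model(s1), model(s2)
ids = torch.tensor([3, 7])
loss = (nn.CrossEntropyLoss()(logits1, ids)
        + siamese_loss(e1, e2, torch.tensor([1.0, 0.0])))
loss.backward()
print(float(loss))
```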
Citations: 21