
Latest publications — 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

A Dataset for Persistent Multi-target Multi-camera Tracking in RGB-D
Ryan Layne, S. Hannuna, M. Camplani, Jake Hall, Timothy M. Hospedales, T. Xiang, M. Mirmehdi, D. Damen
Video surveillance systems are now widely deployed to improve our lives by enhancing safety, security, health monitoring and business intelligence. This has motivated extensive research into automated video analysis. Nevertheless, there is a gap between the focus of contemporary research and the needs of end users of video surveillance systems. Many existing benchmarks and methodologies focus on narrowly defined problems in detection, tracking, re-identification or recognition. In contrast, end users face higher-level problems such as long-term monitoring of identities in order to build a picture of a person's activity across the course of a day, or producing usage statistics for a particular area of space, and these capabilities should be robust to challenges such as a change of clothing. Achieving this effectively requires less widely studied capabilities such as spatio-temporal reasoning about people's identities and locations within a space partially observed by multiple cameras over an extended time period. To bridge this gap between research and required capabilities, we propose a new dataset, LIMA, that encompasses the challenges of monitoring a typical home / office environment. LIMA contains 4.5 hours of RGB-D video from three cameras monitoring a four-room house. To reflect the challenges of a realistic practical application, the dataset includes clothes changes and visitors to ensure the global reasoning is a realistic open-set problem. In addition to raw data, we provide identity annotation for benchmarking, and tracking results from a contemporary RGB-D tracker, thus allowing focus on the higher-level monitoring problems.
Citations: 8
Inferring Hidden Statuses and Actions in Video by Causal Reasoning
A. Fire, Song-Chun Zhu
In the physical world, cause and effect are inseparable: ambient conditions trigger humans to perform actions, thereby driving status changes of objects. In video, these actions and statuses may be hidden due to ambiguity, occlusion, or because they are otherwise unobservable, but humans nevertheless perceive them. In this paper, we extend the Causal And-Or Graph (C-AOG) to a sequential model representing actions and their effects on objects over time, and we build a probability model for it. For inference, we apply a Viterbi algorithm, grounded on probabilistic detections from video, to fill in hidden and misdetected actions and statuses. We analyze our method on a new video dataset that showcases causes and effects. Our results demonstrate the effectiveness of reasoning with causality over time.
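The inference step described here is essentially a Viterbi decoding over hidden actions and statuses. Below is a minimal, generic sketch — not the authors' C-AOG implementation — assuming a plain HMM-style model in which per-frame detector confidences serve as log emission scores and a fixed transition matrix encodes how statuses and actions evolve between frames.

```python
import numpy as np

def viterbi(log_trans, log_emit, log_prior):
    """Most probable hidden-state sequence under an HMM-style model.

    Generic Viterbi decoder; not the paper's C-AOG inference, just the underlying idea.
    log_trans: (S, S) log transition probabilities between hidden states
    log_emit:  (T, S) per-frame log emission scores (e.g. detector confidences)
    log_prior: (S,)   log initial-state probabilities
    """
    T, S = log_emit.shape
    score = log_prior + log_emit[0]            # best log score ending in each state at t=0
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans      # rows: previous state, cols: current state
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):              # trace back the best predecessors
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]
```

Filling in hidden or misdetected statuses then amounts to reading the decoded state wherever the raw per-frame detections are ambiguous.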
Citations: 21
Video-Based Person Re-identification by Deep Feature Guided Pooling
You Li, L. Zhuo, Jiafeng Li, Jing Zhang, Xi Liang, Q. Tian
Person re-identification (re-id) aims to match a specific person across non-overlapping views of different cameras, and is currently one of the hot topics in computer vision. Compared with image-based person re-id, video-based techniques can achieve better performance by fully utilizing space-time information. This paper presents a novel video-based person re-id method named Deep Feature Guided Pooling (DFGP), which takes full advantage of the space-time information. The contributions of the method are in the following aspects: (1) A PCA-based convolutional network (PCN), a lightweight deep learning network, is trained to generate deep features of video frames. The deep features are aggregated by average pooling to obtain person deep feature vectors. These vectors are used to guide the generation of human appearance features, which makes the appearance features robust to the severe noise in videos. (2) Hand-crafted local features of videos are aggregated by max pooling to reinforce the motion variations of different persons. In this way, the human descriptors are more discriminative. (3) The final human descriptors combine deep features and hand-crafted local features so that each contributes its own strengths, which improves identification performance. Experimental results show that our approach outperforms six other state-of-the-art video-based methods on the challenging PRID 2011 and iLIDS-VID video-based person re-id datasets.
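As a rough illustration of the aggregation in points (1) and (2), the sketch below average-pools per-frame deep features, max-pools per-frame hand-crafted local features, and concatenates them into one descriptor. The deep-feature-guided generation of appearance features is omitted, and the final L2 normalisation is an assumption for matching, not a detail taken from the paper.

```python
import numpy as np

def video_descriptor(deep_feats, local_feats):
    """Aggregate per-frame features of one tracklet into a single person descriptor.

    Sketch of the pooling scheme only; the guided appearance-feature step is omitted.
    deep_feats:  (T, D1) deep features of each frame (e.g. from a lightweight CNN)
    local_feats: (T, D2) hand-crafted local features of each frame
    """
    deep_vec = deep_feats.mean(axis=0)     # average pooling smooths per-frame noise
    local_vec = local_feats.max(axis=0)    # max pooling keeps strong motion responses
    desc = np.concatenate([deep_vec, local_vec])
    return desc / (np.linalg.norm(desc) + 1e-12)   # L2-normalise for distance matching
```

Matching two tracklets then reduces to comparing their descriptors with, for example, Euclidean or cosine distance.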
Citations: 27
Image Super Resolution Based on Fusing Multiple Convolution Neural Networks
Haoyu Ren, Mostafa El-Khamy, Jungwon Lee
In this paper, we focus on constructing an accurate super resolution system based on multiple Convolution Neural Networks (CNNs). Each individual CNN is trained separately with a different network structure. A Context-wise Network Fusion (CNF) approach is proposed to integrate the outputs of the individual networks by additional convolution layers. By fine-tuning the whole fused network, accuracy is significantly improved compared to the individual networks. We also discuss other network fusion schemes, including Pixel-Wise network Fusion (PWF) and Progressive Network Fusion (PNF). The experimental results show that CNF outperforms PWF and PNF. Using SRCNN as the individual network, the CNF network achieves state-of-the-art accuracy on benchmark image datasets.
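The following is a minimal sketch of the fusion idea: each individual network's super-resolved output goes through its own small stack of convolution layers, and the branch outputs are merged, after which the whole fused network can be fine-tuned end to end. The layer sizes and the sum-based merge are illustrative assumptions, not the paper's exact CNF architecture.

```python
import torch
import torch.nn as nn

class ContextWiseFusion(nn.Module):
    """Fuse the outputs of several super-resolution networks with extra conv layers.

    Illustrative fusion sketch; layer sizes and the sum-merge are assumptions,
    not the paper's exact CNF design.
    """

    def __init__(self, num_nets, channels=3, hidden=32):
        super().__init__()
        # One small convolutional branch per individual network output.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, hidden, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(hidden, channels, kernel_size=3, padding=1),
            )
            for _ in range(num_nets)
        ])

    def forward(self, outputs):
        # outputs: list of (N, C, H, W) super-resolved images, one per network
        fused = self.branches[0](outputs[0])
        for branch, out in zip(self.branches[1:], outputs[1:]):
            fused = fused + branch(out)
        return fused
```

Fine-tuning the whole fused model end to end then adjusts both the fusion layers and the individual networks.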
Citations: 77
The Stereoscopic Zoom
S. Pujades, Frederic Devernay, Laurent Boiron, Rémi Ronfard
We study camera models to generate stereoscopic zoom shots, i.e. shots using very long focal length lenses. Stereoscopic images are usually generated with two cameras. However, we show that two cameras are unable to create compelling stereoscopic images with extreme focal length lenses. Inspired by practitioners' use of long focal length lenses, we propose two different configurations: we "get closer" to the scene, or we create "perspective deformations". Both configurations are built upon state-of-the-art image-based rendering methods, allowing the formal deduction of precise camera parameters depending on the scene to be acquired. We present a proof of concept with the acquisition of a representative simplified scene. We discuss the advantages and drawbacks of each configuration.
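To see why a plain two-camera rig struggles at very long focal lengths, consider standard pinhole stereo, where on-screen disparity is d = f·B/Z (focal length f in pixels, baseline B, depth Z). The sketch below works through purely illustrative numbers (a hypothetical 400 mm lens, 36 mm sensor, 2K image, 6.5 cm baseline): the relative disparity between objects 10 m apart in depth at 100 m is on the order of one pixel, so the scene is perceived as nearly flat.

```python
def disparity_px(focal_mm, sensor_width_mm, image_width_px, baseline_m, depth_m):
    """On-screen horizontal disparity (pixels) of a point at depth_m, pinhole stereo."""
    f_px = focal_mm / sensor_width_mm * image_width_px   # focal length in pixels
    return f_px * baseline_m / depth_m

# Illustrative numbers only (hypothetical lens / sensor / baseline):
# subject at 100 m, background at 110 m.
near = disparity_px(400, 36, 2048, 0.065, 100.0)
far = disparity_px(400, 36, 2048, 0.065, 110.0)
print(round(near - far, 2))   # ~1.34 px of relative disparity across 10 m of depth
```

Such depth compression illustrates why a simple two-camera setup falls short at extreme focal lengths.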
Citations: 2
NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study
E. Agustsson, R. Timofte
This paper introduces a novel large dataset for example-based single image super-resolution and studies the state of the art as it emerged from the NTIRE 2017 challenge. The challenge is the first of its kind, with 6 competitions, hundreds of participants and tens of proposed solutions. Our newly collected DIVerse 2K resolution image dataset (DIV2K) was employed by the challenge. In our study we compare the solutions from the challenge to a set of representative methods from the literature and evaluate them using diverse measures on our proposed DIV2K dataset. Moreover, we conduct a number of experiments and draw conclusions on several topics of interest. We conclude that the NTIRE 2017 challenge pushes the state of the art in single-image super-resolution, reaching the best results to date on the popular Set5, Set14, B100 and Urban100 datasets and on our newly proposed DIV2K.
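For context, the measure most commonly reported on Set5, Set14, B100, Urban100 and DIV2K-style benchmarks is PSNR, usually alongside SSIM. The helper below is a minimal sketch of that single measure, not the challenge's full evaluation protocol.

```python
import numpy as np

def psnr(sr, hr, max_val=255.0):
    """Peak signal-to-noise ratio between a super-resolved image and its ground truth.

    One representative SR measure; not the challenge's full protocol.
    sr, hr: arrays of identical shape with values in [0, max_val].
    """
    mse = np.mean((sr.astype(np.float64) - hr.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Benchmark protocols often also crop image borders and compute the measure on the luminance channel; those details are omitted here.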
Citations: 2070
Automated Risk Assessment for Scene Understanding and Domestic Robots Using RGB-D Data and 2.5D CNNs at a Patch Level
Rob Dupre, Georgios Tzimiropoulos, V. Argyriou
In this work, the notion of automated risk assessment for 3D scenes is addressed. Using deep learning techniques, smart-enabled homes and domestic robots can be equipped with the functionality to detect, draw attention to, or mitigate hazards in a given scene. We extend an existing risk estimation framework that incorporates physics and shape descriptors by introducing a novel CNN architecture allowing risk detection at a patch level. Analysis is conducted on RGB-D data and is performed on a frame-by-frame basis, requiring no temporal information between frames.
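A patch-level pipeline of the kind described could be organised as below: tile each RGB-D frame into fixed-size patches, stack colour and depth into a four-channel array, and score every patch independently with a small CNN. The patch size, stride, and four-channel stacking are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def extract_rgbd_patches(rgb, depth, patch=64, stride=64):
    """Tile an RGB-D frame into fixed-size patches for per-patch risk scoring.

    Patch size / stride and the RGB+D stacking are illustrative choices.
    rgb:   (H, W, 3) colour image
    depth: (H, W)    aligned depth map
    Returns a list of (patch_rgbd, (row, col)) pairs, each patch of shape (patch, patch, 4).
    """
    h, w = depth.shape
    rgbd = np.concatenate([rgb, depth[..., None]], axis=-1)   # stack depth as a 4th channel
    patches = []
    for r in range(0, h - patch + 1, stride):
        for c in range(0, w - patch + 1, stride):
            patches.append((rgbd[r:r + patch, c:c + patch], (r, c)))
    return patches

# Each patch would then be scored by a small CNN (not shown here); patches whose
# risk score exceeds a threshold are flagged in the frame's hazard map.
```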
Citations: 2
Application of Computer Vision and Vector Space Model for Tactical Movement Classification in Badminton
K. Weeratunga, A. Dharmarathne, K. B. How
Performance profiling in sports allows evaluating opponents' tactics and developing counter-tactics to gain a competitive advantage. The work presented develops a comprehensive methodology to automate tactical profiling in elite badminton. The proposed approach uses computer vision techniques to automate data gathering from video footage. The image processing algorithm is validated using video footage of the highest-level tournaments, including the Olympic Games. The average accuracy of player position detection is 96.03% and 97.09% on the two halves of a badminton court. Next, frequent trajectories of badminton players are extracted and classified according to their tactical relevance. The classification performs at 97.79% accuracy, 97.81% precision, 97.44% recall, and 97.62% F-score. The combination of automated player position detection, frequent trajectory extraction, and the subsequent classification can be used to automatically generate player tactical profiles.
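The vector space model in the title suggests representing each extracted trajectory as a vector (for instance, visit counts over quantised court cells) and assigning it to the tactic whose prototype vector is most similar under cosine similarity. The sketch below shows that generic scheme; the cell-count representation and prototype matching are assumptions for illustration, not the paper's exact classifier.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def classify_trajectory(traj_vec, class_prototypes):
    """Assign a trajectory vector to the most similar tactic prototype.

    Hypothetical trajectory representation (court-cell visit counts) for illustration.
    traj_vec:         1-D vector, e.g. visit counts over quantised court cells
    class_prototypes: dict mapping tactic label -> prototype vector
    """
    return max(class_prototypes, key=lambda label: cosine(traj_vec, class_prototypes[label]))
```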
Citations: 15
Track-Clustering Error Evaluation for Track-Based Multi-camera Tracking System Employing Human Re-identification
Chih-Wei Wu, Meng-Ting Zhong, Yu-Yu Tsao, Shao-Wen Yang, Yen-kuang Chen, Shao-Yi Chien
In this study, we present a set of new evaluation measures for the track-based multi-camera tracking (T-MCT) task, leveraging clustering measurements. We demonstrate that the proposed evaluation measures provide notable advantages over previous ones. Moreover, a distributed and online T-MCT framework is proposed, in which re-identification (Re-id) is embedded in T-MCT, to confirm the validity of the proposed evaluation measures. Experimental results reveal that with the proposed evaluation measures, the performance of T-MCT can be accurately measured, and it is highly correlated with the performance of Re-id. Furthermore, our T-MCT framework achieves a competitive score on the DukeMTMC dataset compared to previous work that used global optimization algorithms. Both the evaluation measures and the inter-camera tracking framework are shown to be stepping stones for multi-camera tracking.
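Viewing T-MCT as a clustering of single-camera tracks into global identities, one standard family of clustering measures is pairwise precision/recall/F-score over track pairs, sketched below. This is an illustrative example of a track-clustering measure, not necessarily the specific measures proposed in the paper.

```python
from itertools import combinations

def pairwise_prf(pred_ids, true_ids):
    """Pairwise precision / recall / F-score of a track-to-identity clustering.

    Generic pairwise clustering measure, not necessarily the paper's proposed measures.
    pred_ids, true_ids: dicts mapping single-camera track id -> identity label.
    A pair of tracks is positive when both tracks share the same identity.
    """
    tracks = sorted(pred_ids)
    tp = fp = fn = 0
    for a, b in combinations(tracks, 2):
        same_pred = pred_ids[a] == pred_ids[b]
        same_true = true_ids[a] == true_ids[b]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fscore = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, fscore
```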
Citations: 12
Nonrigid Registration of Hyperspectral and Color Images with Vastly Different Spatial and Spectral Resolutions for Spectral Unmixing and Pansharpening
Yuan Zhou, Anand Rangarajan, P. Gader
In this paper, we propose a framework to register images with very large scale differences by utilizing the point spread function (PSF), and apply it to register hyperspectral and high-resolution color images. The algorithm minimizes a least-squares (LSQ) objective function that incorporates a spectral response function (SRF), a nonrigid freeform deformation applied to the hyperspectral image, and a rigid transformation applied to the color image. The optimization problem is solved by updating the two transformations and the two physical functions in an alternating fashion. We evaluated the framework on a simulated Pavia University dataset and a real Salton Sea dataset, comparing the proposed algorithm with its rigid variant and with two mutual-information-based algorithms. The results indicate that the LSQ freeform version has the best performance on both the nonrigid simulation and the real datasets, with less than 0.15 pixels of error given 1 pixel of nonrigid distortion in the hyperspectral domain.
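One LSQ subproblem in such an alternating scheme — fitting the spectral response function while the alignment is held fixed — reduces to ordinary least squares, sketched below. The matrix formulation and the surrounding alternation outline are a simplified illustration, not the paper's exact objective, which also couples in the PSF and the deformation parameters.

```python
import numpy as np

def estimate_srf(hs_pixels, rgb_pixels):
    """Least-squares estimate of a spectral response function.

    Simplified subproblem; the paper's objective also involves the PSF and deformations.
    hs_pixels:  (N, B) hyperspectral samples at currently aligned locations
    rgb_pixels: (N, 3) colour samples at the same locations
    Solves rgb ≈ hs @ S for the (B, 3) response matrix S.
    """
    S, *_ = np.linalg.lstsq(hs_pixels, rgb_pixels, rcond=None)
    return S

# Alternating outline: (1) with the transforms fixed, refit S (and the PSF) by
# least squares as above; (2) with S fixed, update the freeform and rigid
# transforms to reduce the same LSQ objective; repeat until convergence.
```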
Citations: 8