从第一人称视频预测篮球运动员的行为

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2017-07-21 DOI:10.1109/CVPR.2017.133

Shan Su, J. Hong, Jianbo Shi, H. Park

{"title":"从第一人称视频预测篮球运动员的行为","authors":"Shan Su, J. Hong, Jianbo Shi, H. Park","doi":"10.1109/CVPR.2017.133","DOIUrl":null,"url":null,"abstract":"This paper presents a method to predict the future movements (location and gaze direction) of basketball players as a whole from their first person videos. The predicted behaviors reflect an individual physical space that affords to take the next actions while conforming to social behaviors by engaging to joint attention. Our key innovation is to use the 3D reconstruction of multiple first person cameras to automatically annotate each others visual semantics of social configurations. We leverage two learning signals uniquely embedded in first person videos. Individually, a first person video records the visual semantics of a spatial and social layout around a person that allows associating with past similar situations. Collectively, first person videos follow joint attention that can link the individuals to a group. We learn the egocentric visual semantics of group movements using a Siamese neural network to retrieve future trajectories. We consolidate the retrieved trajectories from all players by maximizing a measure of social compatibility—the gaze alignment towards joint attention predicted by their social formation, where the dynamics of joint attention is learned by a long-term recurrent convolutional network. This allows us to characterize which social configuration is more plausible and predict future group trajectories.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"60 1","pages":"1206-1215"},"PeriodicalIF":0.0000,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"44","resultStr":"{\"title\":\"Predicting Behaviors of Basketball Players from First Person Videos\",\"authors\":\"Shan Su, J. Hong, Jianbo Shi, H. Park\",\"doi\":\"10.1109/CVPR.2017.133\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents a method to predict the future movements (location and gaze direction) of basketball players as a whole from their first person videos. The predicted behaviors reflect an individual physical space that affords to take the next actions while conforming to social behaviors by engaging to joint attention. Our key innovation is to use the 3D reconstruction of multiple first person cameras to automatically annotate each others visual semantics of social configurations. We leverage two learning signals uniquely embedded in first person videos. Individually, a first person video records the visual semantics of a spatial and social layout around a person that allows associating with past similar situations. Collectively, first person videos follow joint attention that can link the individuals to a group. We learn the egocentric visual semantics of group movements using a Siamese neural network to retrieve future trajectories. We consolidate the retrieved trajectories from all players by maximizing a measure of social compatibility—the gaze alignment towards joint attention predicted by their social formation, where the dynamics of joint attention is learned by a long-term recurrent convolutional network. This allows us to characterize which social configuration is more plausible and predict future group trajectories.\",\"PeriodicalId\":6631,\"journal\":{\"name\":\"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)\",\"volume\":\"60 1\",\"pages\":\"1206-1215\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-07-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"44\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CVPR.2017.133\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPR.2017.133","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 44

摘要

本文提出了一种从篮球运动员的第一人称视频中整体预测其未来动作(位置和注视方向)的方法。预测的行为反映了个体的物理空间，可以采取下一步行动，同时通过参与共同关注来遵守社会行为。我们的关键创新是使用多个第一人称摄像机的3D重建来自动注释彼此的社会配置的视觉语义。我们利用了第一人称视频中独特的两种学习信号。单独来说，第一人称视频记录了一个人周围的空间和社会布局的视觉语义，允许将其与过去的类似情况联系起来。总的来说，第一人称视频遵循共同关注，可以将个人与群体联系起来。我们学习自我中心的视觉语义群体运动使用暹罗神经网络检索未来的轨迹。我们通过最大化社会相容性(social compatibility—)来巩固所有参与者的检索轨迹;由他们的社会形态预测的共同注意的注视对齐，其中共同注意的动态是通过长期循环卷积网络学习的。这使我们能够确定哪种社会结构更合理，并预测未来的群体轨迹。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Predicting Behaviors of Basketball Players from First Person Videos

This paper presents a method to predict the future movements (location and gaze direction) of basketball players as a whole from their first person videos. The predicted behaviors reflect an individual physical space that affords to take the next actions while conforming to social behaviors by engaging to joint attention. Our key innovation is to use the 3D reconstruction of multiple first person cameras to automatically annotate each others visual semantics of social configurations. We leverage two learning signals uniquely embedded in first person videos. Individually, a first person video records the visual semantics of a spatial and social layout around a person that allows associating with past similar situations. Collectively, first person videos follow joint attention that can link the individuals to a group. We learn the egocentric visual semantics of group movements using a Siamese neural network to retrieve future trajectories. We consolidate the retrieved trajectories from all players by maximizing a measure of social compatibility—the gaze alignment towards joint attention predicted by their social formation, where the dynamics of joint attention is learned by a long-term recurrent convolutional network. This allows us to characterize which social configuration is more plausible and predict future group trajectories.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

自引率

0.00%

发文量

期刊最新文献

FFTLasso: Large-Scale LASSO in the Fourier Domain Semantically Coherent Co-Segmentation and Reconstruction of Dynamic Scenes Coarse-to-Fine Segmentation with Shape-Tailored Continuum Scale Spaces Joint Gap Detection and Inpainting of Line Drawings Wetness and Color from a Single Multispectral Image