Self-Supervised Keypoint Discovery in Behavioral Videos.

Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Pub Date: 2022-06-01. Epub Date: 2022-09-27. DOI: 10.1109/cvpr52688.2022.00221

Jennifer J Sun, Serim Ryou, Roni H Goldshmid, Brandon Weissbourd, John O Dabiri, David J Anderson, Ann Kennedy, Yisong Yue, Pietro Perona

Volume 2022, pages 2161-2170. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9829414/pdf/nihms-1857208.pdf

Citations: 0

Abstract

We propose a method for learning the posture and structure of agents from unlabelled behavioral videos. Starting from the observation that behaving agents are generally the main sources of movement in behavioral videos, our method, Behavioral Keypoint Discovery (B-KinD), uses an encoder-decoder architecture with a geometric bottleneck to reconstruct the spatiotemporal difference between video frames. By focusing only on regions of movement, our approach works directly on input videos without requiring manual annotations. Experiments on a variety of agent types (mouse, fly, human, jellyfish, and trees) demonstrate the generality of our approach and reveal that our discovered keypoints represent semantically meaningful body parts, which achieve state-of-the-art performance on keypoint regression among self-supervised methods. Additionally, B-KinD achieves comparable performance to supervised keypoints on downstream tasks, such as behavior classification, suggesting that our method can dramatically reduce model training costs vis-a-vis supervised methods.
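The idea in the abstract, a geometric bottleneck that must explain the difference between two frames, can be illustrated with a toy sketch. This is not the authors' implementation: the absolute pixel difference, the Gaussian re-rendering, `sigma`, the mean-squared-error loss, and all function names are assumptions made for illustration, and the learned encoder and decoder networks are replaced by hand-written stand-ins.

```python
import math

def frame_difference(frame_t, frame_t_plus_k):
    """Absolute per-pixel difference between two grayscale frames (lists of rows).
    Assumption: the abstract does not say how the difference is computed."""
    return [[abs(b - a) for a, b in zip(row_t, row_k)]
            for row_t, row_k in zip(frame_t, frame_t_plus_k)]

def soft_argmax(score_map):
    """Collapse one score map to a single (x, y) location via a spatial softmax."""
    peak = max(max(row) for row in score_map)  # subtract the max for stability
    weights = [[math.exp(v - peak) for v in row] for row in score_map]
    total = sum(sum(row) for row in weights)
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y

def gaussian_heatmap(x, y, height, width, sigma=1.0):
    """Re-render a keypoint as a Gaussian blob: the only information
    that passes through the bottleneck is the (x, y) location."""
    return [[math.exp(-((c - x) ** 2 + (r - y) ** 2) / (2 * sigma ** 2))
             for c in range(width)] for r in range(height)]

def reconstruction_loss(predicted, target):
    """Mean squared error between a decoded map and the frame-difference target."""
    n = len(target) * len(target[0])
    return sum((p - t) ** 2
               for pr, tr in zip(predicted, target)
               for p, t in zip(pr, tr)) / n

# Toy example: a single bright pixel moves from (x=1, y=1) to (x=3, y=2).
H, W = 5, 5
frame_a = [[0.0] * W for _ in range(H)]
frame_b = [[0.0] * W for _ in range(H)]
frame_a[1][1] = 1.0
frame_b[2][3] = 1.0

target = frame_difference(frame_a, frame_b)  # nonzero only where something moved

# Stand-in for an encoder output: scores that peak at the agent's new position.
scores = [[10.0 * frame_b[r][c] for c in range(W)] for r in range(H)]
kx, ky = soft_argmax(scores)
heatmap = gaussian_heatmap(kx, ky, H, W)
loss = reconstruction_loss(heatmap, target)
```

In this sketch the target is nonzero only where the pixel moved, so a keypoint can lower the loss only by landing on a region of motion. That is the intuition behind using frame differences rather than raw frames as the reconstruction target; a real model would learn the score maps and the decoder jointly from many frame pairs.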


Source journal

Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition

CiteScore

43.50

Self-citation rate

0.00%

Number of publications