Multi-View Contrastive Learning from Demonstrations
André Rosa de Sousa Porfírio Correia, L. Alexandre
2022 Sixth IEEE International Conference on Robotic Computing (IRC)
DOI: 10.1109/IRC55401.2022.00067
Published: 2022-01-30
Citations: 2
Abstract
This paper presents a framework for learning visual representations from unlabeled video demonstrations captured from multiple viewpoints, and shows that these representations can be used for robotic imitation tasks. Contrastive learning is used to enhance task-relevant information while suppressing irrelevant information in the feature embeddings. The proposed method is validated on the publicly available Multi-View Pouring data set and a custom Pick and Place data set, and compared against the TCN and CMC baselines. The learned representations are evaluated with three metrics: viewpoint alignment, stage classification, and reinforcement learning. In all cases, the results improve over state-of-the-art approaches.
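To make the multi-view contrastive idea concrete, the sketch below shows one common way such an objective can be set up: an InfoNCE-style loss in which frames captured at the same time step from two synchronized viewpoints are treated as positives and all other frames in the batch as negatives. This is a minimal illustrative sketch, not the authors' implementation; the function and argument names (multi_view_info_nce, z_view1, z_view2, temperature) are hypothetical.

```python
# Illustrative sketch (not the paper's code): a multi-view InfoNCE loss where
# time-aligned frames from two viewpoints are positives and all other frames
# in the batch serve as negatives.
import torch
import torch.nn.functional as F

def multi_view_info_nce(z_view1: torch.Tensor,
                        z_view2: torch.Tensor,
                        temperature: float = 0.1) -> torch.Tensor:
    """z_view1, z_view2: (batch, dim) embeddings of time-aligned frames
    from two synchronized viewpoints."""
    # L2-normalize so the dot product is cosine similarity.
    z1 = F.normalize(z_view1, dim=1)
    z2 = F.normalize(z_view2, dim=1)

    # Pairwise similarities between every view-1 frame and every view-2 frame.
    logits = z1 @ z2.t() / temperature          # (batch, batch)

    # The matching frame (same time step) is the positive: the diagonal.
    targets = torch.arange(z1.size(0), device=z1.device)

    # Symmetrize the loss: view1 -> view2 and view2 -> view1.
    loss = 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
    return loss

# Example usage with random embeddings standing in for encoder outputs.
if __name__ == "__main__":
    batch, dim = 32, 128
    emb1, emb2 = torch.randn(batch, dim), torch.randn(batch, dim)
    print(multi_view_info_nce(emb1, emb2).item())
```

Under this kind of objective, the encoder is pushed to keep information shared across viewpoints (task-relevant scene state) and to discard viewpoint-specific nuisance factors, which is the intuition the abstract describes.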