Synthesizing Attributes with Unreal Engine for Fine-grained Activity Analysis

Tae Soo Kim, Michael Peven, Weichao Qiu, A. Yuille, Gregory Hager

2019 IEEE Winter Applications of Computer Vision Workshops (WACVW), 2019. DOI: 10.1109/WACVW.2019.00013
We examine the problem of activity recognition in video using simulated data for training. In contrast to the expensive task of obtaining accurate labels from real data, synthetic data creation is not only fast and scalable, but provides ground-truth labels for more than just the activities of interest, including segmentation masks, 3D object keypoints, and more. We aim to successfully transfer a model trained on synthetic data to work on video in the real world. In this work, we provide a method of transferring from synthetic to real at intermediate representations of a video. We wish to perform activity recognition from the low-dimensional latent representation of a scene as a collection of visual attributes. As the ground-truth data does not exist in the ActEV dataset for attributes of interest, specifically orientation of cars in the ground-plane with respect to the camera, we synthesize this data. We show how we can successfully transfer a car orientation classifier, and use its predictions in our defined set of visual attributes to classify actions in video.
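The abstract describes representing a scene as a low-dimensional collection of visual attributes, with a transferred classifier supplying one such attribute (car orientation in the ground plane relative to the camera), and then classifying actions from those attributes. As a rough illustration of that idea only, the sketch below quantizes per-frame orientation predictions into discrete attribute bins and pools them into a clip-level feature; the bin count, function names, and pooling scheme are hypothetical and not taken from the paper.

```python
import numpy as np

# Hypothetical choice: quantize ground-plane orientation into 8 bins.
N_ORIENT_BINS = 8

def orientation_to_attribute(theta_deg):
    """Map an orientation in degrees to a discrete attribute bin (0..7)."""
    return int(((theta_deg % 360.0) / 360.0) * N_ORIENT_BINS) % N_ORIENT_BINS

def attribute_histogram(per_frame_thetas):
    """Pool per-frame attribute predictions into a normalized histogram,
    a simple stand-in for the low-dimensional latent clip representation."""
    hist = np.zeros(N_ORIENT_BINS)
    for theta in per_frame_thetas:
        hist[orientation_to_attribute(theta)] += 1.0
    return hist / max(len(per_frame_thetas), 1)

# Toy clip: a car turning, its orientation sweeping from 0 to 90 degrees.
clip_orientations = [0, 10, 25, 45, 70, 90]
clip_feature = attribute_histogram(clip_orientations)
```

A downstream action classifier (e.g., any standard discriminative model) would then consume `clip_feature` or a sequence of per-frame attribute vectors; the point of the attribute bottleneck is that the orientation predictor can be trained entirely on synthetic renders while the action model operates in this abstracted space.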