Few-shot Egocentric Multimodal Activity Recognition
Jinxing Pan, Xiaoshan Yang, Yi Huang, Changsheng Xu
ACM Multimedia Asia, December 2021. DOI: 10.1145/3469877.3490603
Citations: 1
Abstract
Activity recognition based on egocentric multimodal data collected by wearable devices has recently attracted increasing attention. However, conventional activity recognition methods are hampered by the lack of large-scale labeled egocentric multimodal datasets, owing to the high cost of data collection. In this paper, we propose a new task of few-shot egocentric multimodal activity recognition, which poses at least two significant challenges. On the one hand, it is difficult to extract effective features from the multimodal data sequences of video and sensor signals due to the scarcity of samples. On the other hand, robustly recognizing novel activity classes with very few labeled samples is an even more critical challenge due to the complexity of the multimodal data. To address these challenges, we propose a two-stream graph network, which consists of a heterogeneous graph-based multimodal association module and a knowledge-aware activity classifier module. The former uses a heterogeneous graph network to comprehensively capture the dynamic and complementary information contained in the multimodal data stream. The latter learns robust activity classifiers through knowledge propagation among the classifier parameters of different classes. In addition, we adopt an episodic training strategy to improve the generalization ability of the proposed few-shot activity recognition model. Experiments on two public datasets show that the proposed model outperforms the baseline models.
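The abstract does not give implementation details, but a minimal sketch may help make the episodic training and classifier-level knowledge propagation concrete. The snippet below is not the authors' code: the `KnowledgePropagation` module, the fully-connected class graph, the prototype-based classifier initialization, and all shapes and hyperparameters are illustrative assumptions layered on top of generic N-way K-shot episode sampling.

```python
# Minimal sketch (assumed, not the authors' implementation): N-way K-shot
# episodic training with a simple graph-propagation step over per-class
# classifier weights, loosely mirroring the "knowledge-aware activity
# classifier" idea described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_WAY, K_SHOT, N_QUERY, FEAT_DIM = 5, 1, 15, 64


class KnowledgePropagation(nn.Module):
    """One hypothetical propagation layer that lets the classifier weights
    of different classes exchange information over a class-relation graph."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, class_weights, adj):
        # adj: (N, N) row-normalized class-relation matrix (assumed form)
        propagated = adj @ self.proj(class_weights)
        return F.relu(class_weights + propagated)


def sample_episode(features, labels, n_way, k_shot, n_query):
    """Sample one N-way K-shot episode from a pool of (feature, label) pairs."""
    classes = torch.randperm(int(labels.max()) + 1)[:n_way]
    support, query, q_labels = [], [], []
    for new_idx, c in enumerate(classes):
        idx = (labels == c).nonzero(as_tuple=True)[0]
        idx = idx[torch.randperm(len(idx))]
        support.append(features[idx[:k_shot]])
        query.append(features[idx[k_shot:k_shot + n_query]])
        q_labels.append(torch.full((n_query,), new_idx))
    return torch.stack(support), torch.cat(query), torch.cat(q_labels)


# Dummy pool of pre-extracted features, standing in for the fused
# video + sensor embeddings the paper's association module would produce.
pool_feats = torch.randn(2000, FEAT_DIM)
pool_labels = torch.randint(0, 20, (2000,))

propagator = KnowledgePropagation(FEAT_DIM)
optimizer = torch.optim.Adam(propagator.parameters(), lr=1e-3)

for episode in range(100):
    support, query, q_labels = sample_episode(
        pool_feats, pool_labels, N_WAY, K_SHOT, N_QUERY)
    # Initialize per-class classifier weights from support prototypes.
    prototypes = support.mean(dim=1)                    # (N_WAY, FEAT_DIM)
    # Placeholder class graph: fully connected, row-normalized.
    adj = torch.full((N_WAY, N_WAY), 1.0 / N_WAY)
    class_weights = propagator(prototypes, adj)
    logits = query @ class_weights.t()                  # dot-product scores
    loss = F.cross_entropy(logits, q_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the actual model the per-class weights would come from the learned classifier module and the class graph would encode semantic relations between activities; the sketch only shows how episodic sampling and a propagation step over classifier parameters fit together.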