You don't drink a cupboard: improving egocentric action recognition with co-occurrence of verbs and nouns
Hiroki Kojima, Naoshi Kaneko, Seiya Ito, K. Sumi
International Conference on Quality Control by Artificial Vision, 2021-07-16
DOI: 10.1117/12.2591298 (https://doi.org/10.1117/12.2591298)
Citations: 0
Abstract
We propose a refinement module that improves action recognition by considering the semantic relevance between verbs and nouns. Existing methods recognize an action as a combination of a verb and a noun. However, they occasionally produce semantically implausible combinations, such as “drink a cupboard” or “open a carrot”. To tackle this problem, we incorporate a word embedding model into an action recognition network. The word embedding model is trained to capture the co-occurrence of verbs and nouns and is then used to refine the initial class probabilities estimated by the network. Experimental results show that our method improves the estimation accuracy of verbs and nouns on the EPIC-KITCHENS dataset.
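The abstract does not spell out how the refinement is computed, so the following is only a minimal sketch of the general idea, not the authors' module. It assumes a hypothetical verb-by-noun plausibility matrix `cooc` (standing in for scores a word embedding model might produce) and a simple multiplicative reweighting; the function name `refine`, the toy vocabulary, and all numbers are illustrative assumptions.

```python
import numpy as np


def refine(p_verb, p_noun, cooc, eps=1e-8):
    """Reweight verb and noun probabilities by co-occurrence plausibility.

    cooc[i, j] is an assumed plausibility score for verb i with noun j.
    This reweighting rule is an illustration, not the paper's method.
    """
    p_verb = np.asarray(p_verb, dtype=float)
    p_noun = np.asarray(p_noun, dtype=float)
    cooc = np.asarray(cooc, dtype=float)

    # Expected plausibility of each verb under the current noun distribution,
    # and of each noun under the current verb distribution.
    verb_support = cooc @ p_noun
    noun_support = cooc.T @ p_verb

    # Scale the initial probabilities by that support and renormalize.
    new_verb = p_verb * verb_support + eps
    new_noun = p_noun * noun_support + eps
    return new_verb / new_verb.sum(), new_noun / new_noun.sum()


# Toy vocabulary and made-up scores (assumptions for illustration only).
verbs = ["drink", "open"]
nouns = ["cupboard", "water"]
cooc = np.array([
    [0.05, 0.95],  # drink cupboard, drink water
    [0.90, 0.10],  # open cupboard, open water
])

# Initial network outputs that would yield "drink cupboard".
p_verb = np.array([0.55, 0.45])
p_noun = np.array([0.60, 0.40])

refined_verb, refined_noun = refine(p_verb, p_noun, cooc)
initial_action = (verbs[int(p_verb.argmax())], nouns[int(p_noun.argmax())])
refined_action = (verbs[int(refined_verb.argmax())],
                  nouns[int(refined_noun.argmax())])
print(initial_action, "->", refined_action)
```

In this toy case the initial top verb and noun form an implausible pair, and the reweighting shifts the verb toward one that is compatible with the predicted noun. A learned module could replace the fixed multiplicative rule; that design choice is left open here.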