{"title":"视觉识别中高层表示的分层联合最大边际学习","authors":"Hans Lobel, R. Vidal, Á. Soto","doi":"10.1109/ICCV.2013.213","DOIUrl":null,"url":null,"abstract":"Currently, Bag-of-Visual-Words (BoVW) and part-based methods are the most popular approaches for visual recognition. In both cases, a mid-level representation is built on top of low-level image descriptors and top-level classifiers use this mid-level representation to achieve visual recognition. While in current part-based approaches, mid- and top-level representations are usually jointly trained, this is not the usual case for BoVW schemes. A main reason for this is the complex data association problem related to the usual large dictionary size needed by BoVW approaches. As a further observation, typical solutions based on BoVW and part-based representations are usually limited to extensions of binary classification schemes, a strategy that ignores relevant correlations among classes. In this work we propose a novel hierarchical approach to visual recognition based on a BoVW scheme that jointly learns suitable mid- and top-level representations. Furthermore, using a max-margin learning framework, the proposed approach directly handles the multiclass case at both levels of abstraction. We test our proposed method using several popular benchmark datasets. As our main result, we demonstrate that, by coupling learning of mid- and top-level representations, the proposed approach fosters sharing of discriminative visual words among target classes, being able to achieve state-of-the-art recognition performance using far less visual words than previous approaches.","PeriodicalId":6351,"journal":{"name":"2013 IEEE International Conference on Computer Vision","volume":"18 1","pages":"1697-1704"},"PeriodicalIF":0.0000,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":"{\"title\":\"Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition\",\"authors\":\"Hans Lobel, R. Vidal, Á. Soto\",\"doi\":\"10.1109/ICCV.2013.213\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Currently, Bag-of-Visual-Words (BoVW) and part-based methods are the most popular approaches for visual recognition. In both cases, a mid-level representation is built on top of low-level image descriptors and top-level classifiers use this mid-level representation to achieve visual recognition. While in current part-based approaches, mid- and top-level representations are usually jointly trained, this is not the usual case for BoVW schemes. A main reason for this is the complex data association problem related to the usual large dictionary size needed by BoVW approaches. As a further observation, typical solutions based on BoVW and part-based representations are usually limited to extensions of binary classification schemes, a strategy that ignores relevant correlations among classes. In this work we propose a novel hierarchical approach to visual recognition based on a BoVW scheme that jointly learns suitable mid- and top-level representations. Furthermore, using a max-margin learning framework, the proposed approach directly handles the multiclass case at both levels of abstraction. We test our proposed method using several popular benchmark datasets. 
As our main result, we demonstrate that, by coupling learning of mid- and top-level representations, the proposed approach fosters sharing of discriminative visual words among target classes, being able to achieve state-of-the-art recognition performance using far less visual words than previous approaches.\",\"PeriodicalId\":6351,\"journal\":{\"name\":\"2013 IEEE International Conference on Computer Vision\",\"volume\":\"18 1\",\"pages\":\"1697-1704\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"17\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE International Conference on Computer Vision\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCV.2013.213\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE International Conference on Computer Vision","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCV.2013.213","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition
Currently, Bag-of-Visual-Words (BoVW) and part-based methods are the most popular approaches for visual recognition. In both cases, a mid-level representation is built on top of low-level image descriptors, and top-level classifiers use this mid-level representation to achieve visual recognition. While in current part-based approaches mid- and top-level representations are usually jointly trained, this is not usually the case for BoVW schemes. A main reason for this is the complex data-association problem arising from the large dictionary sizes typically needed by BoVW approaches. As a further observation, typical solutions based on BoVW and part-based representations are usually limited to extensions of binary classification schemes, a strategy that ignores relevant correlations among classes. In this work we propose a novel hierarchical approach to visual recognition, based on a BoVW scheme, that jointly learns suitable mid- and top-level representations. Furthermore, using a max-margin learning framework, the proposed approach directly handles the multiclass case at both levels of abstraction. We test the proposed method on several popular benchmark datasets. As our main result, we demonstrate that, by coupling the learning of mid- and top-level representations, the proposed approach fosters sharing of discriminative visual words among target classes and achieves state-of-the-art recognition performance using far fewer visual words than previous approaches.
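To make the idea in the abstract concrete, the sketch below illustrates one simplified way to couple mid- and top-level learning: images are encoded as BoVW histograms, and a shared nonnegative per-visual-word relevance vector (a stand-in for the paper's learned mid-level representation) is trained jointly with per-class top-level classifiers under a Crammer-Singer style multiclass hinge loss. This is a minimal illustrative sketch under assumed names, dimensions, and hyperparameters, not the authors' actual formulation or implementation.

```python
# Minimal sketch (illustrative assumptions throughout, not the paper's method):
# joint max-margin learning of a shared mid-level word-relevance vector and
# multiclass top-level classifiers on BoVW histograms.

import numpy as np

def bovw_histogram(descriptors, dictionary):
    """Hard-assign local descriptors to their nearest visual word and return an
    L1-normalized histogram (one standard way to build a BoVW mid-level vector)."""
    # descriptors: (n_desc, d), dictionary: (n_words, d)
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    assignments = d2.argmin(axis=1)
    hist = np.bincount(assignments, minlength=dictionary.shape[0]).astype(float)
    return hist / max(hist.sum(), 1.0)

def joint_max_margin_train(X, y, n_classes, lam=1e-3, lr=0.1, epochs=50, seed=0):
    """Jointly learn a shared word-relevance vector u >= 0 (mid level) and class
    weight vectors W (top level) by stochastic subgradient descent on a
    regularized multiclass hinge loss over re-weighted histograms u * x."""
    rng = np.random.default_rng(seed)
    n, n_words = X.shape
    W = np.zeros((n_classes, n_words))   # top-level, one weight vector per class
    u = np.ones(n_words)                 # mid-level word relevances, shared by all classes
    for _ in range(epochs):
        for i in rng.permutation(n):
            x, yi = X[i], y[i]
            z = u * x                    # mid-level re-weighted representation
            scores = W @ z
            margins = scores - scores[yi] + 1.0
            margins[yi] = 0.0
            c = int(margins.argmax())    # most violating class
            W *= (1.0 - lr * lam)        # weight decay (L2 regularization on W)
            if margins[c] > 0.0:         # subgradient step on the violated margin
                g_u = (W[c] - W[yi]) * x # gradient of the margin term w.r.t. u
                W[c]  -= lr * z
                W[yi] += lr * z
                u = np.clip(u - lr * g_u, 0.0, None)  # keep relevances nonnegative
    return W, u

if __name__ == "__main__":
    # Toy usage with random data standing in for real image descriptors.
    rng = np.random.default_rng(1)
    dictionary = rng.normal(size=(64, 16))  # 64 visual words in R^16 (assumed sizes)
    X = np.stack([bovw_histogram(rng.normal(size=(200, 16)), dictionary)
                  for _ in range(30)])
    y = rng.integers(0, 3, size=30)
    W, u = joint_max_margin_train(X, y, n_classes=3)
    print("learned word relevances (first 5):", np.round(u[:5], 3))
```

Because the relevance vector u is shared across all classes while the classifiers W are class-specific, the joint objective encourages the same discriminative visual words to be reused by several classes, which is the kind of cross-class sharing the abstract credits for reaching strong performance with far fewer visual words.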