{"title":"Annotator rationales for visual recognition","authors":"Jeff Donahue, K. Grauman","doi":"10.1109/ICCV.2011.6126394","DOIUrl":null,"url":null,"abstract":"Traditional supervised visual learning simply asks annotators “what” label an image should have. We propose an approach for image classification problems requiring subjective judgment that also asks “why”, and uses that information to enrich the learned model. We develop two forms of visual annotator rationales: in the first, the annotator highlights the spatial region of interest he found most influential to the label selected, and in the second, he comments on the visual attributes that were most important. For either case, we show how to map the response to synthetic contrast examples, and then exploit an existing large-margin learning technique to refine the decision boundary accordingly. Results on multiple scene categorization and human attractiveness tasks show the promise of our approach, which can more accurately learn complex categories with the explanations behind the label choices.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":"8 1","pages":"1395-1402"},"PeriodicalIF":0.0000,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"76","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 International Conference on Computer Vision","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCV.2011.6126394","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 76
Abstract
Traditional supervised visual learning simply asks annotators “what” label an image should have. We propose an approach for image classification problems requiring subjective judgment that also asks “why”, and uses that information to enrich the learned model. We develop two forms of visual annotator rationales: in the first, the annotator highlights the spatial region of interest he found most influential to the label selected, and in the second, he comments on the visual attributes that were most important. For either case, we show how to map the response to synthetic contrast examples, and then exploit an existing large-margin learning technique to refine the decision boundary accordingly. Results on multiple scene categorization and human attractiveness tasks show the promise of our approach, which can more accurately learn complex categories with the explanations behind the label choices.