{"title":"Attention Networks for Weakly Supervised Object Localization","authors":"Eu Wern Teh, Mrigank Rochan, Yang Wang","doi":"10.5244/C.30.52","DOIUrl":null,"url":null,"abstract":"We consider the problem of weakly supervised learning for object localization. Given a collection of images with image-level annotations indicating the presence/absence of an object, our goal is to localize the object in each image. We propose a neural network architecture called the attention network for this problem. Given a set of candidate regions in an image, the attention network first computes an attention score on each candidate region in the image. Then these candidate regions are combined together with their attention scores to form a whole-image feature vector. This feature vector is used for classifying the image. The object localization is implicitly achieved via the attention scores on candidate regions. We demonstrate that our approach achieves superior performance on several benchmark datasets.","PeriodicalId":125761,"journal":{"name":"Procedings of the British Machine Vision Conference 2016","volume":"109 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"64","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Procedings of the British Machine Vision Conference 2016","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5244/C.30.52","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 64
Abstract
We consider the problem of weakly supervised learning for object localization. Given a collection of images with image-level annotations indicating the presence/absence of an object, our goal is to localize the object in each image. We propose a neural network architecture called the attention network for this problem. Given a set of candidate regions in an image, the attention network first computes an attention score on each candidate region in the image. Then these candidate regions are combined together with their attention scores to form a whole-image feature vector. This feature vector is used for classifying the image. The object localization is implicitly achieved via the attention scores on candidate regions. We demonstrate that our approach achieves superior performance on several benchmark datasets.