Leveraging Captions in the Wild to Improve Object Detection
Mert Kilickaya, Nazli Ikizler-Cinbis, Erkut Erdem, Aykut Erdem
Proceedings of the 5th Workshop on Vision and Language, August 2016. DOI: 10.18653/v1/W16-3204
Citations: 2
Abstract
In this study, we explore whether captions in the wild can boost the performance of object detection in images. Captions that accompany images usually provide significant information about the visual content of the image, making them an important resource for image understanding. However, captions in the wild are likely to include many types of noise that can hurt visual estimation. In this paper, we propose data-driven methods to deal with noisy captions and utilize them to improve object detection. We show how a pre-trained state-of-the-art object detector can take advantage of noisy captions. Our experiments demonstrate that captions provide promising cues about the visual content of images and can aid in improving object detection.
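The abstract does not spell out how caption cues are injected into the detector, so the following is only a minimal sketch of one plausible strategy: re-scoring the output of a pre-trained detector based on whether a detected category (or a synonym of it) is mentioned in the accompanying caption. The `CATEGORY_SYNONYMS` mapping, the `boost`/`penalty` values, and the `rescore_detections` helper are all illustrative assumptions, not the paper's actual method.

```python
from typing import Dict, List

# Hypothetical mapping from caption words to detector category names.
# The paper's approach is data-driven; this hand-written lookup is an assumption.
CATEGORY_SYNONYMS: Dict[str, str] = {
    "puppy": "dog", "dog": "dog",
    "kitten": "cat", "cat": "cat",
    "bike": "bicycle", "bicycle": "bicycle",
}

def rescore_detections(detections: List[dict], caption: str,
                       boost: float = 0.2, penalty: float = 0.1) -> List[dict]:
    """Raise the scores of detections whose category is mentioned in the caption
    and slightly lower the rest. The boost/penalty values are illustrative."""
    words = caption.lower().replace(".", " ").replace(",", " ").split()
    mentioned = {CATEGORY_SYNONYMS[w] for w in words if w in CATEGORY_SYNONYMS}
    rescored = []
    for det in detections:
        score = det["score"]
        if det["category"] in mentioned:
            score = min(1.0, score + boost)   # caption supports this category
        else:
            score = max(0.0, score - penalty) # caption gives no support
        rescored.append({**det, "score": score})
    return rescored

# Usage: detections come from any pre-trained detector (category, score, box).
dets = [
    {"category": "dog", "score": 0.55, "box": (10, 20, 120, 160)},
    {"category": "cat", "score": 0.40, "box": (30, 40, 90, 100)},
]
print(rescore_detections(dets, "A small puppy playing in the garden."))
```

In this sketch the caption only modulates the detector's confidence; it never adds or removes boxes, which keeps the noisy caption from overriding the visual evidence entirely.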