{"title":"Image Caption Generation Using A Deep Architecture","authors":"A. Hani, Najiba Tagougui, M. Kherallah","doi":"10.1109/ACIT47987.2019.8990998","DOIUrl":null,"url":null,"abstract":"Recently, image captioning is a new challenging task that has gathered widespread interest. The task involves generating a concise description of an image in natural language and is currently accomplished by techniques that use a combination of computer vision (CV), natural language processing (NLP), and machine learning methods.In this paper, we presented a model that generates natural language description of an image. We used a combination of convolutional neural networks to extract features and then used recurrent neural networks to generate text from these features. We incorporated the attention mechanism while generating captions. We evaluated the model on MSCOCO database. The obtained results are promising and competitive.","PeriodicalId":314091,"journal":{"name":"2019 International Arab Conference on Information Technology (ACIT)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 International Arab Conference on Information Technology (ACIT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ACIT47987.2019.8990998","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10
Abstract
Recently, image captioning is a new challenging task that has gathered widespread interest. The task involves generating a concise description of an image in natural language and is currently accomplished by techniques that use a combination of computer vision (CV), natural language processing (NLP), and machine learning methods.In this paper, we presented a model that generates natural language description of an image. We used a combination of convolutional neural networks to extract features and then used recurrent neural networks to generate text from these features. We incorporated the attention mechanism while generating captions. We evaluated the model on MSCOCO database. The obtained results are promising and competitive.