Title: Long-Term Recurrent Merge Network Model for Image Captioning
Authors: Yang Fan, Jungang Xu, Yingfei Sun, Ben He
DOI: 10.1109/ICTAI.2018.00047
Published in: 2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI), November 2018
Citations: 5
Abstract
Language models based on Recurrent Neural Networks, e.g., the Long Short-Term Memory network (LSTM), have shown strong ability in generating captions from images. However, in previous LSTM-based image captioning models, the image information is fed into the LSTM only at the 0th time step; the network then gradually forgets it and relies on the language model alone, producing only a simple description and leaving room for a better one. To address this challenge, this paper proposes a Long-term Recurrent Merge Network (LRMN) model that merges the image feature into the language model at every time step, which not only improves the accuracy of image captioning but also describes the image better. Experimental results show that the proposed LRMN model yields a promising improvement in image captioning.
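The core idea described above is that the image feature is re-injected at every decoding step rather than only at step 0. A minimal numpy sketch of that per-step merge, using a plain tanh recurrence in place of the paper's LSTM and hypothetical toy dimensions (all names and sizes here are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny dimensions for illustration only.
img_dim, emb_dim, hid_dim = 4, 3, 5
W = rng.standard_normal((hid_dim, emb_dim + img_dim + hid_dim)) * 0.1

def merge_step(h, word_emb, img_feat):
    """One recurrent step that concatenates the image feature with the
    word embedding at EVERY time step (the per-step merge idea),
    instead of feeding the image only at the 0th step."""
    x = np.concatenate([word_emb, img_feat, h])
    return np.tanh(W @ x)

img_feat = rng.standard_normal(img_dim)          # stands in for a CNN feature
caption_embs = [rng.standard_normal(emb_dim) for _ in range(6)]

h = np.zeros(hid_dim)
for emb in caption_embs:
    h = merge_step(h, emb, img_feat)  # image information re-injected each step

print(h.shape)  # (5,)
```

Because the image feature appears in the input at every step, the hidden state cannot "forget" it over long captions, which is the failure mode of init-only injection that the abstract points out.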