Transformer Approaches in Image Captioning: A Literature Review
Authors: Hilya Tsaniya, C. Fatichah, N. Suciati
Published in: 2022 14th International Conference on Information Technology and Electrical Engineering (ICITEE)
Publication date: 2022-10-18
DOI: https://doi.org/10.1109/ICITEE56407.2022.9954086
Citations: 0
Abstract
Image captioning is a challenging task at the intersection of computer vision and Natural Language Processing (NLP). Its goal is to describe an image in text the way a human would. Image captioning helps people understand visual content. The main challenge is producing a coherent caption that a human can understand. As Transformers have achieved new state-of-the-art results in computer vision, interest in applying them to image captioning has also grown. This paper presents a literature review of image captioning using Transformer methods. The literature is drawn from reputable journals and conferences. Our review focuses on Transformer approaches that improve model performance in image captioning. We also explore the existing public datasets used in image captioning. The limitations of, and future research on, image captioning are also discussed, along with additional potential subsidiary research.