Image Caption: Explaining Pictures by Text using Deep Learning
Sunil Varma, Nitika Kapoor
2023 International Conference on Networking and Communications (ICNWC), April 5, 2023
DOI: 10.1109/ICNWC57852.2023.10127293
In this paper, we propose a novel approach to generating image captions using deep learning techniques. Our model employs a pre-trained visual language model and matches picture label information to generate fluent captions that can describe novel objects. We also use a range of pre-training techniques for learning cross-modal representations on image-text pairs, which improve the model's ability to predict the semantic alignment between images and text. We demonstrate that our model outperforms state-of-the-art models on the Flickr 8K dataset. We also employ a combination of long short-term memory (LSTM) and convolutional neural network (CNN) layers to extract image features, which help the model capture and highlight the relationship between image features and caption semantics. Our results suggest that our approach provides a more effective and resource-efficient solution for generating image captions. Overall, this paper presents a comprehensive investigation into the use of deep learning techniques for image caption generation.
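The CNN-plus-LSTM pairing described in the abstract follows a common encoder-decoder pattern: a convolutional encoder compresses the image into a feature vector, and an LSTM decoder generates the caption conditioned on that vector. As a minimal sketch of this pattern in PyTorch (not the authors' implementation; all layer sizes, class names, and the toy dimensions are assumptions), it might look like:

```python
import torch
import torch.nn as nn

class CNNEncoder(nn.Module):
    """Small convolutional encoder mapping an image to a feature vector."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # global pooling -> (B, 64, 1, 1)
        )
        self.fc = nn.Linear(64, embed_dim)    # project to the caption embedding size

    def forward(self, images):
        x = self.features(images).flatten(1)  # (B, 64)
        return self.fc(x)                     # (B, embed_dim)

class LSTMDecoder(nn.Module):
    """LSTM language model conditioned on the image feature."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image_feat, captions):
        # Prepend the image feature as the first "token" of the input sequence,
        # so every caption token is predicted with the image in context.
        tokens = self.embed(captions)                          # (B, T, E)
        inputs = torch.cat([image_feat.unsqueeze(1), tokens], dim=1)
        hidden, _ = self.lstm(inputs)                          # (B, T+1, H)
        return self.out(hidden)                                # (B, T+1, V)

# Toy forward pass: batch of 2 images, captions of length 5, vocabulary of 1000.
encoder, decoder = CNNEncoder(), LSTMDecoder(vocab_size=1000)
images = torch.randn(2, 3, 64, 64)
captions = torch.randint(0, 1000, (2, 5))
logits = decoder(encoder(images), captions)
print(logits.shape)  # torch.Size([2, 6, 1000])
```

In practice the encoder would be a large pre-trained backbone rather than two convolutional layers, and the decoder would be trained with teacher forcing on caption-image pairs; this sketch only shows how the image feature conditions the LSTM.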