{"title":"利用深度卷积生成对抗网络加强图像字幕制作","authors":"Tarun Jaiswal, Manju Pandey, Priyanka Tripathi","doi":"10.2174/0126662558282389231229063607","DOIUrl":null,"url":null,"abstract":"\n\nIntroduction: Image caption generation has long been a fundamental challenge in the\narea of computer vision (CV) and natural language processing (NLP). In this research, we present\nan innovative approach that harnesses the power of Deep Convolutional Generative Adversarial\nNetworks (DCGAN) and adversarial training to revolutionize the generation of natural\nand contextually relevant image captions.\n\n\n\nOur method significantly improves the\nfluency, coherence, and contextual relevance of generated captions and showcases the effectiveness\nof RL reward-based fine-tuning. Through a comprehensive evaluation of COCO datasets,\nour model demonstrates superior performance over baseline and state-of-the-art methods.\nOn the COCO dataset, our model outperforms current state-of-the-art (SOTA) models\nacross all metrics, achieving BLEU-4 (0.327), METEOR (0.249), Rough (0.525) and CIDEr\n(1.155) scores.\n\n\n\nThe integration of DCGAN and adversarial training opens new possibilities\nin image captioning, with applications spanning from automated content generation to enhanced\naccessibility solutions.\n\n\n\nThis research paves the way for more intelligent\nand context-aware image understanding systems, promising exciting future exploration and innovation\nprospects.\n","PeriodicalId":36514,"journal":{"name":"Recent Advances in Computer Science and Communications","volume":"61 2","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Enhancing Image Captioning Using Deep Convolutional Generative Adversarial Networks\",\"authors\":\"Tarun Jaiswal, Manju Pandey, Priyanka 
Tripathi\",\"doi\":\"10.2174/0126662558282389231229063607\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"\\n\\nIntroduction: Image caption generation has long been a fundamental challenge in the\\narea of computer vision (CV) and natural language processing (NLP). In this research, we present\\nan innovative approach that harnesses the power of Deep Convolutional Generative Adversarial\\nNetworks (DCGAN) and adversarial training to revolutionize the generation of natural\\nand contextually relevant image captions.\\n\\n\\n\\nOur method significantly improves the\\nfluency, coherence, and contextual relevance of generated captions and showcases the effectiveness\\nof RL reward-based fine-tuning. Through a comprehensive evaluation of COCO datasets,\\nour model demonstrates superior performance over baseline and state-of-the-art methods.\\nOn the COCO dataset, our model outperforms current state-of-the-art (SOTA) models\\nacross all metrics, achieving BLEU-4 (0.327), METEOR (0.249), Rough (0.525) and CIDEr\\n(1.155) scores.\\n\\n\\n\\nThe integration of DCGAN and adversarial training opens new possibilities\\nin image captioning, with applications spanning from automated content generation to enhanced\\naccessibility solutions.\\n\\n\\n\\nThis research paves the way for more intelligent\\nand context-aware image understanding systems, promising exciting future exploration and innovation\\nprospects.\\n\",\"PeriodicalId\":36514,\"journal\":{\"name\":\"Recent Advances in Computer Science and Communications\",\"volume\":\"61 2\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-01-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Recent Advances in Computer Science and 
Communications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2174/0126662558282389231229063607\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"Computer Science\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Recent Advances in Computer Science and Communications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2174/0126662558282389231229063607","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Computer Science","Score":null,"Total":0}
Enhancing Image Captioning Using Deep Convolutional Generative Adversarial Networks
Introduction: Image caption generation has long been a fundamental challenge in the areas of computer vision (CV) and natural language processing (NLP). In this research, we present an approach that harnesses the power of Deep Convolutional Generative Adversarial Networks (DCGAN) and adversarial training to generate natural, contextually relevant image captions.
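The adversarial-training idea underlying this approach can be illustrated with a toy example. The sketch below is not the paper's DCGAN captioner; it is a minimal NumPy generator/discriminator pair trained with the standard GAN objectives on one-dimensional "features", assuming a non-saturating generator loss. All distributions, learning rates, and step counts are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy "real" features: samples from N(2, 0.5).
def real_batch(n):
    return rng.normal(2.0, 0.5, size=(n, 1))

# Generator: affine map from noise z to a feature; params (w, b).
# Discriminator: logistic regression on the feature; params (v, c).
w, b = 0.1, 0.0
v, c = 0.1, 0.0
lr = 0.05

for step in range(500):
    z = rng.normal(size=(64, 1))
    fake = w * z + b
    real = real_batch(64)

    # Discriminator ascent: maximize log D(real) + log(1 - D(fake)).
    d_real = sigmoid(v * real + c)
    d_fake = sigmoid(v * fake + c)
    v += lr * (np.mean((1 - d_real) * real) + np.mean(-d_fake * fake))
    c += lr * (np.mean(1 - d_real) + np.mean(-d_fake))

    # Generator ascent on the non-saturating objective: maximize log D(fake).
    d_fake = sigmoid(v * fake + c)
    grad_fake = (1 - d_fake) * v  # d/d(fake) of log D(fake)
    w += lr * np.mean(grad_fake * z)
    b += lr * np.mean(grad_fake)

# After training, the generator's offset b has drifted toward the real mean (2.0).
```

The same two-player dynamic, with a CNN-based discriminator scoring image–caption pairs and a caption generator as the "generator", is what adversarial captioning setups exploit.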
Our method significantly improves the fluency, coherence, and contextual relevance of the generated captions and demonstrates the effectiveness of reinforcement learning (RL) reward-based fine-tuning. In a comprehensive evaluation on the COCO dataset, our model outperforms both baseline and state-of-the-art methods.
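As a rough illustration of RL reward-based fine-tuning (not the paper's actual training code), the toy policy below reinforces a single caption token via REINFORCE with a self-critical-style greedy baseline; the vocabulary, reference token, and learning rate are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = ["a", "dog", "runs", "cat"]   # hypothetical one-token vocabulary
reference = "dog"                     # hypothetical reference caption
logits = np.zeros(len(vocab))         # policy parameters
lr = 0.5

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(200):
    p = softmax(logits)
    i = rng.choice(len(vocab), p=p)                 # sample a caption token
    reward = 1.0 if vocab[i] == reference else 0.0  # stand-in for CIDEr etc.
    # Self-critical baseline: reward of the greedy (argmax) decode.
    baseline = 1.0 if vocab[int(np.argmax(logits))] == reference else 0.0
    advantage = reward - baseline
    # REINFORCE: gradient of log p_i w.r.t. logits is one_hot(i) - p.
    grad = -p.copy()
    grad[i] += 1.0
    logits += lr * advantage * grad

final_probs = softmax(logits)  # probability mass concentrates on the reference
```

In practice the scalar reward would be a sequence-level metric such as CIDEr computed on the full sampled caption, which is what "reward-based fine-tuning" optimizes directly.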
On the COCO dataset, our model surpasses current state-of-the-art (SOTA) models across all metrics, achieving BLEU-4 (0.327), METEOR (0.249), ROUGE (0.525), and CIDEr (1.155) scores.
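For readers unfamiliar with the reported metrics, a sentence-level BLEU-4 can be sketched in pure Python. This is a simplified single-reference version (geometric mean of modified n-gram precisions times a brevity penalty, returning 0 if any precision is 0), not the official COCO evaluation code:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # Multiset of all n-grams in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(count, ref[g]) for g, count in cand.items())
        total = max(sum(cand.values()), 1)
        if overlap == 0:
            return 0.0  # simplification: no smoothing
        precisions.append(overlap / total)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(candidate) > len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

reference = "a dog runs across the green field".split()
candidate = "a dog runs across the field".split()
score = bleu(candidate, reference)  # strictly between 0 and 1
```

METEOR, ROUGE, and CIDEr follow the same pattern of comparing candidate and reference n-gram statistics, with different matching and weighting schemes.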
The integration of DCGAN and adversarial training opens new possibilities in image captioning, with applications ranging from automated content generation to enhanced accessibility solutions.
This research paves the way for more intelligent, context-aware image understanding systems and points to promising directions for future exploration and innovation.