Improved Framework using Rider Optimization Algorithm for Precise Image Caption Generation

Chaitrali Prasanna Chaudhari, S. Devane
Int. J. Image Graph. (Journal Article), published 2021-06-21
DOI: https://doi.org/10.1142/s0219467822500218
Citations: 3

Abstract

“Image captioning is the process of generating a textual description of an image.” It draws on both computer vision and natural language processing for caption generation. However, the majority of image captioning systems describe objects only in vague terms such as “man”, “woman”, “group of people”, or “building”. Hence, this paper develops an intelligent image captioning model. The adopted model comprises steps such as word generation, sentence formation, and caption generation. Initially, the input image is passed to a deep learning classifier, a Convolutional Neural Network (CNN). Since the classifier has already been trained on the words relevant to the images, it can readily predict the words associated with a given image. Next, a set of sentences is formed from the generated words using a Long Short-Term Memory (LSTM) model. The likelihood of each formed sentence is computed using the Maximum Likelihood (ML) function, and the sentences with the highest probability are selected and used to generate the image caption, that is, a textual description of the visual scene. As its major novelty, this paper aims to enhance the performance of the CNN by optimally tuning its weights and activation function. For this optimal selection, it introduces a new enhanced optimization algorithm, Rider with Randomized Bypass and Over-taker update (RR-BOU). The proposed RR-BOU is an enhanced version of the Rider Optimization Algorithm (ROA). Finally, the performance of the proposed captioning model is compared with that of conventional models through statistical analysis.
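The sentence-selection step described in the abstract (score each candidate sentence by its likelihood and keep the most probable ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the candidate sentences, and the per-word probabilities are all hypothetical, and the sketch assumes a sentence's likelihood is the product of its per-word probabilities, computed as a sum of logs for numerical stability.

```python
import math

def sentence_log_likelihood(word_probs):
    """Sum of log-probabilities of the words in one candidate sentence."""
    return sum(math.log(p) for p in word_probs)

def pick_captions(candidates, top_k=1):
    """Return the top_k candidate sentences ranked by log-likelihood.

    `candidates` is a list of (sentence, per_word_probabilities) pairs.
    In a real system the probabilities would come from the language
    model; here they are made-up illustrative values.
    """
    ranked = sorted(
        candidates,
        key=lambda c: sentence_log_likelihood(c[1]),
        reverse=True,
    )
    return [sentence for sentence, _ in ranked[:top_k]]

# Hypothetical candidates, all of equal length so that no length
# normalization is needed for a fair comparison.
candidates = [
    ("a man rides a horse", [0.9, 0.8, 0.7, 0.9, 0.6]),
    ("a person on an animal", [0.9, 0.5, 0.6, 0.4, 0.3]),
    ("a horse in a field", [0.9, 0.6, 0.5, 0.9, 0.4]),
]

best = pick_captions(candidates, top_k=1)
print(best)
```

With candidates of different lengths, a raw product favors shorter sentences, so a practical variant might divide the log-likelihood by the sentence length; the abstract does not say whether the paper does this.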