Improving The Performance of the Image Captioning Systems Using a Pre-Classification Stage: تحسين أداء أنظمة وصف الصور باستخدام مرحلة التصنيف المسبق للصور

Journal of engineering sciences and information technology. Pub Date: 2022-03-27. DOI: 10.26389/ajsrp.l270721

Rasha Mohammed Mualla, Jafar Alkheir, Samer Sulaiman


Abstract

In this research, we introduce a novel image classification and captioning system that adds a classification layer before the image captioning models. The approach consists of three main steps and is inspired by the state-of-the-art finding that generating captions within small sub-class categories works better than generating them over a large unclassified dataset. In the first step, we collected a dataset from two international datasets (MS-COCO and Flickr2k) comprising 10778 images, of which 80% are used for training and 20% for validation. In the second step, the dataset images are divided into 11 classes (10 indoor and outdoor categories and one "Null" category) and fed into a deep learning classifier. The classifier is retrained on our classes and learns to assign each image to its corresponding category. In the final step, each classified image is used as input to the corresponding model among 11 pre-trained, class-specific image captioning models, and the final caption sentence is generated. The experiments show that adding the pre-classification step before the image captioning stage improves performance significantly, by (8.15% and 8.44%) and (12.7407% and 16.7048%) for Top-1 and Top-5 of the English and Arabic systems, respectively. The classification step achieves a true classification rate of 71.32% and 73.09% for the English and Arabic systems, respectively.
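
The three steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `split_dataset` and `caption_with_preclassification`, the shuffled split with a fixed seed, and the stub classifier and captioners in the usage note are all assumptions made for the example. The abstract does not say how the 80/20 split was drawn or which models were used.

```python
import random
from typing import Callable, Dict, List, Tuple

# 10 indoor/outdoor categories plus one "Null" category, as stated in the abstract.
NUM_CLASSES = 11

# Hypothetical interfaces: a classifier maps an image to a class id,
# and a captioner maps an image to a caption string.
Classifier = Callable[[object], int]
Captioner = Callable[[object], str]


def split_dataset(items: List[object], train_fraction: float = 0.8,
                  seed: int = 0) -> Tuple[List[object], List[object]]:
    """Shuffle a copy of the items and split it into training and validation parts.

    The shuffling and the seed are assumptions; only the 80/20 ratio
    comes from the abstract.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def caption_with_preclassification(image: object,
                                   classifier: Classifier,
                                   captioners: Dict[int, Captioner]) -> Tuple[int, str]:
    """Classify the image first, then caption it with the model for that class."""
    class_id = classifier(image)
    if class_id not in captioners:
        raise KeyError(f"no captioning model registered for class {class_id}")
    return class_id, captioners[class_id](image)
```

As a usage example, one could register one stub captioner per class, for instance `captioners = {i: (lambda img, i=i: f"caption from model {i}") for i in range(NUM_CLASSES)}`, and call `caption_with_preclassification(image, classifier, captioners)` with any callable that returns a class id. With this truncating split, 10778 images would yield 8622 training and 2156 validation items; the abstract itself does not report the exact counts.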


