Image Caption Generation Related to Object Detection and Colour Recognition Using Transformer-Decoder

Z. U. Kamangar, G. Shaikh, Saif Hassan, Nimra Mughal, U. A. Kamangar
DOI: 10.1109/iCoMET57998.2023.10099161
Published in: 2023 4th International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), 2023-03-17
Citations: 1

Abstract

Dependence on digital images is increasing across fields such as education, business, medicine, and defense as they shift toward the online paradigm. There is therefore a pressing need for computers and similar machines to interpret the information in these images and help users understand their meaning. This has been achieved through automatic image captioning using prediction models such as machine learning and deep learning models. The problem with traditional models, especially machine learning models, is that they may not generate a caption that accurately represents the image. Although deep learning methods are better at generating image captions, this remains an open research area requiring substantial work. The model proposed in this research therefore uses transformers, with attention layers, to encode and decode image tokens, and generates the image caption by identifying objects along with their colours. The Flickr8k and Conceptual Captions datasets, which contain images and captions, are used to train the model: Flickr8k contains 8,092 images with five captions each, and Conceptual Captions contains more than 3 million images with one caption each. The contribution of this work is that it can be used by companies that need to interpret diverse images automatically and name them to describe scenarios related to the images. In the future, accuracy can be increased by adding more images and captions or by incorporating different deep-learning techniques.
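The abstract describes a transformer decoder that, guided by attention over encoded image tokens, emits a caption one token at a time. The paper itself provides no code; the following is a minimal, hypothetical sketch of the greedy decoding loop such a model typically uses. The vocabulary, the stub scoring function, and the image-feature format are illustrative assumptions standing in for the authors' trained decoder, not their implementation.

```python
import math

# Hypothetical toy vocabulary; the paper's actual token set is not specified.
VOCAB = ["<start>", "<end>", "a", "red", "car", "dog", "brown"]

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def stub_decoder_step(image_features, tokens):
    """Stand-in for a transformer-decoder forward pass: scores every
    vocabulary word given the image features and the tokens emitted so far.
    A real model would attend over encoded image tokens here."""
    scores = []
    for i, word in enumerate(VOCAB):
        # Toy scoring: scale by word index, and penalise words already
        # emitted so the toy caption terminates.
        s = sum(image_features) * (i + 1) / len(VOCAB)
        if word in tokens:
            s -= 10.0
        scores.append(s)
    return softmax(scores)

def greedy_caption(image_features, max_len=6):
    """Greedy decoding: repeatedly pick the most probable next token
    until <end> is produced or the length cap is reached."""
    tokens = ["<start>"]
    for _ in range(max_len):
        probs = stub_decoder_step(image_features, tokens)
        next_word = VOCAB[probs.index(max(probs))]
        tokens.append(next_word)
        if next_word == "<end>":
            break
    return tokens[1:]  # drop the <start> token
```

In the paper's setting the stub would be replaced by the trained decoder, and greedy selection is often swapped for beam search to trade latency for caption quality.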