Image Caption Generation Related to Object Detection and Colour Recognition Using Transformer-Decoder

Z. U. Kamangar, G. Shaikh, Saif Hassan, Nimra Mughal, U. A. Kamangar
DOI: 10.1109/iCoMET57998.2023.10099161
Published in: 2023 4th International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), 2023-03-17
Citations: 1

Abstract

Dependence on digital images is increasing across fields such as education, business, medicine, and defense as they shift toward the online paradigm. There is therefore a pressing need for computers and similar machines to interpret the information in these images and help users understand their meaning. This has been achieved through automatic image captioning using prediction models such as machine learning and deep learning models. The problem with traditional models, especially machine learning models, is that they may not generate a caption that accurately represents the image. Although deep learning methods are better at generating image captions, this remains an open research area requiring substantial work. The model proposed in this research therefore uses transformers, with attention layers, to encode and decode image tokens, and generates the image caption by identifying objects along with their colours. The Flickr8k and Conceptual Captions datasets, which contain images and captions, are used to train the model: Flickr8k contains 8,092 images with five captions each, and Conceptual Captions contains more than 3 million images with one caption each. The contribution of this work is that it can be used by companies that need to interpret diverse images automatically and name them to describe scenarios related to the images. In the future, accuracy can be increased by adding more images and captions or by incorporating different deep-learning techniques.
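The abstract describes a transformer decoder that, guided by attention over encoded image tokens, emits a caption one token at a time. The paper itself provides no code; the following is a minimal, hypothetical sketch of the greedy decoding loop such a model typically uses. The vocabulary, the stub scoring function, and the image-feature format are illustrative assumptions standing in for the authors' trained decoder, not their implementation.

```python
import math

# Hypothetical toy vocabulary; the paper's actual token set is not specified.
VOCAB = ["<start>", "<end>", "a", "red", "car", "dog", "brown"]

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def stub_decoder_step(image_features, tokens):
    """Stand-in for a transformer-decoder forward pass: scores every
    vocabulary word given the image features and the tokens emitted so far.
    A real model would attend over encoded image tokens here."""
    scores = []
    for i, word in enumerate(VOCAB):
        # Toy scoring: scale by word index, and penalise words already
        # emitted so the toy caption terminates.
        s = sum(image_features) * (i + 1) / len(VOCAB)
        if word in tokens:
            s -= 10.0
        scores.append(s)
    return softmax(scores)

def greedy_caption(image_features, max_len=6):
    """Greedy decoding: repeatedly pick the most probable next token
    until <end> is produced or the length cap is reached."""
    tokens = ["<start>"]
    for _ in range(max_len):
        probs = stub_decoder_step(image_features, tokens)
        next_word = VOCAB[probs.index(max(probs))]
        tokens.append(next_word)
        if next_word == "<end>":
            break
    return tokens[1:]  # drop the <start> token
```

In the paper's setting the stub would be replaced by the trained decoder, and greedy selection is often swapped for beam search to trade latency for caption quality.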