使用文本和视觉线索进行细粒度分类

International Journal of Advanced Network, Monitoring and Controls Pub Date : 2021-01-01 DOI:10.21307/ijanmc-2021-026

Zaryab Shaker, Xiao Feng, M. A. Tahir

{"title":"使用文本和视觉线索进行细粒度分类","authors":"Zaryab Shaker, Xiao Feng, M. A. Tahir","doi":"10.21307/ijanmc-2021-026","DOIUrl":null,"url":null,"abstract":"Abstract Text is an important invention of humanity, which plays a key role in human life, so far from dark ages. Text in image is closely related to the scene or a product and is widely used in vision based application. In this paper we are addressing the problem of visual understanding with text. The main focus is combining textual cues and visual cues in deep neural network. First the text is recognized and classified from the image. Then we combine the attended word embedding and visual feature vector which are then optimized by CNN for Fine-grained image classification. We carried out the experiments on soft drink dataset in Pakistan. The results shows that the system achieves significant performance which can be potentially beneficial for real world application e.g. product search.","PeriodicalId":193299,"journal":{"name":"International Journal of Advanced Network, Monitoring and Controls","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Using Text and Visual Cues for Fine-Grained Classification\",\"authors\":\"Zaryab Shaker, Xiao Feng, M. A. Tahir\",\"doi\":\"10.21307/ijanmc-2021-026\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract Text is an important invention of humanity, which plays a key role in human life, so far from dark ages. Text in image is closely related to the scene or a product and is widely used in vision based application. In this paper we are addressing the problem of visual understanding with text. The main focus is combining textual cues and visual cues in deep neural network. First the text is recognized and classified from the image. Then we combine the attended word embedding and visual feature vector which are then optimized by CNN for Fine-grained image classification. We carried out the experiments on soft drink dataset in Pakistan. The results shows that the system achieves significant performance which can be potentially beneficial for real world application e.g. product search.\",\"PeriodicalId\":193299,\"journal\":{\"name\":\"International Journal of Advanced Network, Monitoring and Controls\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Advanced Network, Monitoring and Controls\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.21307/ijanmc-2021-026\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Advanced Network, Monitoring and Controls","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21307/ijanmc-2021-026","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

文本是人类的一项重要发明，它在人类生活中起着至关重要的作用，从此远离了黑暗时代。图像中的文字与场景或产品密切相关，在基于视觉的应用中有着广泛的应用。在本文中，我们正在解决文本的视觉理解问题。研究的重点是文本线索和视觉线索在深度神经网络中的结合。首先从图像中对文本进行识别和分类。然后，我们将关注词嵌入和视觉特征向量结合起来，然后通过CNN对其进行优化，进行细粒度图像分类。我们在巴基斯坦的软饮料数据集上进行了实验。结果表明，该系统取得了显著的性能，可以潜在地有利于现实世界的应用，如产品搜索。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Using Text and Visual Cues for Fine-Grained Classification

Abstract Text is an important invention of humanity, which plays a key role in human life, so far from dark ages. Text in image is closely related to the scene or a product and is widely used in vision based application. In this paper we are addressing the problem of visual understanding with text. The main focus is combining textual cues and visual cues in deep neural network. First the text is recognized and classified from the image. Then we combine the attended word embedding and visual feature vector which are then optimized by CNN for Fine-grained image classification. We carried out the experiments on soft drink dataset in Pakistan. The results shows that the system achieves significant performance which can be potentially beneficial for real world application e.g. product search.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

International Journal of Advanced Network, Monitoring and Controls

自引率

0.00%

发文量