Using Text and Visual Cues for Fine-Grained Classification

Zaryab Shaker, Xiao Feng, M. A. Tahir
International Journal of Advanced Network, Monitoring and Controls, 6(1), 2021. Journal article. DOI: 10.21307/ijanmc-2021-026 (https://doi.org/10.21307/ijanmc-2021-026)

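Abstract

Text is an important human invention that has played a key role in human life ever since the dark ages. Text in an image is closely related to the scene or product shown and is widely used in vision-based applications. In this paper we address the problem of visual understanding with text. The main focus is on combining textual cues and visual cues in a deep neural network. First, the text in the image is recognized and classified. We then combine the attended word embedding with the visual feature vector, and the combined representation is optimized by a CNN for fine-grained image classification. We carried out experiments on a dataset of soft drinks in Pakistan. The results show that the system achieves significant performance, which could benefit real-world applications such as product search.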
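The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the attention mechanism, the embedding sizes, or the classifier, so everything here is an assumption made for illustration: dot-product attention of the visual feature over the word embeddings (which requires both to share a dimension), fusion by concatenation, and a single linear layer with softmax in place of the paper's CNN. The function names `attend_words`, `fuse`, and `classify` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend_words(visual_feat, word_embs):
    """Assumed dot-product attention: weight each recognized word's
    embedding by its similarity to the visual feature vector."""
    weights = softmax([dot(visual_feat, w) for w in word_embs])
    dim = len(word_embs[0])
    attended = [
        sum(wt * w[i] for wt, w in zip(weights, word_embs)) for i in range(dim)
    ]
    return attended, weights


def fuse(visual_feat, attended_text):
    """Assumed fusion: concatenate the visual and attended text vectors."""
    return list(visual_feat) + list(attended_text)


def classify(fused, class_weights):
    """Stand-in classifier: one linear layer followed by softmax."""
    return softmax([dot(fused, w) for w in class_weights])


# Toy example: a 2-d visual feature and two recognized words.
visual = [1.0, 0.0]
words = [[1.0, 0.0], [0.0, 1.0]]
attended, weights = attend_words(visual, words)
fused = fuse(visual, attended)
# Two hypothetical classes, each with a 4-d weight vector.
probs = classify(fused, [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]])
```

In this toy run the first word aligns with the visual feature, so it receives the larger attention weight and dominates the attended text vector. A real system would learn the embeddings and classifier weights jointly rather than fixing them by hand.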