A Deformable Convolutional Neural Network with Oriented Response for Fine-Grained Visual Classification

Shangxian Ruan, Jiating Yang, Jianbo Chen
Venue: 2021 13th International Conference on Machine Learning and Computing
Published: 2021-02-26
DOI: https://doi.org/10.1145/3457682.3457702
Citations: 0

Abstract

Fine-grained visual classification (FGVC) aims to classify images that belong to the same basic category into more detailed sub-categories. It has been a challenging research topic in computer vision and pattern recognition in recent years. Existing FGVC methods approach the task through part detection of the object in the image and its variants, and rarely account for variations in object size, pose, and viewpoint. As a result, these methods generally face two major difficulties: 1) how to attend effectively to the latent semantic region while reducing the interference caused by changes in pose and viewpoint; 2) how to extract rich feature information for non-rigid and weakly structured objects. To address these two problems, this paper proposes a deformable convolutional neural network with oriented response for FGVC. The proposed method consists of three main steps. First, the local region carrying latent semantic information is localized with a lightweight CAM network. Second, a deformable convolutional ResNet-50 network and a rotation-invariant coding oriented response network are designed; the original image and the local region are fed into this feature network to learn discriminative, rotation-invariant features. Finally, the learned features are embedded into a joint loss to optimize the entire network end-to-end. Experiments are carried out on three challenging FGVC datasets: CUB-200-2011, FGVC_Aircraft, and Aircraft_2. The results show that the proposed method achieves higher accuracy than the compared methods on all three datasets, and that it can effectively improve the accuracy of weakly supervised FGVC.
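The first step, localizing a region from a CAM, can be illustrated with a small sketch. It assumes that CAM refers to class activation mapping, computed as a weighted sum of the final convolutional feature maps, and that the region is taken as the bounding box of cells above a relative threshold. The abstract does not specify the paper's lightweight CAM network, so the function names, the 0.5 threshold, and the toy feature maps below are all hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List, Tuple

Map2D = List[List[float]]


def class_activation_map(features: List[Map2D], class_weights: List[float]) -> Map2D:
    """Compute CAM(y, x) = sum_k w_k * f_k(y, x) for a single class."""
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for weight, fmap in zip(class_weights, features):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


def localize_region(
    cam: Map2D,
    image_size: Tuple[int, int],
    rel_threshold: float = 0.5,  # assumed value, not taken from the paper
) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) in pixels around the cells whose
    activation is at least rel_threshold times the peak activation."""
    img_h, img_w = image_size
    height, width = len(cam), len(cam[0])
    peak = max(max(row) for row in cam)
    if peak <= 0:
        # No positive evidence for the class: fall back to the whole image.
        return (0, 0, img_w, img_h)
    cutoff = rel_threshold * peak
    hits = [(y, x) for y in range(height) for x in range(width) if cam[y][x] >= cutoff]
    ys = [y for y, _ in hits]
    xs = [x for _, x in hits]
    # Each CAM cell covers a cell_h x cell_w patch of the input image.
    cell_h, cell_w = img_h / height, img_w / width
    return (
        int(min(xs) * cell_w),
        int(min(ys) * cell_h),
        int((max(xs) + 1) * cell_w),
        int((max(ys) + 1) * cell_h),
    )


# Toy input: two 4x4 feature maps and their weights for one class.
f0 = [[0, 0, 0, 0], [0, 1, 2, 0], [0, 1, 2, 0], [0, 0, 0, 0]]
f1 = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
cam = class_activation_map([f0, f1], [1.0, 0.5])
box = localize_region(cam, image_size=(224, 224))
print(box)
```

In a pipeline like the one described, the crop given by `box` would be passed to the feature network together with the original image. A real implementation would typically operate on framework tensors and might keep only the largest connected component above the threshold.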