A Deformable Convolutional Neural Network with Oriented Response for Fine-Grained Visual Classification

2021 13th International Conference on Machine Learning and Computing | Pub Date: 2021-02-26 | DOI: 10.1145/3457682.3457702

Shangxian Ruan, Jiating Yang, Jianbo Chen


Citations: 0

Abstract


A Deformable Convolutional Neural Network with Oriented Response for Fine-Grained Visual Classification

Fine-grained visual classification (FGVC) aims to classify images that belong to the same basic category into more detailed sub-categories. It has been a challenging research topic in computer vision and pattern recognition in recent years. Existing FGVC methods approach the task through part detection of the object in the image and variants of that idea, and rarely account for variations in object size, pose, and viewpoint. As a result, these methods generally face two major difficulties: 1) how to attend effectively to the latent semantic region while reducing the interference caused by changes in pose and viewpoint; 2) how to extract rich feature information from non-rigid and weakly structured objects. To address these two problems, this paper proposes a deformable convolutional neural network with oriented response for FGVC. The proposed method has three main steps. First, the local region carrying latent semantic information is localized with a lightweight CAM network. Second, a deformable convolutional ResNet-50 network and a rotation-invariant coding oriented response network are designed; the original image and the local region are fed into this feature network to learn discriminative, rotation-invariant features. Finally, the learned features are embedded into a joint loss so that the entire network is optimized end-to-end. Experiments are carried out on three challenging FGVC datasets: CUB-200-2011, FGVC_Aircraft, and Aircraft_2. The results show that the proposed method achieves higher accuracy than the compared methods on all three datasets and can effectively improve the accuracy of weakly supervised FGVC.
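The first step of the method, localizing a latent semantic region from a class activation map, can be illustrated with a small sketch. This is a pure-Python illustration of the generic CAM idea (a weighted sum of feature maps followed by thresholding), not the authors' lightweight CAM network. The function names, the 0.5 relative threshold, and the toy feature maps are all assumptions made for illustration.

```python
from typing import List, Tuple

# Feature maps are indexed as [channel][row][col].
FeatureMaps = List[List[List[float]]]


def class_activation_map(features: FeatureMaps,
                         class_weights: List[float]) -> List[List[float]]:
    """Weighted sum over channels: M(y, x) = sum_k w_k * f_k(y, x)."""
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for weight, fmap in zip(class_weights, features):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


def locate_region(cam: List[List[float]],
                  rel_threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return the inclusive box (top, left, bottom, right) that encloses every
    cell whose min-max normalised activation is at least rel_threshold."""
    height, width = len(cam), len(cam[0])
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A flat map carries no localisation signal; fall back to the full map.
        return 0, 0, height - 1, width - 1
    rows = [y for y in range(height)
            if any((v - lo) / span >= rel_threshold for v in cam[y])]
    cols = [x for x in range(width)
            if any((cam[y][x] - lo) / span >= rel_threshold for y in range(height))]
    return rows[0], cols[0], rows[-1], cols[-1]


# Toy example (hypothetical values): channel 0 responds to the object centre,
# channel 1 responds weakly to two corners.
features = [
    [[0, 0, 0, 0], [0, 4, 5, 0], [0, 3, 4, 0], [0, 0, 0, 0]],
    [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]],
]
weights = [1.0, 0.5]  # classifier weights for the class of interest (assumed)

cam = class_activation_map(features, weights)
print(locate_region(cam))  # (1, 1, 2, 2)
```

In a full pipeline, the returned box would be scaled to image coordinates and used to crop the local region that is fed to the feature network together with the original image. How the paper performs that crop and scaling is not stated in the abstract.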

