Image Multi-Label Classification Based on Pyramid Convolution and Split-Attention Mechanism

2021 18th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP). Pub Date: 2021-12-17. DOI: 10.1109/ICCWAMTIP53232.2021.9674123

Yang Xianhua, Yang Yi, Yang Juan, Yao Han, Wang Zheng, Long Shuquan


Citations: 0

Abstract



Image multi-label classification is a critical task in computer vision. The primary difficulty is that a classifier must rely on complex information in the image to distinguish between labels, which makes classification significantly harder. We propose a method for modifying an existing model. First, we use TResNet as the baseline model, replacing the ordinary convolutions in the original model with pyramid convolutions and its attention mechanism with the split-attention method. We then train the model on the VOC2007 and MS-COCO data sets. Comparative experiments show how the model's parameters were selected and how the best modification was determined. Finally, comparing the modified model against the unmodified model shows that both modifications effectively improve performance. On the VOC data set, the two modifications improve the model's performance by 1% and 1.6%, respectively.
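The two building blocks named in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: it works on a 1D signal in pure Python, the kernels are fixed averaging filters rather than learned weights, and the attention logits are taken directly from global average pooling instead of the learned fully connected layers a real split-attention block would use. The function names `conv1d_same`, `pyramid_conv1d`, and `split_attention` are hypothetical. The sketch only shows the general idea: several kernel sizes run in parallel (pyramid convolution), and the resulting branches are fused with softmax weights computed across branches (split-attention).

```python
import math


def conv1d_same(signal, kernel):
    """Cross-correlate a 1D signal with an odd-length kernel, zero-padded to keep the length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(padded[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(len(signal))
    ]


def pyramid_conv1d(signal, kernels):
    """Apply kernels of different sizes in parallel; each result is one branch."""
    return [conv1d_same(signal, k) for k in kernels]


def split_attention(branches):
    """Fuse branches using softmax weights across branches.

    Assumption: the logit of each branch is its global average; a real
    split-attention block would compute logits with learned layers.
    """
    pooled = [sum(b) / len(b) for b in branches]   # one descriptor per branch
    peak = max(pooled)
    exps = [math.exp(p - peak) for p in pooled]    # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [
        sum(w * b[i] for w, b in zip(weights, branches))
        for i in range(len(branches[0]))
    ]
    return fused, weights


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
# Illustrative kernel sizes 1, 3 and 5 (fixed averaging filters, not learned).
kernels = [[1.0], [1 / 3] * 3, [0.2] * 5]
branches = pyramid_conv1d(signal, kernels)
fused, weights = split_attention(branches)
```

Each branch sees the input at a different receptive-field size, and the fused output is a convex combination of the branches because the softmax weights sum to one.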
