Yang Xianhua, Yang Yi, Yang Juan, Yao Han, Wang Zheng, Long Shuquan
Published in: 2021 18th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP)
Publication date: 2021-12-17
DOI: 10.1109/ICCWAMTIP53232.2021.9674123 (https://doi.org/10.1109/ICCWAMTIP53232.2021.9674123)
Image Multi-Label Classification Based on Pyramid Convolution and Split-Attention Mechanism
Multi-label image classification is a critical task in computer vision. Its primary difficulty is that distinguishing between labels depends on complex information within the image, which significantly increases the classification difficulty. We propose a method that modifies an existing model. Using TResNet as the baseline, we replace its ordinary convolutions with pyramid convolutions and its attention mechanism with split-attention. The model is then trained on the VOC2007 and MS-COCO datasets. Comparative experiments show how the model's parameters were selected and how the best modification scheme was determined. Finally, comparing the modified model with the unmodified model shows that both modifications effectively improve performance: on the VOC dataset, the two modifications improve the model by 1% and 1.6%, respectively.
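To make the two modifications named in the abstract more concrete, the following is a minimal, framework-free sketch of the general ideas, not the authors' implementation. It assumes a 1D signal instead of image feature maps, and every function name here (`conv1d_same`, `pyramid_conv1d`, `split_attention`) is hypothetical. Pyramid convolution is reduced to applying kernels of several sizes in parallel and stacking the results as channels. Split-attention is reduced to a per-channel softmax across feature splits; the attention logits are passed in directly, whereas a real network would compute them from pooled features with learned layers.

```python
import math


def conv1d_same(signal, kernel):
    """Cross-correlate a 1D signal with an odd-length kernel, zero-padded
    so the output has the same length as the input."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(padded[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(len(signal))
    ]


def pyramid_conv1d(signal, kernels):
    """Apply kernels of different sizes in parallel.

    Each kernel sees a different receptive field; the outputs are stacked
    as separate channels (a simplification of pyramid convolution).
    """
    return [conv1d_same(signal, kernel) for kernel in kernels]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def split_attention(splits, logits):
    """Fuse K feature splits into one feature vector.

    splits[k][c] is the feature of split k at channel c.
    logits[k][c] is the attention score for that entry (assumed given here).
    For each channel, the weights are a softmax across the K splits.
    """
    num_splits = len(splits)
    num_channels = len(splits[0])
    fused = []
    for c in range(num_channels):
        weights = softmax([logits[k][c] for k in range(num_splits)])
        fused.append(sum(weights[k] * splits[k][c] for k in range(num_splits)))
    return fused
```

For example, `pyramid_conv1d(signal, [[1/3] * 3, [1/5] * 5])` returns two channels computed with averaging kernels of width 3 and 5, and `split_attention` with all-zero logits reduces to a plain average of the splits.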