Image Multi-Label Classification Based on Pyramid Convolution and Split-Attention Mechanism

Yang Xianhua, Yang Yi, Yang Juan, Yao Han, Wang Zheng, Long Shuquan
Published in: 2021 18th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP)
Publication date: 2021-12-17
DOI: 10.1109/ICCWAMTIP53232.2021.9674123 (https://doi.org/10.1109/ICCWAMTIP53232.2021.9674123)
Citations: 0

Abstract

Multi-label image classification is a core task in computer vision. The main difficulty is that the classifier must rely on complex information in the image to distinguish between labels, which makes the task considerably harder. We propose two modifications to an existing model. Using TResNet as the baseline, we replace its ordinary convolutions with pyramid convolutions and its attention mechanism with split-attention. The models are trained on the VOC2007 and MS-COCO data sets. Comparative experiments show how the model parameters were selected and how the best modification scheme was determined. Finally, a comparison with the unmodified model shows that both modifications improve performance: on the VOC data set, the two modified models improve on the baseline by 1% and 1.6%, respectively.
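
The idea of pyramid convolution mentioned in the abstract can be illustrated with a minimal sketch: the same input is filtered with kernels of several sizes, and the results are stacked as output channels. This is not the authors' implementation. It assumes a single input channel, one fixed filter per pyramid level, and no grouped convolution or learned weights; the names conv2d_same, ones_kernel, and pyramid_conv are hypothetical and chosen only for this example.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (output keeps input size)."""
    h, w = len(image), len(image[0])
    k = len(kernel)      # assumption: square kernel with odd side length
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    # Positions outside the image count as zeros.
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def ones_kernel(size):
    """Hypothetical helper: a size x size kernel of ones (a real model learns its weights)."""
    return [[1.0] * size for _ in range(size)]


def pyramid_conv(image, kernels):
    """Apply kernels of different sizes to the same input; return one output map per level."""
    return [conv2d_same(image, kernel) for kernel in kernels]


if __name__ == "__main__":
    image = [[1.0] * 5 for _ in range(5)]
    levels = pyramid_conv(image, [ones_kernel(3), ones_kernel(5)])
    # The larger kernel covers a wider neighbourhood of the same pixel.
    print(levels[0][2][2], levels[1][2][2])  # 9.0 25.0
```

Each level sees a different receptive field around the same pixel, which is the property that lets a pyramid of kernels capture detail at several scales in one layer.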