ConvPose: An efficient human pose estimation method based on ConvNeXt
Ke Lin, S. Zhang, Zhisong Qin
Proceedings of the 5th International Conference on Computer Science and Software Engineering, 2022-10-21. DOI: 10.1145/3569966.3569989
Human pose estimation methods have developed rapidly in recent years, and many high-precision models have emerged. However, their computational costs are often very high, especially for transformer-based models. In this work, we propose ConvPose, an efficient human pose estimation model based on a convolutional neural network architecture. ConvPose uses an efficient single-branch structure that takes the ConvNeXt Block as its baseline and incorporates the Coordinate Attention module. This combination not only provides stronger feature extraction, but also efficiently captures the global dependencies between human keypoints and the scene. Combining large convolution kernels with the attention module lets the model focus more on detailed features in complex scenes. In addition, our model has fewer parameters and GFLOPs than current high-performance models, which makes it more practical to deploy on low-end devices. Experiments show that our model achieves 73.6 AP on the MS-COCO dataset with only 6.3M parameters, a highly competitive result.
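To make the described composition more concrete, here is a minimal NumPy sketch of a forward pass through a ConvNeXt-style block combined with a Coordinate Attention module. This is not the authors' implementation. The ConvNeXt part follows the published block layout (7x7 depthwise convolution, LayerNorm over channels, 4x inverted-bottleneck pointwise layers with GELU, layer scale, residual add). The Coordinate Attention part follows the usual design (average pooling along height and width separately, a shared 1x1 reduction, then two sigmoid gates). Several details are assumptions, because the abstract does not specify them: placing the attention module just before the residual add, the reduction ratio of 8 with a minimum width of 8, the h-swish activation, omitting batch normalization, the tanh approximation of GELU, random weights, and operating on a single (C, H, W) feature map without a batch dimension. The helper names (`depthwise_conv`, `coordinate_attention`, `convnext_ca_block`, `init_params`) are hypothetical.

```python
import numpy as np


def depthwise_conv(x, k):
    """Depthwise 2-D cross-correlation with 'same' zero padding.
    x: (C, H, W); k: (C, K, K) with K odd (7 in ConvNeXt)."""
    C, H, W = x.shape
    p = k.shape[-1] // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros_like(x)
    for i in range(k.shape[-1]):
        for j in range(k.shape[-1]):
            # Each channel uses only its own kernel (no cross-channel mixing).
            out += k[:, i, j][:, None, None] * xp[:, i:i + H, j:j + W]
    return out


def pointwise(x, w, b):
    """1x1 convolution: w has shape (C_out, C_in), b has shape (C_out,)."""
    return np.einsum("oc,chw->ohw", w, x) + b[:, None, None]


def layer_norm_channels(x, eps=1e-6):
    """LayerNorm over the channel axis at every spatial position (no affine)."""
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def h_swish(x):
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def coordinate_attention(x, p):
    """Coordinate Attention: direction-aware gates along H and along W."""
    C, H, W = x.shape
    z_h = x.mean(axis=2)                               # (C, H): pool along width
    z_w = x.mean(axis=1)                               # (C, W): pool along height
    z = np.concatenate([z_h, z_w], axis=1)             # (C, H + W)
    f = h_swish(p["w1"] @ z + p["b1"][:, None])        # shared 1x1 reduction
    f_h, f_w = f[:, :H], f[:, H:]
    g_h = sigmoid(p["wh"] @ f_h + p["bh"][:, None])    # (C, H) gate per row
    g_w = sigmoid(p["ww"] @ f_w + p["bw"][:, None])    # (C, W) gate per column
    return x * g_h[:, :, None] * g_w[:, None, :]


def convnext_ca_block(x, p):
    y = depthwise_conv(x, p["dw"])               # large-kernel spatial mixing
    y = layer_norm_channels(y)
    y = gelu(pointwise(y, p["pw1"], p["pb1"]))   # expand C -> 4C
    y = pointwise(y, p["pw2"], p["pb2"])         # project 4C -> C
    y = coordinate_attention(y, p["ca"])         # assumed placement
    return x + p["gamma"][:, None, None] * y     # layer scale + residual


def init_params(C, kernel=7, expand=4, reduction=8, seed=0):
    rng = np.random.default_rng(seed)

    def n(*shape):
        return 0.02 * rng.standard_normal(shape)

    mid = max(8, C // reduction)
    return {
        "dw": n(C, kernel, kernel),
        "pw1": n(expand * C, C), "pb1": np.zeros(expand * C),
        "pw2": n(C, expand * C), "pb2": np.zeros(C),
        "gamma": np.full(C, 1e-6),
        "ca": {"w1": n(mid, C), "b1": np.zeros(mid),
               "wh": n(C, mid), "bh": np.zeros(C),
               "ww": n(C, mid), "bw": np.zeros(C)},
    }


x = np.random.default_rng(1).standard_normal((32, 64, 48))
y = convnext_ca_block(x, init_params(32))
print(y.shape)  # (32, 64, 48)
```

The block preserves the feature-map shape, so blocks like this can be stacked in a single-branch backbone. Because the Coordinate Attention gates are computed from pooling along whole rows and whole columns, each output position is reweighted using information from its entire row and column at little extra cost. This is one plausible reading of how such a module helps capture long-range keypoint-scene dependencies.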