Tiandi Peng, Yanmin Luo, Zhilong Ou, Jixiang Du, Gonggeng Lin
The Journal of Supercomputing. Published 2024-08-26. DOI: 10.1007/s11227-024-06444-8 (https://doi.org/10.1007/s11227-024-06444-8)
Ultra-FastNet: an end-to-end learnable network for multi-person posture prediction
At present, top-down approaches to multi-person pose estimation require a separate pedestrian detection algorithm. In this paper, we propose an end-to-end trainable human pose estimation network named Ultra-FastNet, which has three main components: a shape knowledge extractor, a corner prediction module, and a human body geometric knowledge encoder. First, the shape knowledge extractor is built from ultralightweight bottleneck modules, which reduce the number of network parameters while learning high-resolution local representations of keypoints; a global attention module is introduced into the ultralightweight bottleneck block to capture keypoint shape knowledge and build high-resolution features. Second, the human body geometric knowledge encoder, which is built on the Transformer, models and discovers body geometric knowledge in the data. The network uses both shape knowledge and body geometric knowledge, an approach we call knowledge-enhanced, to infer keypoints. Finally, the corner prediction module models the pedestrian detection task as a keypoint detection task. As a result, an end-to-end multitask network can perform multi-person pose estimation without a separate pedestrian detection algorithm. Experiments show that Ultra-FastNet achieves high accuracy on the COCO2017 and MPII datasets and outperforms mainstream lightweight networks.
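To illustrate the idea of treating detection as keypoint detection, the following is a minimal sketch of how a bounding box might be read off two corner heatmaps. The abstract does not describe the decoding procedure, so everything here is an assumption for illustration: the function names `argmax_2d` and `decode_box`, the single-person setting, the output `stride` of 4, and the averaged confidence score are all hypothetical and are not taken from Ultra-FastNet.

```python
from typing import List, Optional, Tuple

Heatmap = List[List[float]]  # heatmap[y][x] holds the score of a cell


def argmax_2d(heatmap: Heatmap) -> Tuple[int, int, float]:
    """Return (x, y, score) of the highest-scoring cell in a 2D heatmap."""
    best_x, best_y, best_score = 0, 0, float("-inf")
    for y, row in enumerate(heatmap):
        for x, score in enumerate(row):
            if score > best_score:
                best_x, best_y, best_score = x, y, score
    return best_x, best_y, best_score


def decode_box(
    top_left: Heatmap, bottom_right: Heatmap, stride: int = 4
) -> Optional[Tuple[int, int, int, int, float]]:
    """Decode one box (x1, y1, x2, y2, score) from two corner heatmaps.

    Hypothetical decoding rule: take the peak of each heatmap as a corner,
    scale it back to image pixels by `stride`, and average the two peak
    scores. Returns None when the corners do not form a valid box.
    """
    x1, y1, s1 = argmax_2d(top_left)
    x2, y2, s2 = argmax_2d(bottom_right)
    if x2 <= x1 or y2 <= y1:
        return None  # bottom-right corner must lie below and right of top-left
    return (x1 * stride, y1 * stride, x2 * stride, y2 * stride, (s1 + s2) / 2)


# Toy 4x4 heatmaps: top-left peak at (x=1, y=0), bottom-right peak at (x=3, y=2).
tl = [[0.0, 1.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0]]
br = [[0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 0.5],
      [0.0, 0.0, 0.0, 0.0]]

box = decode_box(tl, br)
print(box)  # (4, 0, 12, 8, 0.75)
```

In a multi-person setting, a decoder of this kind would also need to find several peaks per heatmap and pair corners that belong to the same person; the sketch leaves that out.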