Study on human pose estimation based on channel and spatial attention

Yilong Liu
Published in: 2023 3rd International Conference on Consumer Electronics and Computer Engineering (ICCECE)
Publication date: 2023-01-06
DOI: 10.1109/ICCECE58074.2023.10135500 (https://doi.org/10.1109/ICCECE58074.2023.10135500)
Citations: 0

Abstract

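Accurate pose estimation is crucial for understanding human behavior in images and videos. Given an RGB image, the goal is to accurately locate important keypoints on the body. Understanding human pose and body structure matters for high-level tasks such as human-computer interaction. Human pose estimation commonly suffers from low discrimination between the human body and the background, and pose estimation based on the HRnet network does not make full use of important feature information. To address these problems, this paper presents MCSA-HRnet (Multi-scale Channel and Spatial Attention), a human pose estimation method that improves on HRnet by adding a channel attention mechanism and a spatial attention mechanism. Working in both the channel domain and the spatial domain, MCSA-HRnet integrates a multi-level attention mechanism into the high-resolution network structure and introduces a channel attention (CA) block and a spatial attention (SA) block. This lets the network focus on the image regions most strongly associated with the human body rather than on other regions. The core of the CA block uses 1×1 convolutions for information extraction, while the SA block uses parallel 3×3 and 5×5 convolutions. Parallel convolutions of different sizes yield spatial attention maps at different scales, which strengthens the network's ability to distinguish human features from background features, so that the human body region and its keypoints can be located accurately. The improved method is evaluated on the COCO keypoint dataset, and the results show that MCSA-HRnet effectively improves the accuracy of joint localization in human pose estimation.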
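The abstract names the two attention blocks but gives no layer-level detail, so the following is only a minimal sketch of the general idea, not the paper's architecture. It assumes a squeeze-and-excitation style channel gate (global average pooling followed by a 1×1 convolution, which on a pooled vector reduces to a matrix-vector product) and a spatial mask built from parallel 3×3 and 5×5 convolutions over the channel-averaged map. The function names, the pooling choices, the summation that fuses the two branches, and the sigmoid gating are all assumptions made for illustration. Plain Python lists are used to keep the example dependency-free.

```python
import math


def sigmoid(x):
    """Logistic function used to squash attention scores into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat, weight):
    """Rescale each channel of a C x H x W feature map by a learned gate.

    `weight` is a C x C matrix standing in for a 1x1 convolution applied
    to the globally average-pooled channel vector (an assumption).
    """
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]
    gates = [sigmoid(sum(w * p for w, p in zip(w_row, pooled))) for w_row in weight]
    return [[[v * g for v in row] for row in ch] for ch, g in zip(feat, gates)]


def conv2d_same(img, kernel):
    """Zero-padded 'same' cross-correlation of an H x W map with an odd k x k kernel."""
    h, w, k = len(img), len(img[0]), len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - r, x + dx - r
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += img[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out


def spatial_attention(feat, k3, k5):
    """Rescale every spatial position by a mask from parallel 3x3 and 5x5 branches.

    The two branch outputs are summed before the sigmoid (an assumption).
    """
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    mean_map = [[sum(feat[ch][y][x] for ch in range(c)) / c for x in range(w)]
                for y in range(h)]
    a3 = conv2d_same(mean_map, k3)
    a5 = conv2d_same(mean_map, k5)
    mask = [[sigmoid(a3[y][x] + a5[y][x]) for x in range(w)] for y in range(h)]
    return [[[feat[ch][y][x] * mask[y][x] for x in range(w)] for y in range(h)]
            for ch in range(c)]


# Toy usage: a 2-channel 4x4 feature map with hand-picked (not learned) weights.
feat = [[[float(ch + y + x) for x in range(4)] for y in range(4)] for ch in range(2)]
identity = [[1.0, 0.0], [0.0, 1.0]]
box3 = [[1.0 / 9.0] * 3 for _ in range(3)]
box5 = [[1.0 / 25.0] * 5 for _ in range(5)]
out = spatial_attention(channel_attention(feat, identity), box3, box5)
print(len(out), len(out[0]), len(out[0][0]))
```

Both functions preserve the shape of the input and only rescale it, so in a real network they could be dropped between existing layers. A trained model would learn the weights and kernels rather than use the fixed values shown here.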