What Are You Posing: A Gesture Description Dataset Based on Coarse-Grained Semantics
Authors: Luchun Chen, Guorun Wang, Yaoru Sun, Rui Pang, Chengzhi Zhang
DOI: https://doi.org/10.1109/ICNLP58431.2023.00044
Listed venue: Icon, vol. 23 (1), pp. 208-212
Publication date: 2023-03-01
Algorithms for human pose estimation and image captioning are both well developed, but each has shortcomings. Mainstream pose estimation algorithms represent key nodes only as scalar values and carry no semantics, while most algorithms for captioning images of people focus on the relationship between human bodies and the background without understanding the semantics of the body itself. Neither meets the needs of deep visual understanding. To address this gap, we provide a novel dataset of captions for human poses, intended to support deep understanding of image semantics. We use a pose estimation system to extract posture figures and then an encoder-decoder model to generate captions of the human poses in a single image, yielding a deeper understanding of the original image. Finally, we use BERT to carry out a further step of reasoning and obtain a further understanding. Our dataset is open source.
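To make the contrast between raw key-node coordinates and a coarse-grained semantic description concrete, here is a minimal illustrative sketch. It is not the authors' method: the paper uses a learned encoder-decoder, whereas this example swaps in a simple hand-written rule. The function name `describe_pose`, the joint names (`left_wrist`, `left_shoulder`, and so on), and the output phrasing are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# (x, y) in image coordinates; y grows downward, as in most image libraries.
Point = Tuple[float, float]


def describe_pose(keypoints: Dict[str, Point]) -> str:
    """Turn raw keypoint coordinates into a coarse textual description.

    Hypothetical rule-based stand-in for a learned caption decoder.
    """
    phrases: List[str] = []
    for side in ("left", "right"):
        wrist = keypoints.get(f"{side}_wrist")
        shoulder = keypoints.get(f"{side}_shoulder")
        if wrist is None or shoulder is None:
            continue  # joint not detected, so say nothing about this arm
        # A smaller y value means the point is higher in the image.
        if wrist[1] < shoulder[1]:
            phrases.append(f"{side} arm raised")
        else:
            phrases.append(f"{side} arm lowered")
    if not phrases:
        return "pose unknown"
    return "person with " + " and ".join(phrases)


# Made-up coordinates, for illustration only.
example = {
    "left_shoulder": (120.0, 200.0),
    "left_wrist": (110.0, 90.0),
    "right_shoulder": (220.0, 200.0),
    "right_wrist": (235.0, 330.0),
}
print(describe_pose(example))
```

The coordinates alone say nothing a reader can interpret directly, while the returned phrase is the kind of coarse description a pose caption aims for. A learned model would replace the fixed rule with a decoder trained on annotated captions.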