Si Liu, Qin Jin, Luoqi Liu, Zongheng Tang, Linli Lin
PIC'22: 4th Person in Context Workshop
Proceedings of the 30th ACM International Conference on Multimedia, published 2022-10-10. DOI: 10.1145/3503161.3554766 (https://doi.org/10.1145/3503161.3554766)
Understanding humans and their surrounding context is crucial for image and video perception. It benefits many related applications, such as person search, virtual try-on/makeup, and abnormal action detection. In the proposed 4th Person in Context (PIC) workshop, to further promote progress in these areas, we hold three human-centric perception and cognition challenges: Make-up Temporal Video Grounding (MTVG), Make-up Dense Video Caption (MDVC), and Human-centric Spatio-Temporal Video Grounding (HC-STVG). All three challenges focus on understanding human behavior, interactions, and relationships in video sequences, which requires understanding both visual and linguistic information as well as complex multimodal reasoning. The three sub-problems are complementary and jointly contribute to a unified human-centric perception and cognition solution.