Authors: Zhi Han, Zongben Xu, Song-Chun Zhu
Venue: 2011 International Conference on Computer Vision, pages 1283-1290
Published: 2011-11-06
DOI: https://doi.org/10.1109/ICCV.2011.6126380
Video Primal Sketch: A generic middle-level representation of video
This paper presents a middle-level video representation named Video Primal Sketch (VPS), which integrates two regimes of models: i) a sparse coding model that uses static or moving primitives to explicitly represent moving corners, lines, feature points, etc., and ii) a FRAME/MRF model with spatio-temporal filters that implicitly represents textured motion, such as water and fire, by matching feature statistics, i.e., histograms. This paper makes three contributions: i) learning a dictionary of video primitives as a parametric generative model; ii) studying the Spatio-Temporal FRAME (ST-FRAME) model for modeling and synthesizing textured motion; and iii) developing a parsimonious hybrid model for generic video representation. VPS selects the proper representation automatically and is compatible with high-level action representations. In the experiments, we synthesize a series of dynamic textures, reconstruct real videos, and show how the VPS varies with the change in density caused by scale transitions in videos.
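The phrase "matching feature statistics, i.e., histograms" can be made concrete with a small sketch. The code below is not the authors' ST-FRAME model: it only illustrates the kind of statistic such a model matches, namely the histogram of spatio-temporal filter responses over a clip. The filter (a plain frame-to-frame difference), the bin count, the value range, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def temporal_difference_filter(video):
    """Hypothetical stand-in for a spatio-temporal filter.

    `video` has shape (frames, height, width); the response is the
    intensity difference between consecutive frames.
    """
    return video[1:] - video[:-1]


def response_histogram(responses, bins=15, value_range=(-1.0, 1.0)):
    """Normalized histogram of filter responses (the statistic to match)."""
    counts, _ = np.histogram(responses, bins=bins, range=value_range)
    return counts / counts.sum()


def histogram_distance(h1, h2):
    """L1 distance between two normalized histograms."""
    return float(np.abs(h1 - h2).sum())


# Synthetic data only: a noisy clip as a stand-in for textured motion,
# and a static clip built by repeating its first frame.
rng = np.random.default_rng(0)
textured = rng.uniform(0.0, 1.0, size=(8, 16, 16))
static = np.repeat(textured[:1], 8, axis=0)

h_textured = response_histogram(temporal_difference_filter(textured))
h_static = response_histogram(temporal_difference_filter(static))

# A synthesis procedure in this spirit would adjust a candidate clip
# until this distance to the observed clip's histogram becomes small.
gap = histogram_distance(h_textured, h_static)
```

Under these assumptions, the static clip puts all of its mass in the histogram bin containing zero, while the noisy clip spreads its mass across bins, so `gap` is strictly positive. A real model would use a bank of filters and match one histogram per filter.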