Helping people with cognitive dysfunctions improve the social skills they need for independent living is an important issue. This paper focuses on their everyday cooking activities and proposes a cooperative cooking navigation system that supports their social skills training. The system consists of two components: cooperative behavior expression support and cooperative behavior evaluation support. We evaluate the results of experiments in which the proposed system was applied to patients with cognitive dysfunctions and identify the essential conditions for the system to work well.
{"title":"Social Skills Training Support of Cognitive Dysfunctions by Cooperative Cooking Navigation System","authors":"Kenzaburo Miyawaki, Mutsuo Sano, Syunichi Yonemura, M. Ode","doi":"10.1109/ISM.2011.73","DOIUrl":"https://doi.org/10.1109/ISM.2011.73","url":null,"abstract":"We have an important issue that the people with cognitive dysfunctions should improve social skills for self supporting. This paper notices their fundamental cooking activities and proposes a cooperative cooking navigation system supporting their social skills training. We have econstructed this system to be composed of cooperative behavior expression support and cooperative behavior evaluation support. We evaluate the experimental results of applying our proposed system to patients with cognitive dysfunctions and extract the essential conditions for working this system well.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123833452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-time creation of video mosaics needs fast and accurate motion computation. While most mosaicing methods can use 2D image motion, the creation of multi-view stereo mosaics needs more accurate 3D motion computation. Fast and accurate computation of 3D motion is challenging for unstabilized cameras moving in 3D scenes, which is always the case when stereo mosaics are used. Efficient blending of the mosaic strips is also essential. Most cases of stereo mosaicing satisfy the assumption of limited camera motion, with no forward motion and no change in internal parameters. Under these assumptions, uniform sideways motion creates straight epipolar lines. When the 3D motion is computed correctly, images can be aligned in a space-time volume to give straight epipolar lines, a method which is depth invariant. We propose to align the video sequence in a space-time volume based on efficient feature tracking; in this paper we use kernel tracking. Computation is fast because the motion is computed only for a few regions of the image, yet it yields accurate 3D motion, and it is both faster and more accurate than the previously used direct approach. We also present "Barcode Blending", a new and very efficient approach to pyramid blending in video mosaics. Barcode Blending avoids the complexity of building pyramids for multiple narrow strips by combining all strips in a single blending step. The entire stereo mosaicing process is highly efficient in computation and memory, and can be performed on mobile devices.
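The alignment step lends itself to a simple illustration. The sketch below estimates the sideways displacement between consecutive frames from a handful of tracked regions; it uses OpenCV's pyramidal Lucas-Kanade tracker as a stand-in for the kernel tracking used in the paper, and all function and parameter names are ours rather than the authors'.

```python
# Minimal sketch: per-frame horizontal displacement from a few tracked regions,
# as a stand-in for the paper's kernel tracking. All parameters are illustrative.
import cv2
import numpy as np

def estimate_displacements(frames, max_corners=50):
    """Return the median horizontal shift between consecutive BGR frames."""
    displacements = []
    prev_gray = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=max_corners,
                                       qualityLevel=0.01, minDistance=10)
    for frame in frames[1:]:
        if prev_pts is None or len(prev_pts) == 0:
            break  # lost all tracked regions; a real system would re-detect
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray,
                                                       prev_pts, None)
        good_new = next_pts[status.ravel() == 1]
        good_old = prev_pts[status.ravel() == 1]
        # Median of the x-shifts is robust to a few badly tracked regions.
        dx = np.median(good_new[:, 0, 0] - good_old[:, 0, 0]) if len(good_new) else 0.0
        displacements.append(float(dx))
        prev_gray, prev_pts = gray, good_new.reshape(-1, 1, 2)
    return displacements
```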
{"title":"Real-Time Stereo Mosaicing Using Feature Tracking","authors":"Marc Vivet, Shmuel Peleg, Xavier Binefa","doi":"10.1109/ISM.2011.102","DOIUrl":"https://doi.org/10.1109/ISM.2011.102","url":null,"abstract":"Real-time creation of video mosaics needs fast and accurate motion computation. While most mosaicing methods can use 2D image motion, the creation of multi view stereo mosaics needs more accurate 3D motion computation. Fast and accurate computation of 3D motion is challenging in the case of unstabilized cameras moving in 3D scenes, which is always the case when stereo mosaics are used. Efficient blending of the mosaic strip is also essential. Most cases of stereo mosaicing satisfy the assumption of limited camera motion, with no forward motion and no change in internal parameters. Under these assumptions uniform sideways motion creates straight epipolar lines. When the 3D motion is computed correctly, images can be aligned in space-time volume to give straight epipolar lines, a method which is depth invariant. We propose to align the video sequence in a space-time volume based on efficient feature tracking, and in this paper we used Kernel Tracking. Computation is fast as the motion in computed only for a few regions of the image, yet giving accurate 3D motion. This computation is faster and more accurate than the previously used direct approach. We also present \"Barcode Blending\", a new approach for using pyramid blending in video mosaics, which is very efficient. Barcode Blending overcomes the complexity of building pyramids for multiple narrow strips, combining all strips in a single blending step. The entire stereo mosaicing process is highly efficient in computation and in memory, and can be performed on mobile devices.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114883496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents the AH+-tree, a balanced, tree-based index structure that efficiently supports Content-Based Image Retrieval (CBIR) through similarity queries. The proposed index structure addresses the problems of semantic gap and user subjectivity by considering the high-level semantics of multimedia data during the retrieval process. The AH+-tree provides the same functionality as the Affinity-Hybrid Tree (AH-Tree) but utilizes the high-level semantics in a novel way to eliminate the I/O overhead incurred by the AH-Tree due to the process of affinity propagation, which requires a complete traversal of the tree. The novel structure of the tree is explained, and detailed range and nearest neighbor algorithms are implemented and analyzed. Extensive discussions and experiments demonstrate the superior efficiency of the AH+-tree over the AH-Tree and the M-tree. Results show the AH+-tree significantly reduces I/O cost during similarity searches. The I/O efficiency of the AH+-tree and its ability to incorporate high-level semantics from different machine learning mechanisms make the AH+-tree a promising index access method for large multimedia databases.
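The AH+-tree structure itself is not reproduced here, but the idea it indexes, ranking candidates by a low-level feature distance modulated by a high-level semantic affinity, can be illustrated with a flat, non-indexed search. The weighting scheme and names below are illustrative assumptions, not the paper's algorithm.

```python
# Illustrative only: fuse low-level feature distance with high-level affinity.
import numpy as np

def affinity_aware_knn(query_vec, features, affinity, k=5, alpha=0.5):
    """Rank database items by a blend of normalized feature distance and
    semantic affinity to the query's concept.

    features : (N, D) array of low-level feature vectors.
    affinity : (N,) array in [0, 1], e.g. learned from relevance feedback.
    alpha    : weight of the low-level distance term vs. the semantic term.
    """
    dists = np.linalg.norm(features - query_vec, axis=1)
    dists = dists / (dists.max() + 1e-9)           # normalize distances to [0, 1]
    score = alpha * dists + (1.0 - alpha) * (1.0 - affinity)
    return np.argsort(score)[:k]                   # indices of the k best items
```

An index such as the AH+-tree exists precisely to avoid this linear scan while producing the same kind of affinity-aware ranking with far fewer I/O operations.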
{"title":"AH+-Tree: An Efficient Multimedia Indexing Structure for Similarity Queries","authors":"Fausto Fleites, Shu‐Ching Chen, Kasturi Chatterjee","doi":"10.1109/ISM.2011.20","DOIUrl":"https://doi.org/10.1109/ISM.2011.20","url":null,"abstract":"This paper presents the AH+-tree, a balanced, tree-based index structure that efficiently supports Content-Based Image Retrieval (CBIR) through similarity queries. The proposed index structure addresses the problems of semantic gap and user subjectivity by considering the high-level semantics of multimedia data during the retrieval process. The AH+-tree provides the same functionality as the Affinity-Hybrid Tree (AH-Tree) but utilizes the high-level semantics in a novel way to eliminate the I/O overhead incurred by the AH-Tree due to the process of affinity propagation, which requires a complete traversal of the tree. The novel structure of the tree is explained, and detailed range and nearest neighbor algorithms are implemented and analyzed. Extensive discussions and experiments demonstrate the superior efficiency of the AH+-tree over the AH-Tree and the M-tree. Results show the AH+-tree significantly reduces I/O cost during similarity searches. The I/O efficiency of the AH+-tree and its ability to incorporate high-level semantics from different machine learning mechanisms make the AH+-tree a promising index access method for large multimedia databases.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129573197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a novel adaptive pattern-driven approach for compressing large-area high-resolution terrain data. Using a pattern-driven model, the proposed approach achieves efficient terrain data reduction by modeling and encoding disparate visual patterns with a compact set of extracted features. The feasibility and efficiency of the proposed technique are corroborated by experiments on various terrain datasets and comparisons with state-of-the-art compression techniques. Since different visual patterns are separated and modeled explicitly during compression, the proposed technique also holds great potential for synergy between compression and compressed-domain analysis.
{"title":"Adaptive Pattern-driven Compression of Large-Area High-Resolution Terrain Data","authors":"Hai Wei, S. Zabuawala, Lei Zhang, Jiejie Zhu, J. Yadegar, J. D. Cruz, Hector J. Gonzalez","doi":"10.1109/ISM.2011.62","DOIUrl":"https://doi.org/10.1109/ISM.2011.62","url":null,"abstract":"This paper presents a novel adaptive pattern-driven approach for compressing large-area high-resolution terrain data. Utilizing a pattern-driven model, the proposed approach achieves efficient terrain data reduction by modeling and encoding disparate visual patterns using a compact set of extracted features. The feasibility and efficiency of the proposed technique were corroborated by experiments using various terrain datasets and comparisons with the state-of-the-art compression techniques. Since different visual patterns are separated and modeled explicitly during the compression process, the proposed technique also holds a great potential for providing a good synergy between compression and compressed-domain analysis.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"303 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123045106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This work aims to build a recognition system for a software engine that automatically generates a quiz from video content and reinserts it into the video, thus turning any available foreign-language video (such as news or TV series) into a remarkable learning tool. Our system includes a face tracking application that integrates the eigenface method with a temporal tracking approach. The main part of our work is to detect and identify faces in movies and to associate specific quizzes with each recognized character. The proposed approach labels the detected faces and maintains face tracking along the video stream. This task is challenging because characters show significant variation in their appearance. Therefore, we employ eigenfaces to reconstruct the original image from training models, and we developed a new technique based on frame buffering for continuous tracking under unfavorable environmental conditions. Many tests were conducted and showed that our system is able to identify multiple characters. The results demonstrate the performance and effectiveness of the proposed method.
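As a rough illustration of the eigenface component described above (the frame-buffering tracker is not shown), the sketch below builds an eigenface subspace with a plain SVD and identifies a probe face by nearest projection; the names and the rejection threshold are our own assumptions.

```python
# Minimal eigenface sketch: train a PCA subspace, then identify by nearest
# projection in that subspace. Illustrative, not the authors' implementation.
import numpy as np

def train_eigenfaces(faces, num_components=20):
    """faces: (N, H*W) matrix of flattened grayscale training faces."""
    mean = faces.mean(axis=0)
    centered = faces - mean
    # SVD of the centered data yields the principal directions (eigenfaces).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    eigenfaces = vt[:num_components]               # (K, H*W)
    coeffs = centered @ eigenfaces.T               # training projections (N, K)
    return mean, eigenfaces, coeffs

def identify(face, mean, eigenfaces, coeffs, labels, max_dist=None):
    """Project a probe face and return the label of the nearest training face,
    or None (unknown character) if the distance exceeds max_dist."""
    proj = (face - mean) @ eigenfaces.T
    dists = np.linalg.norm(coeffs - proj, axis=1)
    best = int(np.argmin(dists))
    if max_dist is not None and dists[best] > max_dist:
        return None
    return labels[best]
```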
{"title":"Characters Identification in TV Series","authors":"Madjid Maidi, Veronica Scurtu, M. Preda","doi":"10.1109/ISM.2011.31","DOIUrl":"https://doi.org/10.1109/ISM.2011.31","url":null,"abstract":"This work aims to realize a recognition system for a software engine that will automatically generate a quiz starting from a video content and reinsert it into the video, turning thus any available foreign-language video (such as news or TV series) into a remarkable learning tool. Our system includes a face tracking application which integrates the eigen face method with a temporal tracking approach. The main part of our work is to detect and identify faces from movies and to associate specific quizzes for each recognized character. The proposed approach allows to label the detected faces and maintains face tracking along the video stream. This task is challenging since characters present significant variation in their appearance. Therefore, we employed eigen faces to reconstruct the original image from training models and we developed a new technique based on frames buffering for continuous tracking in unfavorable environment conditions. Many tests were conducted and proved that our system is able to identify multiple characters. The obtained results showed the performance and the effectiveness of the proposed method.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"619 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120869721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a novel method for shot boundary detection that optimizes traditional scoring-based metrics using a genetic algorithm search heuristic. The advantage of this approach is that it detects shots without requiring the direct use of thresholds. The methodology is described using the edge-change ratio metric and applied to several test video segments from the TREC 2002 video track and contemporary television shows. The shot boundary detection results are evaluated using recall, precision, and F1 metrics, which demonstrate that the proposed approach provides superior overall performance compared to the standard edge-change ratio method. In addition, the convergence of the genetic algorithm is examined to show that the proposed method is both efficient and stable.
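For reference, a minimal sketch of the edge-change ratio metric that the genetic algorithm optimizes is given below (the GA itself is not shown); the Canny thresholds and dilation radius are illustrative assumptions, not values from the paper.

```python
# Classic edge-change ratio (ECR) between two grayscale frames; higher values
# suggest a shot boundary. Parameters are illustrative, not tuned.
import cv2
import numpy as np

def edge_change_ratio(prev_gray, curr_gray, canny_lo=100, canny_hi=200, dilate_r=5):
    kernel = np.ones((dilate_r, dilate_r), np.uint8)
    e_prev = cv2.Canny(prev_gray, canny_lo, canny_hi)
    e_curr = cv2.Canny(curr_gray, canny_lo, canny_hi)
    n_prev, n_curr = np.count_nonzero(e_prev), np.count_nonzero(e_curr)
    if n_prev == 0 or n_curr == 0:
        return 0.0
    d_prev = cv2.dilate(e_prev, kernel)            # tolerance band around old edges
    d_curr = cv2.dilate(e_curr, kernel)            # tolerance band around new edges
    entering = np.count_nonzero(e_curr & ~d_prev)  # new edges far from any old edge
    exiting = np.count_nonzero(e_prev & ~d_curr)   # old edges with no nearby new edge
    return max(entering / n_curr, exiting / n_prev)
```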
{"title":"Shot Boundary Detection Using Genetic Algorithm Optimization","authors":"Calvin Chan, A. Wong","doi":"10.1109/ISM.2011.58","DOIUrl":"https://doi.org/10.1109/ISM.2011.58","url":null,"abstract":"This paper presents a novel method for shot boundary detection via an optimization of traditional scoring based metrics using a genetic algorithm search heuristic. The advantage of this approach is that it allows for the detection of shots without requiring the direct use of thresholds. The methodology is described using the edge-change ratio metric and applied to several test video segments from the TREC 2002 video track and contemporary television shows. The shot boundary detection results are evaluated using recall, precision and F1 metrics, which demonstrate that the proposed approach provides superior overall performance when compared to the effective edge-change ratio method. In addition, the convergence of the genetic algorithm is examined to show that the proposed method is both efficient and stable.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124387259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Application-layer overlay networks enjoy considerable popularity due to their flexibility and readily deployable nature, providing support for a plethora of peer-to-peer (P2P) applications. Currently, real-world deployments of Internet-scale P2P media streaming systems use a tracker server for content discovery in an on-demand model with asynchronous interactivity. The inherent drawbacks of the tracker-server approach are its scalability and bottleneck issues, which prompted us to pursue structured P2P substrates such as Distributed Hash Tables (DHTs), which have already proved to be stable. The challenge of accommodating a large number of update operations as users' playing positions continuously change in a DHT-based overlay was addressed in our previous work by the concept of Temporal-DHT, which exploits the temporal dynamics of the content to estimate playing positions. In this paper, we incorporate the notion of popularity awareness into the Temporal-DHT framework, adapting the query resolution mechanism to the skewness of content popularity typically found in real multimedia access patterns. The essential objective of the popularity awareness mechanism is to increase the overall performance of Temporal-DHT by optimizing the search cost across the entire content set in the system. We formulate the problem and provide practical solutions, with extensive simulation results that demonstrate the effectiveness of popularity-aware Temporal-DHT in achieving optimized query resolution cost and high streaming quality for on-demand systems in a dynamic network environment where users are free to join and leave the system asynchronously.
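The core Temporal-DHT idea, extrapolating a peer's playing position from its last published record instead of updating the DHT on every position change, together with a popularity-weighted refresh policy, can be sketched as follows. This is a simplified illustration under our own assumptions, not the authors' protocol.

```python
# Illustrative sketch: extrapolated playing position plus a popularity-aware
# refresh interval. All names and formulas are our assumptions.
import time

class PeerEntry:
    """A DHT record for a streaming peer: the last reported playing position
    and the time it was published."""
    def __init__(self, peer_id, reported_segment, report_time, playback_rate=1.0):
        self.peer_id = peer_id
        self.reported_segment = reported_segment
        self.report_time = report_time
        self.playback_rate = playback_rate         # segments per second

    def estimated_segment(self, now=None):
        """Extrapolate the current position instead of requiring the peer to
        publish an update for every segment it plays."""
        now = time.time() if now is None else now
        return self.reported_segment + self.playback_rate * (now - self.report_time)

def refresh_interval(popularity, base=30.0, min_interval=5.0):
    """Popularity-aware refresh: records for popular content are re-published
    more often, so estimates stay accurate where most queries land."""
    return max(min_interval, base / (1.0 + popularity))
```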
{"title":"Popularity Awareness in Temporal-DHT for P2P-based Media Streaming Applications","authors":"Abhishek Bhattacharya, Zhenyu Yang, Deng Pan","doi":"10.1109/ISM.2011.46","DOIUrl":"https://doi.org/10.1109/ISM.2011.46","url":null,"abstract":"Application-layer overlay networks are receiving considerable popularity due to its flexibility and readily deployable nature thereby providing support for a plethora of Peer-to-Peer (P2P) applications. Currently, the real-world deployment of Internet-scale P2P media streaming systems involve the usage of tracker server for content discovery in on-demand model with asynchronous interactivity. The inherent drawbacks of tracker-server based approach are obvious due to scalability and bottleneck issues, which prompted us to pursue a structured P2P based proposition such as Distributed Hash Tables (DHT) which are already proved to be stable substrates. The challenging issue of accommodating a large number of update operations with the continuous change of user's playing position in DHT-based overlay is addressed in our previous work by the concept of Temporal-DHT which exploits the temporal dynamics of the content to estimate playing position. In this paper, we incorporate the notion of popularity awareness in the Temporal-DHT framework which will help to adapt the query resolution mechanism by addressing the skew ness of content popularity typically found in real multimedia user access patterns. The essential objective of popularity awareness mechanism is to increase the overall performance of Temporal-DHT by optimizing the search cost among the entire content set within the system. We formulate the problem and provide practical solutions with extensive simulation results that demonstrates the effectiveness of popularity-aware Temporal-DHT by achieving optimized query resolution cost and high streaming quality for on-demand systems in a dynamic network environment where user's are free to asynchronously join/leave the system.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124242260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the area of cultural heritage there is a strong push to aggregate content metadata from institutions (such as museums, universities, archives, libraries, and foundations) to make them widely accessible. This reduces fragmentation, allows aggregation, and brings valuable collections together in a single place. For example, Europeana (the so-called European digital library) collects only metadata, while content files are referenced via URLs. These URLs point to the original content owner and/or to the content aggregator that facilitated the collection. That model leaves room for the content aggregator to provide additional services on its enriched models. The proposed content aggregation model attempts to satisfy specific requirements with a semantic model and tools that support executable aggregations such as playlists, collections, e-learning courses, and media annotations/synchronizations. The produced aggregations can also be exposed by mapping semantic concepts to Europeana. The paper also analyzes the semantic models mentioned and their difficulties, including some comments on the adoption of linked open data and media models. The results have been produced in the ECLAP project (ICT PSP), funded by the European Commission, http://www.eclap.eu.
{"title":"Models and Tools for Aggregating and Annotating Content on ECLAP","authors":"P. Bellini, P. Nesi, M. Paolucci, Marco Serena","doi":"10.1109/ISM.2011.41","DOIUrl":"https://doi.org/10.1109/ISM.2011.41","url":null,"abstract":"In the area of cultural heritage there is a strong push on aggregating content metadata from institutions (such as museums, university, archives, library, foundations, etc.) to make them widely accessible. This action is going to reduce fragmentation, allows aggregation and integrates valuable collections in a unique place. For example, European a (the so called European digital library) collects only metadata, while content files are referred via some URL. These URLs refer to the original content owner and/or to the Content Aggregator, facilitating the collection. That model leaves space to the Content Aggregator to provide additional services on their enriched models. The proposed Content Aggregation model attempts to satisfy specific requirements with a semantic model and tools providing support for executable aggregations such as: play lists, collections, e-learning courses, and media annotations/synchronizations. The produced aggregations may also be provided by mapping semantic concepts to European a. The paper also performs an analysis of semantics models mentioned and of their difficulties including some comments about the adoption of linked open data and media model. The results have been produced in the project ECLAP ICT PSP founded by the European Commission, http://www.eclap.eu.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131677825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a system that provides cooperative capture assistance by recognizing camera manipulation. Based on experimental results with inexperienced users, we previously proposed an incremental interaction model in which the system and the user shoot cooperatively. The system based on this model compensates for the user's lack of cinematographic knowledge or skill by relating affective information, such as atmosphere or mood, to capture techniques. When the user captures a shot after selecting a specific atmosphere, the system analyzes the current image and the camera operation, including the camera angle and zooming speed, and then gives guidance for better capture according to the analysis. The proposed system thus achieves an incremental interaction between the user and the system, evolving beyond the user's unidirectional manipulation of the camera. The system helps the user reflect their intention for the scene appropriately, so the user can capture scenes more appropriately and effectively without specific cinematographic knowledge or skills. As a result, the user can acquire basic shooting skills smoothly and shoot more effectively.
{"title":"Shooting Assistance by Recognizing User's Camera Manipulation for Intelligible Video Production","authors":"H. Mitarai, A. Yoshitaka","doi":"10.1109/ISM.2011.33","DOIUrl":"https://doi.org/10.1109/ISM.2011.33","url":null,"abstract":"We propose a system which achieves cooperative capture assistance by camera manipulation recognition. Based on an experimental result on inexperienced users, the incremental interaction model, which the system and a user cooperatively shoot, was formerly proposed. The system based on the model compensates for user's lack of cinematographic knowledge or skills by relating affective information such as atmosphere or mood to capture techniques. When a user captures a shot after selecting a specific atmosphere, the system analyzes the current shooting image and the camera operation including the camera angle and the zooming speed. Then it gives guidance for better capture according to the analysis. The proposed system based on the model achieves an incremental interaction between the user and the system, evolving from user's unidirectional manipulation of the system. The system assists the user in reflecting user intention of the scene appropriately, therefore it enables the user to capture scenes more appropriately and effectively without specific cinematographic knowledge or skills. As a result, the user can acquire basic shooting skills smoothly and shoot more effectively.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124339689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
It is a great challenge to detect an object that is overlapped or occluded by other objects in images. For moving objects in a video sequence, their movements bring extra spatio-temporal information across successive frames, which helps object detection, especially for occluded objects. This paper proposes a moving object detection approach for occluded objects in a video sequence with the assistance of the SPCPE (Simultaneous Partition and Class Parameter Estimation) unsupervised video segmentation method. Based on the preliminary foreground estimation from SPCPE and the object detection information from the previous frame, an N-step search (NSS) method is used to locate the moving objects, followed by a size-adjustment method that adjusts the bounding boxes of the objects. Several experimental results show that the proposed approach achieves good detection performance under object occlusion in successive frames of a video sequence.
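The paper names an N-step search for locating the moving objects; the classic coarse-to-fine block-matching version of that search can be sketched as below. The SAD cost and all names are illustrative assumptions, and the size-adjustment step is not shown.

```python
# Classic N-step (e.g. three-step) block-matching search for the displacement
# of a bounding box between two grayscale frames. Illustrative sketch only.
import numpy as np

def n_step_search(prev_frame, curr_frame, box, steps=3, init_step=8):
    """box = (x, y, w, h) in prev_frame; returns the best (dx, dy)."""
    x, y, w, h = box
    template = prev_frame[y:y + h, x:x + w].astype(np.int32)
    best_dx, best_dy = 0, 0
    step = init_step
    for _ in range(steps):
        # Examine the 3x3 grid of candidates around the current best center.
        candidates = [(best_dx + i * step, best_dy + j * step)
                      for i in (-1, 0, 1) for j in (-1, 0, 1)]
        best_cost = None
        for dx, dy in candidates:
            nx, ny = x + dx, y + dy
            if nx < 0 or ny < 0 or ny + h > curr_frame.shape[0] or nx + w > curr_frame.shape[1]:
                continue                           # candidate falls outside the frame
            patch = curr_frame[ny:ny + h, nx:nx + w].astype(np.int32)
            cost = np.abs(patch - template).sum()  # sum of absolute differences
            if best_cost is None or cost < best_cost:
                best_cost, best_dx, best_dy = cost, dx, dy
        step = max(1, step // 2)                   # refine around the new center
    return best_dx, best_dy
```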
{"title":"Moving Object Detection under Object Occlusion Situations in Video Sequences","authors":"Dianting Liu, M. Shyu, Qiusha Zhu, Shu‐Ching Chen","doi":"10.1109/ISM.2011.50","DOIUrl":"https://doi.org/10.1109/ISM.2011.50","url":null,"abstract":"It is a great challenge to detect an object that is overlapped or occluded by other objects in images. For moving objects in a video sequence, their movements can bring extra spatio-temporal information of successive frames, which helps object detection, especially for occluded objects. This paper proposes a moving object detection approach for occluded objects in a video sequence with the assist of the SPCPE (Simultaneous Partition and Class Parameter Estimation) unsupervised video segmentation method. Based on the preliminary foreground estimation result from SPCPE and object detection information from the previous frame, an n-steps search (NSS) method is utilized to identify the location of the moving objects, followed by a size-adjustment method that adjusts the bounding boxes of the objects. Several experimental results show that our proposed approach achieves good detection performance under object occlusion situations in serial frames of a video sequence.","PeriodicalId":339410,"journal":{"name":"2011 IEEE International Symposium on Multimedia","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115566752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}