A system for automatic and interactive detection of static objects
Pub Date: 2011-02-14 · DOI: 10.1109/POV.2011.5712365
Rubén Heras Evangelio, Michael Pätzold, T. Sikora
Designing static object detection systems that can incorporate user interaction is of great benefit in many surveillance applications, since some correctly detected static objects may be of no interest to a human operator. Interactive systems allow the user to feed these decisions back into the system, making automated surveillance systems more attractive and comfortable to use. In this paper we present a system for the detection of static objects that, based on a dual background model, classifies pixels by means of a finite-state machine. The state machine provides the semantics for interpreting the results of background subtraction and can optionally be used to integrate user input. The system can thus be used both in an automatic and an interactive manner without requiring any expert knowledge from the user. We successfully validated the system on several public datasets.
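The abstract does not spell out the state machine or the dual background model, so the following sketch is only a plausible reading: two background subtractors with different adaptation rates produce foreground masks, and a per-pixel finite-state machine promotes pixels that stay foreground in the slow model but have been absorbed by the fast model to a static-object state. The state names, transition rules, and frame threshold are assumptions made for illustration, not the authors' design.

```python
import numpy as np

# Hypothetical pixel states; the paper's exact state set is not given in the abstract.
BACKGROUND, CANDIDATE, STATIC, MOVING = range(4)

def update_states(states, fg_short, fg_long, counters, static_after=150):
    """Advance a per-pixel finite-state machine from two foreground masks.

    fg_short: boolean foreground mask of a quickly adapting background model
    fg_long:  boolean foreground mask of a slowly adapting background model
    A pixel that the short-term model has absorbed into the background but the
    long-term model still flags as foreground for `static_after` consecutive
    frames is marked STATIC (a candidate abandoned/static object).
    """
    moving = fg_short & fg_long             # still foreground in both models
    candidate = (~fg_short) & fg_long       # absorbed by the fast model only
    background = (~fg_short) & (~fg_long)   # both models agree: scene background

    counters[candidate] += 1                # count consecutive candidate frames
    counters[~candidate] = 0                # reset whenever the evidence breaks

    states[moving] = MOVING
    states[background] = BACKGROUND
    states[candidate & (counters < static_after)] = CANDIDATE
    states[candidate & (counters >= static_after)] = STATIC
    return states, counters
```

In practice the two masks could come from, for example, two mixture-of-Gaussians background subtractors updated with different learning rates; user input could then be integrated by forcing selected pixels back to BACKGROUND.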
{"title":"A system for automatic and interactive detection of static objects","authors":"Rubén Heras Evangelio, Michael Pátzold, T. Sikora","doi":"10.1109/POV.2011.5712365","DOIUrl":"https://doi.org/10.1109/POV.2011.5712365","url":null,"abstract":"Designing static object detection systems that are able to incorporate user interaction conveys a great benefit in many surveillance applications, since some correctly detected static objects can be considered to have no interest by a human operator. Interactive systems allow the user to include these decisions into the system, making automated surveillance systems more attractive and comfortable to use. In this paper we present a system for the detection of static objects that, based on the detection of a dual background model, classifies pixels by means of a finite-state machine. The state machine provides the meaning for the interpretation of the results obtained from background subtraction and it can be optionally used to integrate user input. The system can thus be used both in an automatic and an interactive manner without requiring any expert knowledge from the user. We successfully validated the system with several public datasets.","PeriodicalId":197184,"journal":{"name":"2011 IEEE Workshop on Person-Oriented Vision","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126040344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A benchmark for interactive image segmentation algorithms
Pub Date: 2011-02-14 · DOI: 10.1109/POV.2011.5712366
Yibiao Zhao, Xiaohan Nie, Y. Duan, Yaping Huang, Siwei Luo
This paper proposes a general benchmark for interactive segmentation algorithms. The main contributions can be summarized as follows: (I) A new dataset of fifty images is released. These images are categorized into five groups: animal, artifact, human, building and plant. They cover several major challenges for the interactive image segmentation task, including fuzzy boundaries, complex texture, cluttered background, shading effects, sharp corners, and overlapping colors. (II) We propose two types of schemes, a point-process and a boundary-process, to generate user scribbles automatically. The point-process simulates the interaction process in which users incrementally draw scribbles on the major components of the image. The boundary-process simulates the refinement process in which users place additional scribbles near segment boundaries to refine the details of the resulting segments. (III) We then apply two precision measures to quantitatively evaluate the resulting segments of the different algorithms. Region precision measures how many pixels are correctly classified, and boundary precision measures how close the segment boundary is to the true boundary. This benchmark offers a tentative way to guarantee fair evaluation of person-oriented tasks. Based on the benchmark, five state-of-the-art interactive segmentation algorithms are evaluated. All images, synthesized user scribbles, and running results are publicly available on the project webpage.
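The abstract names the two measures but not their exact formulas; the sketch below shows one plausible instantiation, where region precision is the fraction of correctly labeled pixels and boundary precision is the fraction of predicted boundary pixels lying within a small tolerance of the ground-truth boundary. The tolerance and the distance-based definition are assumptions, not the paper's stated formulas.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def region_precision(pred, gt):
    """Fraction of pixels whose predicted label matches the ground truth."""
    return float(np.mean(pred == gt))

def boundary_precision(pred, gt, tol=2):
    """Fraction of predicted boundary pixels within `tol` pixels of the
    ground-truth boundary (one plausible reading of 'boundary precision')."""
    def boundary(mask):
        # boundary pixels = mask pixels removed by a one-pixel erosion
        return mask & ~binary_erosion(mask)

    pred_b = boundary(pred.astype(bool))
    gt_b = boundary(gt.astype(bool))
    # distance from every pixel to the nearest ground-truth boundary pixel
    dist_to_gt = distance_transform_edt(~gt_b)
    return float(np.mean(dist_to_gt[pred_b] <= tol)) if pred_b.any() else 0.0
```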
{"title":"A benchmark for interactive image segmentation algorithms","authors":"Yibiao Zhao, Xiaohan Nie, Y. Duan, Yaping Huang, Siwei Luo","doi":"10.1109/POV.2011.5712366","DOIUrl":"https://doi.org/10.1109/POV.2011.5712366","url":null,"abstract":"This paper proposes a general benchmark for interactive segmentation algorithms. The main contribution can be summarized as follows: (I) A new dataset of fifty images is released. These images are categorized into five groups: animal, artifact, human, building and plant. They cover several major challenges for the interactive image segmentation task, including fuzzy boundary, complex texture, cluttered background, shading effect, sharp corner, and overlapping color. (II) We propose two types of schemes, point-process and boundary-process, to generate user scribbles automatically. The point-process simulates the human interaction process that users incrementally draw scribbles to some major components of the image. The boundary-process simulates the refining process that users place more scribbles near the segment boundaries to refine the details of result segments. (III) We then apply two precision measures to quantitatively evaluate the result segments of different algorithm. The region precision measures how many pixels are correctly classified, and the boundary precision measures how close is the segment boundary to the real boundary. This benchmark offered a tentative way to guarantee evaluation fairness of person-oriented tasks. Based on the benchmark, five state-of-the-art interactive segmentation algorithms are evaluated. All the images, synthesized user scribbles, running results are publicly available on the webpage1.","PeriodicalId":197184,"journal":{"name":"2011 IEEE Workshop on Person-Oriented Vision","volume":"44 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131312219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Finding lost children
Pub Date: 2011-02-14 · DOI: 10.1109/POV.2011.5712362
Ashley M. Eden, C. M. Christoudias, Trevor Darrell
During a disaster, children may be quickly wrenched from their families. Research shows that children in such circumstances are often unable or unwilling to give their names or other identifying information. Currently in the US, there is no system in the public health infrastructure that effectively expedites reunification when children cannot be identified. Working with Children's Hospital Boston, we have engineered a system to speed the reunification of children with their families should they become separated in a disaster. Our system is based on content-based image retrieval and attribute search. In this paper we describe the system and a series of evaluations, including a realistic disaster drill set up and run jointly with the Children's Hospital.
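The abstract only states that retrieval combines content-based image retrieval with attribute search; as a hedged illustration, the sketch below first filters a photo database by caregiver-supplied attributes and then ranks the remainder by visual-feature distance to a query photo. The attribute names, feature representation, and ranking rule are invented for the example and are not the authors' design.

```python
import numpy as np

def find_candidates(query_feature, query_attrs, database, top_k=20):
    """Rank database entries by visual similarity after attribute filtering.

    database: list of dicts with 'feature' (np.ndarray) and 'attrs' (dict),
    e.g. {'hair': 'brown', 'approx_age': '4-6'}.  All names are illustrative.
    """
    # keep only entries whose recorded attributes match the supplied query
    matches = [d for d in database
               if all(d['attrs'].get(k) == v for k, v in query_attrs.items())]
    # rank the remaining entries by Euclidean distance in feature space
    matches.sort(key=lambda d: np.linalg.norm(d['feature'] - query_feature))
    return matches[:top_k]
```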
{"title":"Finding lost children","authors":"Ashley M. Eden, C. M. Christoudias, Trevor Darrell","doi":"10.1109/POV.2011.5712362","DOIUrl":"https://doi.org/10.1109/POV.2011.5712362","url":null,"abstract":"During a disaster, children may be quickly wrenched from their families. Research shows that children in such circumstances are often unable or unwilling to give their names or other identifying information. Currently in the US, there is no existing system in the public health infrastructure that effectively expedites reunification when children can't be identified. Working with the Children's Hospital Boston, we have engineered a system to speed reunification of children with their families, should they get separated in a disaster. Our system is based on a Content Based Image Retrieval and attribute search. In this paper we will describe the system and a series of evaluations, including a realistic disaster drill set up and run jointly with the Children's Hospital.","PeriodicalId":197184,"journal":{"name":"2011 IEEE Workshop on Person-Oriented Vision","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130319709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Augmented reality for immersive remote collaboration
Pub Date: 2011-02-14 · DOI: 10.1109/POV.2011.5712368
Dan Gelb, A. Subramanian, K. Tan
Video conferencing systems are designed to deliver a collaboration experience that is as close as possible to actually meeting in person. Current systems, however, do a poor job of integrating the video streams with the shared collaboration content presented to the users. Real and virtual content are unnaturally separated, leading to problems with nonverbal communication and the overall conference experience. Methods of interacting with shared content are typically limited to pointing with a mouse, which is not a natural component of face-to-face human conversation. This paper presents a natural and intuitive method for sharing digital content within a meeting using augmented reality and computer vision. Real and virtual content are seamlessly integrated into the collaboration space. We develop new vision-based methods for interacting with the inserted digital content, including target finding and gesture-based control. These improvements let us deliver an immersive collaboration experience using natural gesture- and object-based interaction.
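The abstract does not detail the target-finding method; as one generic illustration of how inserted content could be anchored to a physical target, the sketch below matches ORB features of a known planar target against the camera frame, estimates a homography, and warps the shared content onto the target's location. This is a stand-in under those assumptions, not the authors' implementation.

```python
import cv2
import numpy as np

def insert_virtual_content(frame, target_img, content_img, min_matches=15):
    """Warp shared digital content onto a planar target found in the frame.

    frame, target_img, content_img are 8-bit BGR images.  Returns the frame
    with the content composited over the detected target, or the original
    frame if the target is not found.
    """
    gray_t = cv2.cvtColor(target_img, cv2.COLOR_BGR2GRAY)
    gray_f = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create()
    k1, d1 = orb.detectAndCompute(gray_t, None)
    k2, d2 = orb.detectAndCompute(gray_f, None)
    if d1 is None or d2 is None:
        return frame
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    if len(matches) < min_matches:
        return frame
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)   # target -> frame
    if H is None:
        return frame
    th, tw = target_img.shape[:2]
    ch, cw = content_img.shape[:2]
    # scale content coordinates to target coordinates, then map into the frame
    S = np.array([[tw / cw, 0, 0], [0, th / ch, 0], [0, 0, 1]], dtype=np.float64)
    warped = cv2.warpPerspective(content_img, H @ S,
                                 (frame.shape[1], frame.shape[0]))
    mask = warped.sum(axis=2) > 0
    out = frame.copy()
    out[mask] = warped[mask]
    return out
```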
{"title":"Augmented reality for immersive remote collaboration","authors":"Dan Gelb, A. Subramanian, K. Tan","doi":"10.1109/POV.2011.5712368","DOIUrl":"https://doi.org/10.1109/POV.2011.5712368","url":null,"abstract":"Video conferencing systems are designed to deliver a collaboration experience that is as close as possible to actually meeting in person. Current systems, however, do a poor job of integrating video streams presenting the users with shared collaboration content. Real and virtual content are unnaturally separated, leading to problems with nonverbal communication and the overall conference experience. Methods of interacting with shared content are typically limited to pointing with a mouse, which is not a natural component of face-to-face human conversation. This paper presents a natural and intuitive method for sharing digital content within a meeting using augmented reality and computer vision. Real and virtual content is seamlessly integrated into the collaboration space. We develop new vision based methods for interacting with inserted digital content including target finding and gesture based control. These improvements let us deliver an immersive collaboration experience using natural gesture and object based interaction.","PeriodicalId":197184,"journal":{"name":"2011 IEEE Workshop on Person-Oriented Vision","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133851509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Active inference for retrieval in camera networks
Pub Date: 2011-02-14 · DOI: 10.1109/POV.2011.5712363
Daozheng Chen, M. Bilgic, L. Getoor, D. Jacobs, Lilyana Mihalkova, Tom Yeh
We address the problem of searching camera network videos to retrieve frames containing specified individuals. We show the benefit of utilizing a learned probabilistic model that captures dependencies among the cameras. In addition, we develop an active inference framework that can request human input at inference time, directing human attention to the portions of the videos whose correct annotation would provide the biggest performance improvements. Our primary contribution is to show that by mapping video frames in a camera network onto a graphical model, we can apply collective classification and active inference algorithms to significantly increase the performance of the retrieval system, while minimizing the number of human annotations required.
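The abstract does not give the utility criterion used to decide which frames the operator should annotate; a common proxy, sketched below, is to request labels for the frames whose inferred marginals are most uncertain and then re-run collective inference over the camera-dependency graph after each answer. The entropy criterion and the interface are assumptions, not the paper's stated method.

```python
import numpy as np

def select_frames_to_annotate(marginals, budget):
    """Pick frames for human annotation given inferred marginals.

    marginals: dict mapping frame_id -> P(target person present in frame),
    as produced by inference in the graphical model.  Frames with the most
    uncertain marginals (highest binary entropy) are queried first.
    """
    def entropy(p):
        p = np.clip(p, 1e-9, 1 - 1e-9)
        return -(p * np.log(p) + (1 - p) * np.log(1 - p))

    ranked = sorted(marginals, key=lambda f: entropy(marginals[f]), reverse=True)
    return ranked[:budget]
```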
{"title":"Active inference for retrieval in camera networks","authors":"Daozheng Chen, M. Bilgic, L. Getoor, D. Jacobs, Lilyana Mihalkova, Tom Yeh","doi":"10.1109/POV.2011.5712363","DOIUrl":"https://doi.org/10.1109/POV.2011.5712363","url":null,"abstract":"We address the problem of searching camera network videos to retrieve frames containing specified individuals. We show the benefit of utilizing a learned probabilistic model that captures dependencies among the cameras. In addition, we develop an active inference framework that can request human input at inference time, directing human attention to the portions of the videos whose correct annotation would provide the biggest performance improvements. Our primary contribution is to show that by mapping video frames in a camera network onto a graphical model, we can apply collective classification and active inference algorithms to significantly increase the performance of the retrieval system, while minimizing the number of human annotations required.","PeriodicalId":197184,"journal":{"name":"2011 IEEE Workshop on Person-Oriented Vision","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127915955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
User oriented language model for face detection
Pub Date: 2011-02-14 · DOI: 10.1109/POV.2011.5712364
Daesik Jang, G. Miller, S. Fels, S. Oldridge
This paper provides a novel approach to a user-oriented language model for face detection. Even though there are many open-source and commercial libraries that solve the problem of face detection, they are still hard to use because they require specific knowledge of the details of the underlying algorithmic techniques. This paper proposes a high-level language model for face detection with which users can develop systems easily, even without specific knowledge of face detection theory and algorithms. Important conditions are first identified to categorize the large problem space of face detection. These conditions are then represented as expressions in a language model, so that developers can use them to express a variety of problems. Once the conditions are expressed by the user, the associated interpreter finds and organizes the algorithms best suited to solving the represented problem under the corresponding conditions. We present a proof-of-concept implementation and test and analyze several example problems to demonstrate its ease of use and usability.
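The abstract leaves the concrete condition vocabulary and interpreter unspecified; the sketch below is a hypothetical rendering of the idea, in which a user states high-level conditions and an interpreter maps them to a detector configuration. All names, fields, and selection rules are invented for illustration and do not reflect the paper's actual language model.

```python
from dataclasses import dataclass

@dataclass
class Conditions:
    """High-level, user-facing conditions (hypothetical vocabulary)."""
    pose: str = "frontal"        # e.g. "frontal", "profile", "any"
    lighting: str = "indoor"     # e.g. "indoor", "outdoor", "low"
    min_face_size: int = 24      # smallest face to detect, in pixels
    real_time: bool = True       # must the detector run at frame rate?

def interpret(cond: Conditions) -> str:
    """Map high-level conditions to a concrete detector choice (illustrative)."""
    if cond.pose == "frontal" and cond.real_time:
        return "haar_cascade_frontal"    # fast, frontal faces only
    if cond.pose == "profile":
        return "haar_cascade_profile"
    return "hog_svm_multiview"           # slower, broader pose coverage

# Usage: the user only states conditions; the interpreter selects the algorithm.
detector = interpret(Conditions(pose="any", real_time=False))
```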
{"title":"User oriented language model for face detection","authors":"Daesik Jang, G. Miller, S. Fels, S. Oldridge","doi":"10.1109/POV.2011.5712364","DOIUrl":"https://doi.org/10.1109/POV.2011.5712364","url":null,"abstract":"This paper provides a novel approach for a user oriented language model for face detection. Even though there are many open source or commercial libraries to solve the problem of face detection, they are still hard to use because they require specific knowledge on details of algorithmic techniques. This paper proposes a high-level language model for face detection with which users can develop systems easily and even without specific knowledge on face detection theories and algorithms. Important conditions are firstly considered to categorize the large problem space of face detection. The conditions identified here are then represented as expressions in terms of a language model so that developers can use them to express various problems. Once the conditions are expressed by users, the proposed associated interpreter interprets the conditions to find and organize the best algorithms to solve the represented problem with corresponding conditions. We show a proof-of-concept implementation and some test and analyze example problems to show the ease of use and usability.","PeriodicalId":197184,"journal":{"name":"2011 IEEE Workshop on Person-Oriented Vision","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123348145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}