A platform of exploratory networked robotic cameras was created, taking an aesthetic approach to experimentation. Initiated by research into autonomous swarm robotic camera behavior, SwarmVision is an installation consisting of multiple pan-tilt-zoom cameras on rails positioned above spectators in an exhibition space, where each camera behaves autonomously according to its own rules of computer vision and control. Each camera is programmed to detect visual information of interest using a different algorithm, and each negotiates with the other two, collectively influencing what subject matter to study. The emergent behaviors of the system illustrate an ongoing process of scene
G. Legrady, D. Bazo, and Marco Pinter. "SwarmVision: autonomous aesthetic multi-camera interaction." In Proceedings of the 21st ACM International Conference on Multimedia, 2013. DOI: 10.1145/2502081.2502192.
Due to the semantic gap, low-level features cannot represent image semantics well. Moreover, traditional semantics-related image representations may not cope with large inter-class variations and are not robust to noise. To address these problems, in this paper we propose a novel image representation method in a sub-semantic space. First, exemplar classifiers are trained by separating each training image from the others; they serve as weak semantic similarity measurements. A graph is then constructed by combining the visual similarity and the weak semantic similarity of the training images. We partition this graph into visually and semantically similar sub-sets. Each sub-set of images is then used to train a classifier that separates this sub-set from the others. The learned sub-set classifiers are used to construct a sub-semantic-space representation of images, which is not only more semantically meaningful but also more reliable and resistant to noise. Finally, we categorize images using this representation on several public datasets to demonstrate the effectiveness of the proposed method.
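The final step — representing an image by its scores under each sub-set classifier — can be sketched in a few lines. This is a toy illustration, not the authors' pipeline: the known generating groups stand in for the graph-partitioning step, and simple nearest-centroid scorers stand in for the trained sub-set classifiers.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy "visual features" drawn from three latent groups, standing in for
# visually-and-semantically similar sub-sets of training images.
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
train = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Assume the partition step already produced sub-sets; here we simply use
# the generating groups and summarize each by its mean.
sub_sets = [train[i * 20:(i + 1) * 20] for i in range(3)]
centroids = np.array([s.mean(axis=0) for s in sub_sets])

def sub_semantic_repr(x, centroids):
    """One score per sub-set (a nearest-centroid scorer standing in for the
    learned sub-set classifiers), normalized to sum to 1."""
    d = np.linalg.norm(centroids - x, axis=1)
    s = np.exp(-d)                    # higher = more affinity to that sub-set
    return s / s.sum()

img = np.array([4.8, 0.3])            # a new image near the second group
print(np.round(sub_semantic_repr(img, centroids), 3))
```

The printed vector is the image's sub-semantic-space representation: it concentrates mass on the sub-set the image visually belongs to.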
Chunjie Zhang, Shuhui Wang, Chao Liang, J. Liu, Qingming Huang, Haojie Li, and Q. Tian. "Beyond bag of words: image representation in sub-semantic space." In Proceedings of the 21st ACM International Conference on Multimedia, 2013. DOI: 10.1145/2502081.2502132.
Establishing correct correspondences between two images has a wide range of applications, such as 2D and 3D registration, structure from motion, and image retrieval. In this paper, we propose a new matching method based on spatial constraints. The proposed method has linear time complexity and is efficient when applied to image retrieval. The main assumption behind our method is that the local geometric structure among a feature point and its neighbors is not easily affected by geometric or photometric transformations, and thus should be preserved in the corresponding images. We model this local geometric structure by linear coefficients that reconstruct the point from its neighbors. The method is flexible: it can not only estimate the number of correct matches between two images efficiently, but also determine the correctness of each match accurately. Furthermore, it is simple and easy to implement. When applied to re-ranking images in an image search engine, it outperforms state-of-the-art techniques.
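The reconstruction coefficients described here resemble locally linear embedding (LLE) style weights. A minimal sketch (not the authors' implementation) shows the key property the abstract relies on: with a sum-to-one constraint, the weights are unchanged when the whole point set undergoes a similarity transform (rotation, uniform scale, translation).

```python
import numpy as np

def reconstruction_weights(point, neighbors):
    """Least-squares coefficients w (summing to 1) that reconstruct `point`
    from the rows of `neighbors` -- standard LLE-style weights."""
    Z = neighbors - point                       # k x d, neighbors relative to the point
    G = Z @ Z.T                                 # k x k Gram matrix
    G = G + np.eye(len(G)) * 1e-9 * np.trace(G) # regularize (G is rank-deficient when k > d)
    w = np.linalg.solve(G, np.ones(len(G)))
    return w / w.sum()

rng = np.random.default_rng(0)
pts = rng.normal(size=(6, 2))
w = reconstruction_weights(pts[0], pts[1:])

# Apply a similarity transform (rotation + uniform scale + translation):
theta = 0.7
A = 1.5 * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
pts_t = pts @ A.T + np.array([3.0, -1.0])
w_t = reconstruction_weights(pts_t[0], pts_t[1:])

print(np.allclose(w, w_t, atol=1e-6))   # True: weights survive the transform
```

Because the weights survive such transforms, comparing them across two images gives a cheap per-match correctness signal.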
Shanmin Pang, Jianru Xue, Nanning Zheng, and Q. Tian. "Locality preserving verification for image search." In Proceedings of the 21st ACM International Conference on Multimedia, 2013. DOI: 10.1145/2502081.2502140.
In this paper, we propose a novel method to learn similarity-preserving hash functions for approximate nearest neighbor (NN) search. The key idea is to learn hash functions by maximizing the alignment between the similarity orders computed in the original space and those in the Hamming space. The problem of mapping the NN points into different hash codes is treated as a classification problem in which the points are categorized into several groups according to their Hamming distances to the query. The hash functions are optimized from the classifiers pooled over the training points. Experimental results demonstrate the superiority of our approach over existing state-of-the-art hashing techniques.
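The objective — alignment between original-space and Hamming-space similarity orders — can be made concrete with a small sketch. This uses plain sign-of-random-projection hashing as a stand-in (it is not the paper's learned hash) and scores order alignment as the fraction of concordant point pairs, a Kendall-tau-style measure:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 16))        # database points
q = rng.normal(size=16)              # query

# Sign-of-random-projection hashing to 32-bit codes (a simple LSH baseline,
# not the learned hash functions of the paper).
W = rng.normal(size=(16, 32))
codes = (X @ W > 0).astype(np.uint8)
qcode = (q @ W > 0).astype(np.uint8)

ham = (codes != qcode).sum(axis=1)   # Hamming distances to the query
euc = np.linalg.norm(X - q, axis=1)  # original-space distances

# Order alignment: fraction of point pairs ranked the same way by both
# distance measures (1.0 = the Hamming ranking perfectly preserves the
# original similarity order with respect to this query).
i, j = np.triu_indices(len(X), k=1)
concordant = np.sign(euc[i] - euc[j]) == np.sign(ham[i] - ham[j])
print(round(concordant.mean(), 3))
```

A learned order-preserving hash aims to push this alignment score toward 1 over many training queries.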
Jianfeng Wang, Jingdong Wang, Nenghai Yu, and Shipeng Li. "Order preserving hashing for approximate nearest neighbor search." In Proceedings of the 21st ACM International Conference on Multimedia, 2013. DOI: 10.1145/2502081.2502100.
A collage is an artistic composition made by assembling different parts into a new whole. The same procedure can be applied to assembling three-dimensional objects. In this paper we present CollARt, a mobile augmented reality application for creating 3D photo collages. Virtual pieces are textured with pictures taken with the camera and can be blended with real objects. A preliminary user study (N=12) showed that participants were able to create interesting works of art. The evaluation also suggested that being able to mix virtual pieces with the real world while on the move increases creativity.
A. Marzo and Oscar Ardaiz. "CollARt: a tool for creating 3D photo collages using mobile augmented reality." In Proceedings of the 21st ACM International Conference on Multimedia, 2013. DOI: 10.1145/2502081.2502154.
This paper focuses on fine-grained classification by detecting photographed text in images. We introduce a text detection method that does not try to detect all possible foreground text regions but instead aims to reconstruct the scene background to eliminate non-text regions. Object cues such as color, contrast, and objectness are used in conjunction with a random forest classifier to detect background pixels in the scene. Results on two publicly available datasets, ICDAR03 and fine-grained Building subcategories of ImageNet, show the effectiveness of the proposed method.
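The classification step can be sketched with scikit-learn's random forest. Everything below is illustrative: the three-feature cue vectors (saturation, local contrast, an objectness score) and their value ranges are hypothetical stand-ins for the cues the abstract names, not the paper's feature extraction.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
n = 400

# Hypothetical per-pixel cue vectors [saturation, local contrast, objectness].
# Background pixels tend to score low on all three cues; text-like pixels high.
background = np.column_stack([rng.uniform(0.0, 0.4, n),
                              rng.uniform(0.0, 0.3, n),
                              rng.uniform(0.0, 0.3, n)])
text_like = np.column_stack([rng.uniform(0.3, 1.0, n),
                             rng.uniform(0.4, 1.0, n),
                             rng.uniform(0.4, 1.0, n)])
X = np.vstack([background, text_like])
y = np.array([1] * n + [0] * n)      # 1 = background pixel, 0 = possible text

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# A clearly background-like pixel and a clearly text-like pixel:
print(clf.predict([[0.1, 0.1, 0.1], [0.9, 0.9, 0.9]]))   # [1 0]
```

Pixels labeled background are removed; whatever survives is kept as candidate text for the fine-grained classifier.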
Sezer Karaoglu, J. V. Gemert, and T. Gevers. "Con-text: text detection using background connectivity for fine-grained object classification." In Proceedings of the 21st ACM International Conference on Multimedia, 2013. DOI: 10.1145/2502081.2502197.
This paper proposes a video playback system that allows users to extend the field of view to surrounding environments not visible in the original video frame, to change viewing angles arbitrarily, and to see superimposed point-of-interest (POI) data in an augmented-reality manner during playback. The processing consists of two main steps: first, the client uploads a video to the GeoVideo Engine, which extracts the geo-metadata and returns it to the client; second, the client requests POIs from the server and renders the video with the POIs.
Junsheng Fu, Lixin Fan, Yu You, and Kimmo Roimela. "Augmented and interactive video playback based on global camera pose." In Proceedings of the 21st ACM International Conference on Multimedia, 2013. DOI: 10.1145/2502081.2502269.
To achieve maximum mobility, device-less approaches to home-appliance remote control have received increasing attention in recent years. In this paper, we propose a screen-less virtual touch panel, called the AirTouch Panel, which can be positioned at any place and orientation around the user. The proposed virtual touch panel makes it possible to remotely control home appliances such as televisions and air conditioners. The system allows users to anchor the panel at a comfortable pose; to change the panel's position or orientation, they simply re-anchor it and the panel is reset. Our main contribution is the design of a re-anchorable virtual panel for digital-home remote control. Most importantly, we explore the design of this imaginary interface through two user studies, in which we analyze task completion time, satisfaction rate, and the number of mis-clicks. We are interested in feasibility issues such as the proper click gesture, panel size, and button size. Moreover, based on the AirTouch Panel, we developed an intelligent TV to demonstrate its usability for controlling home appliances.
Shih-Yao Lin, Chuen-Kai Shie, Shen-Chi Chen, and Y. Hung. "AirTouch panel: a re-anchorable virtual touch panel." In Proceedings of the 21st ACM International Conference on Multimedia, 2013. DOI: 10.1145/2502081.2502164.
This thesis investigates several aspects of Virtual Director technology, i.e., software capable of intelligent real-time selection of live media streams. It addresses several research questions in this interdisciplinary field: how a generic Virtual Director framework can be constructed, and how its behavior can be modeled and formalized to realize professional applications with many parallel users under real-time constraints. Prototypes have been built for two applications, group videoconferencing and live event broadcast. The engine executes cinematic principles aimed at enhancing the user experience. In group videoconferencing, a Virtual Director supports communication goals by selecting from the available streams, i.e., automating cuts between shots according to the communication situation. In event broadcast, it enables personalization by framing, animating, and cutting virtual camera views cropped from a high-resolution panorama. While the technical approach and framework have been evaluated in lab experiments, further evaluation involving potential users and cinematography professionals is ongoing.
Rene Kaiser. "Virtual director technology for social video communication and live event broadcast production." In Proceedings of the 21st ACM International Conference on Multimedia, 2013. DOI: 10.1145/2502081.2502213.
Spectral hashing (SpH) is an efficient and simple binary hashing method that assumes data are sampled from a multidimensional uniform distribution. However, this assumption is too restrictive in practice. In this paper we propose an improved method, Fitted Spectral Hashing, that relaxes the distribution assumption. Our work is based on the fact that one-dimensional data of any distribution can be mapped to a uniform distribution without changing the local neighbor relations among data items. We found that this mapping on each PCA direction follows a regular S-shaped pattern that a sigmoid function fits well; with more parameters, a Fourier series also fits the data well. We therefore propose two binary hashing methods, one based on the sigmoid fit and one on the Fourier fit. Experiments show that our methods are efficient and outperform state-of-the-art methods.
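The underlying fact — any 1-D distribution can be mapped to a uniform one without disturbing neighbor order — is just the (monotone) empirical CDF. A minimal sketch, using Gaussian data and the standard logistic approximation to the normal CDF (scale ≈ 1.702) as the sigmoid fit; this illustrates the idea, not the paper's fitting procedure:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=1000)            # 1-D data, clearly non-uniform (Gaussian)

# Empirical-CDF (rank) map: sends any 1-D distribution to approximately
# uniform [0, 1]; being monotone, it preserves the order of neighbors.
order = np.argsort(x)
u = np.empty_like(x)
u[order] = (np.arange(len(x)) + 0.5) / len(x)

# Order preservation: sorting by x and by u gives the same ranking.
print(np.array_equal(np.argsort(x), np.argsort(u)))      # True

# The Gaussian CDF is S-shaped, so a sigmoid 1/(1 + exp(-a*x)) with
# a ~= 1.702 (logistic approximation to the standard normal CDF) fits
# the empirical map closely.
sig = 1.0 / (1.0 + np.exp(-1.702 * x))
print(round(np.abs(sig - u).max(), 3))                   # small max deviation
```

In the paper's setting, such a fitted sigmoid (or Fourier series) is applied per PCA direction before the spectral-hashing codes are computed.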
Yu Wang, Sheng Tang, Yalin Zhang, Jintao Li, and DanYi Chen. "Fitted spectral hashing." In Proceedings of the 21st ACM International Conference on Multimedia, 2013. DOI: 10.1145/2502081.2502169.