A Smart Kitchen Infrastructure
Marcus Ständer, Aristotelis Hadjakos, Niklas Lochschmidt, Christian Klos, B. Renner, M. Mühlhäuser
DOI: 10.1109/ISM.2012.27
In the future, our homes will be increasingly equipped with sensing and interaction devices that make new multimedia experiences possible. These experiences will not necessarily be bound to the TV, tabletop, smartphone, tablet, or desktop computer but will be embedded in our everyday surroundings. To enable new forms of interaction, we equipped an ordinary kitchen with a large variety of sensors according to best practices. An innovation in comparison to related work is our Information Acquisition System, which allows monitoring and controlling kitchen appliances remotely. This paper presents our sensing infrastructure and the novel interactions in the kitchen that the Information Acquisition System enables.
{"title":"A Smart Kitchen Infrastructure","authors":"Marcus Ständer, Aristotelis Hadjakos, Niklas Lochschmidt, Christian Klos, B. Renner, M. Mühlhäuser","doi":"10.1109/ISM.2012.27","DOIUrl":"https://doi.org/10.1109/ISM.2012.27","url":null,"abstract":"In the future our homes will be more and more equipped with sensing and interaction devices that will make new multimedia experiences possible. These experiences will not necessarily be bound to the TV, tabletop, smart phone, tablet or desktop computer but will be embedded in our everyday surroundings. In order to enable new forms of interaction, we equipped an ordinary kitchen with a large variety of sensors according to best practices. An innovation in comparison to related work is our Information Acquisition System that allows monitoring and controlling kitchen appliances remotely. This paper presents our sensing infrastructure and novel interactions in the kitchen that are enabled by the Information Acquisition System.","PeriodicalId":282528,"journal":{"name":"2012 IEEE International Symposium on Multimedia","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129328519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AudioAlign - Synchronization of A/V-Streams Based on Audio Data
Mario Guggenberger, M. Lux, L. Böszörményi
DOI: 10.1109/ISM.2012.79
Manual synchronization of audio and video recordings is a tedious and time-consuming task, especially if the tracks are very long or numerous. If the tracks are not just short clips (of a few seconds or minutes) and are recorded from heterogeneous sources, an additional problem comes into play: time drift, which arises when the clocks of different recording devices are not synchronized. This demo paper presents the experimental software AudioAlign, which aims to simplify the manual synchronization process, with the ultimate goal of automating it altogether. It gives a short introduction to the topic, discusses the approach, method, implementation, and preliminary results, and gives an outlook on possible improvements.
{"title":"AudioAlign - Synchronization of A/V-Streams Based on Audio Data","authors":"Mario Guggenberger, M. Lux, L. Böszörményi","doi":"10.1109/ISM.2012.79","DOIUrl":"https://doi.org/10.1109/ISM.2012.79","url":null,"abstract":"Manual synchronization of audio and video recordings is a very annoying and time consuming task, especially if the tracks are very long and/or of large quantity. If the tracks aren't just short clips (of a few seconds or minutes) and recorded from heterogeneous sources, an additional problem comes into play - time drift - which arises if different recording devices aren't synchronized. This demo paper presents the experimental software Audio Align, which aims to simplify the manual synchronization process with the ultimate goal to automate it altogether. It gives a short introduction to the topic, discusses the approach, method, implementation and preliminary results and gives an outlook at possible improvements.","PeriodicalId":282528,"journal":{"name":"2012 IEEE International Symposium on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116056541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
JIRL - A C++ Library for JPEG Compressed Domain Image Retrieval
David Edmundson, G. Schaefer
DOI: 10.1109/ISM.2012.48
In this paper we present JIRL, an open-source C++ software suite for performing content-based image retrieval in the JPEG compressed domain. We provide implementations of nine retrieval algorithms representing the current state of the art. For each algorithm, methods for compressed-domain feature extraction as well as feature comparison are provided in an object-oriented framework. In addition, our software suite includes functionality for benchmarking retrieval algorithms in terms of retrieval performance and retrieval time. An example full image retrieval application is also provided to demonstrate how the library can be used. JIRL is made available to fellow researchers under the LGPL.
{"title":"JIRL - A C++ Library for JPEG Compressed Domain Image Retrieval","authors":"David Edmundson, G. Schaefer","doi":"10.1109/ISM.2012.48","DOIUrl":"https://doi.org/10.1109/ISM.2012.48","url":null,"abstract":"In this paper we present JIRL, an open source C++ software suite that allows to perform content-based image retrieval in the JPEG compressed domain. We provide implementations of nine retrieval algorithms representing the current state-of-the-art. For each algorithm, methods for compressed domain feature extraction as well as feature comparison are provided in an object-oriented framework. In addition, our software suite includes functionality for benchmarking retrieval algorithms in terms of retrieval performance and retrieval time. An example full image retrieval application is also provided to demonstrate how the library can be used. JIRL is made available to fellow researchers under the LGPL.","PeriodicalId":282528,"journal":{"name":"2012 IEEE International Symposium on Multimedia","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124981340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GPU Hierarchical Quilted Self Organizing Maps for Multimedia Understanding
Y. Nashed
DOI: 10.1109/ISM.2012.102
It is well established that the human brain outperforms current computers on pattern recognition tasks through the collaborative processing of simple building units (neurons). In this work we extend an abstracted model of the neocortex called the Hierarchical Quilted Self-Organizing Map, exploiting the parallel power of current Graphics Processing Units to achieve real-time understanding and classification of spatio-temporal sensory information. We also propose an improvement on the original model that allows the learning rate to be adapted automatically according to the available input training data. The overall system is tested on the task of gesture recognition using a publicly available Microsoft Kinect dataset.
{"title":"GPU Hierarchical Quilted Self Organizing Maps for Multimedia Understanding","authors":"Y. Nashed","doi":"10.1109/ISM.2012.102","DOIUrl":"https://doi.org/10.1109/ISM.2012.102","url":null,"abstract":"It is well established that the human brain outperforms current computers, concerning pattern recognition tasks, through the collaborative processing of simple building units (neurons). In this work we expand an abstracted model of the neocortex called Hierarchical Quilted Self Organizing Map, benefiting from the parallel power of current Graphical Processing Units, to achieve realtime understanding and classification of spatio-temporal sensory information. We also propose an improvement on the original model that allows the learning rate to be automatically adapted according to the input training data available. The overall system is tested on the task of gesture recognition from a Microsoft Kinect publicly available dataset.","PeriodicalId":282528,"journal":{"name":"2012 IEEE International Symposium on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126042133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ARtifact: Tablet-Based Augmented Reality for Interactive Analysis of Cultural Artifacts
D. Vanoni, M. Seracini, F. Kuester
DOI: 10.1109/ISM.2012.17
To ensure the preservation of cultural heritage, artifacts such as paintings must be analyzed to diagnose physical frailties that could result in permanent damage. Advancements in digital imaging techniques and computer-aided analysis have greatly aided such diagnoses but can limit the ability to work directly with the artifact in the field. This paper presents the implementation and application of ARtifact, a tablet-based augmented reality system that enables on-site visual analysis of the artifact in question. Utilizing real-time tracking of the artifact under observation, a user interacting with the tablet can study various layers of data registered with the physical object in situ. These layers, representing data acquired through various imaging modalities such as infrared thermography and ultraviolet fluorescence, provide the user with an augmented view of the artifact to aid in on-site diagnosis and restoration. Intuitive interaction techniques further enable targeted analysis of artifact-related data. We present a case study utilizing our tablet system to analyze a 16th-century Italian hall and highlight the benefits of our approach.
{"title":"ARtifact: Tablet-Based Augmented Reality for Interactive Analysis of Cultural Artifacts","authors":"D. Vanoni, M. Seracini, F. Kuester","doi":"10.1109/ISM.2012.17","DOIUrl":"https://doi.org/10.1109/ISM.2012.17","url":null,"abstract":"To ensure the preservation of cultural heritage, artifacts such as paintings must be analyzed to diagnose physical frailties that could result in permanent damage. Advancements in digital imaging techniques and computer-aided analysis have greatly aided in such diagnoses but can limit the ability to work directly with the artifact in the field. This paper presents the implementation and application of ARtifact, a tablet-based augmented reality system that enables on-site visual analysis of the artifact in question. Utilizing real-time tracking of the artifact under observation, a user interacting with the tablet can study various layers of data registered with the physical object in situ. Theses layers, representing data acquired through various imaging modalities such as infrared thermography and ultraviolet fluorescence, provide the user with an augmented view of the artifact to aid in on-site diagnosis and restoration. Intuitive interaction techniques further enable targeted analysis of artifact-related data. We present a case study utilizing our tablet system to analyze a 16th century Italian hall and highlight the benefits of our approach.","PeriodicalId":282528,"journal":{"name":"2012 IEEE International Symposium on Multimedia","volume":"766 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132969869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Face Recognition Using Discrete Tchebichef-Krawtchouk Transform
Wissam A. Jassim, Paramesran Raveendran
DOI: 10.1109/ISM.2012.31
In this paper, a face recognition system based on the Discrete Tchebichef-Krawtchouk Transform (DTKT) and Support Vector Machines (SVMs) is proposed. The objective of this paper is to present the following: (1) the mathematical and theoretical framework for the definition of the DTKT, including the required transform equations; (2) the DTKT features used in the classification of faces; and (3) results of empirical tests that compare the representational capabilities of this transform with other discrete transforms such as the Discrete Tchebichef Transform (DTT), the Discrete Krawtchouk Transform (DKT), and the Discrete Cosine Transform (DCT). The system is tested on a large number of faces collected from the ORL and Yale face databases. Empirical results show that the proposed transform gives very good overall accuracy under both clean and noisy conditions.
{"title":"Face Recognition Using Discrete Tchebichef-Krawtchouk Transform","authors":"Wissam A. Jassim, Paramesran Raveendran","doi":"10.1109/ISM.2012.31","DOIUrl":"https://doi.org/10.1109/ISM.2012.31","url":null,"abstract":"In this paper, a face recognition system based on Discrete Tchebichef-Krawtchouk Transform DTKT and Support Vector Machines SVMs is proposed. The objective of this paper is to present the following: (1) the mathematical and theoretical frameworks for the definition of the DTKT including transform equations that need to be addressed. (2) the DTKT features used in the classification of faces. (3) results of empirical tests that compare the representational capabilities of this transform with other types of discrete transforms such as Discrete Tchebichef transform DTT, discrete Krawtchouk Transform DKT, and Discrete Cosine transform DCT. The system is tested on a large number of faces collected from ORL and Yale face databases. Empirical results show that the proposed transform gives very good overall accuracy under clean and noisy conditions.","PeriodicalId":282528,"journal":{"name":"2012 IEEE International Symposium on Multimedia","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130483070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Energy Consumption Reduction via Context-Aware Mobile Video Pre-fetching
A. Devlic, P. Lungaro, P. Kamaraju, Z. Segall, Konrad Tollmar
DOI: 10.1109/ISM.2012.56
The arrival of smartphones and tablets, along with flat-rate mobile Internet pricing, has driven increasing adoption of mobile data services. According to recent studies, video has been the main driver of mobile data consumption, with a higher growth rate than any other mobile application. However, streaming medium/high-quality video files can be an issue in a mobile environment where the available capacity must be shared among a large number of users. Additionally, the energy consumption of mobile devices increases proportionally with the duration of data transfers, which depends on the download data rates achievable by the device. In this respect, opportunistic content pre-fetching schemes that exploit times and locations with high data rates to deliver content before a user requests it have the potential to reduce the energy consumption associated with content delivery and to improve the user's quality of experience by allowing playback of pre-stored content with virtually no perceived interruptions or delays. This paper presents a family of opportunistic content pre-fetching schemes and compares their performance to standard on-demand access to content. Using a simulation approach on experimental data collected with monitoring software installed in mobile terminals, we show that content pre-fetching can reduce the energy consumption of mobile devices by up to 30% compared to on-demand download of the same file, with a time window of 1 hour needed to complete the content prepositioning.
{"title":"Energy Consumption Reduction via Context-Aware Mobile Video Pre-fetching","authors":"A. Devlic, P. Lungaro, P. Kamaraju, Z. Segall, Konrad Tollmar","doi":"10.1109/ISM.2012.56","DOIUrl":"https://doi.org/10.1109/ISM.2012.56","url":null,"abstract":"The arrival of smart phones and tablets, along with a flat rate mobile Internet pricing model have caused increasing adoption of mobile data services. According to recent studies, video has been the main driver of mobile data consumption, having a higher growth rate than any other mobile application. However, streaming a medium/high quality video files can be an issue in a mobile environment where available capacity needs to be shared among a large number of users. Additionally, the energy consumption in mobile devices increases proportionally with the duration of data transfers, which depend on the download data rates achievable by the device. In this respect, adoption of opportunistic content pre-fetching schemes that exploit times and locations with high data rates to deliver content before a user requests it, has the potential to reduce the energy consumption associated with content delivery and improve the user's quality of experience, by allowing playback of pre-stored content with virtually no perceived interruptions or delays. This paper presents a family of opportunistic content pre-fetching schemes and compares their performance to standard on-demand access to content. By adopting a simulation approach on experimental data, collected with monitoring software installed in mobile terminals, we show that content pre-fetching can reduce energy consumption of the mobile devices by up to 30% when compared to the on demand download of the same file, with a time window of 1 hour needed to complete the content prepositioning.","PeriodicalId":282528,"journal":{"name":"2012 IEEE International Symposium on Multimedia","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127519656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detection and Identification of Chimpanzee Faces in the Wild
A. Loos, Andreas Ernst
DOI: 10.1109/ISM.2012.30
In this paper, we present and evaluate a unified automatic image-based face detection and identification framework using two datasets of captive and free-living chimpanzee individuals gathered in uncontrolled environments. This application scenario entails several challenging problems such as varying lighting conditions, diverse expressions, partial occlusion, and non-cooperative subjects. After the faces and facial feature points are detected, we use a projective transformation to align the face images. All faces are then identified using an appearance-based face recognition approach in combination with additional information from local regions of the apes' faces. We conducted open-set identification experiments for both datasets. Even though the datasets are very challenging, the system achieved promising results and therefore has the potential to open up new possibilities for effective biodiversity conservation management.
{"title":"Detection and Identification of Chimpanzee Faces in the Wild","authors":"A. Loos, Andreas Ernst","doi":"10.1109/ISM.2012.30","DOIUrl":"https://doi.org/10.1109/ISM.2012.30","url":null,"abstract":"In this paper, we present and evaluate a unified automatic image-based face detection and identification framework using two datasets of captive and free-living chimpanzee individuals gathered in uncontrolled environments. This application scenario implicates several challenging problems like different lighting situations, various expressions, partial occlusion, and non-cooperative subjects. After the faces and facial feature points are detected, we use a projective transformation to align the face images. All faces are then identified using an appearance-based face recognition approach in combination with additional information from local regions of the apes' face. We conducted open-set identification experiments for both datasets. Even though, the datasets are very challenging, the system achieved promising results and therefore has the potential to open up new ways in effective biodiversity conservation management.","PeriodicalId":282528,"journal":{"name":"2012 IEEE International Symposium on Multimedia","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121224658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploiting JPEG Compression for Image Retrieval
David Edmundson, G. Schaefer
DOI: 10.1109/ISM.2012.99
Content-based image retrieval (CBIR) has been an active research area for many years, yet much of the research ignores the fact that most images are stored in compressed form, which affects retrieval both in terms of processing speed and retrieval accuracy. In this paper, we address various aspects of JPEG-compressed images in the context of image retrieval. We first analyse the effect of JPEG quantisation on image retrieval and present a robust method to address the resulting performance drop. We then compare various retrieval methods that work in the JPEG compressed domain and finally propose two new methods that are based solely on information available in the JPEG header. One uses optimised Huffman tables for retrieval, while the other is based on tuned quantisation tables. Both techniques are shown to give retrieval performance comparable to existing methods while being orders of magnitude faster.
{"title":"Exploiting JPEG Compression for Image Retrieval","authors":"David Edmundson, G. Schaefer","doi":"10.1109/ISM.2012.99","DOIUrl":"https://doi.org/10.1109/ISM.2012.99","url":null,"abstract":"Content-based image retrieval (CBIR) has been an active research area for many years, yet much of the research ignores the fact that most images are stored in compressed form which affects retrieval both in terms of processing speed and retrieval accruacy. In this paper, we address various aspects of JPEG compressed images in the context of image retrieval. We first analyse the effect of JPEG quantisation on image retrieval and present a robust method to address the resulting performance drop. We then compare various retrieval methods that work in the JPEG compressed domain and finally propose two new methods that are based solely on information available in the JPEG header. One of these is using optimised Huffman tables for retrieval, while the other is based on tuned quantisation tables. Both techniques are shown to give retrieval performance comparable to existing methods while being magnitudes faster.","PeriodicalId":282528,"journal":{"name":"2012 IEEE International Symposium on Multimedia","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126194474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Effective Moving Object Detection and Retrieval via Integrating Spatial-Temporal Multimedia Information
Dianting Liu, M. Shyu
DOI: 10.1109/ISM.2012.74
In the area of multimedia semantic analysis and video retrieval, automatic object detection techniques play an important role. Without analysis of object-level features, it is hard to achieve high performance in semantic retrieval. As a branch of object detection research, moving object detection has also become an active field that has seen considerable progress recently. This paper proposes a moving object detection and retrieval model that integrates the spatial and temporal information in video sequences and uses the proposed integral density method (adapted from the idea of integral images) to quickly identify motion regions in an unsupervised way. First, key information locations on video frames are obtained as the maxima and minima of a Difference of Gaussians (DoG) function. In parallel, a motion map of adjacent frames is obtained from the differences between the outcomes of the Simultaneous Partition and Class Parameter Estimation (SPCPE) framework. The motion map filters the key information locations into key motion locations (KMLs), where the existence of moving objects is implied. Besides showing the motion zones, the motion map also indicates the motion direction, which guides the proposed integral density approach to quickly and accurately locate the motion regions. The detection results are not only illustrated visually but also verified by promising experimental results, which show that concept retrieval performance can be improved by integrating global and local visual information.
{"title":"Effective Moving Object Detection and Retrieval via Integrating Spatial-Temporal Multimedia Information","authors":"Dianting Liu, M. Shyu","doi":"10.1109/ISM.2012.74","DOIUrl":"https://doi.org/10.1109/ISM.2012.74","url":null,"abstract":"In the area of multimedia semantic analysis and video retrieval, automatic object detection techniques play an important role. Without the analysis of the object-level features, it is hard to achieve high performance on semantic retrieval. As a branch of object detection study, moving object detection also becomes a hot research field and gets a great amount of progress recently. This paper proposes a moving object detection and retrieval model that integrates the spatial and temporal information in video sequences and uses the proposed integral density method (adopted from the idea of integral images) to quickly identify the motion regions in an unsupervised way. First, key information locations on video frames are achieved as maxima and minima of the result of Difference of Gaussian (DoG) function. On the other hand, a motion map of adjacent frames is obtained from the diversity of the outcomes from Simultaneous Partition and Class Parameter Estimation (SPCPE) framework. The motion map filters key information locations into key motion locations (KMLs) where the existence of moving objects is implied. Besides showing the motion zones, the motion map also indicates the motion direction which guides the proposed integral density approach to quickly and accurately locate the motion regions. The detection results are not only illustrated visually, but also verified by the promising experimental results which show the concept retrieval performance can be improved by integrating the global and local visual information.","PeriodicalId":282528,"journal":{"name":"2012 IEEE International Symposium on Multimedia","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129978763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}