Andrea Bernardini, C. Delogu, E. Pallotti, Luca Costantini
Archaeological remnants in urban areas tend to be absorbed into the urban landscape or hidden in subterranean locations that are not visible and are therefore difficult for visitors to access. In a previous project, we developed a mobile application that guided visitors in real time through various archaeological sites using texts, images, and videos. An evaluation test that collected visitors' impressions and suggestions showed that the application let them visit archaeological remnants in a more participative way, but that most visitors were unable to imagine how the remnants related to the ancient urban landscape. To solve this problem and improve the visitors' experience, we are now working on a new application that combines historical and archaeological details with an immersive experience. The application recognizes a cultural heritage element by image recognition or by positioning and augments the interface with various layers of information. It will thus offer visitors not only information but also an emotional experience.
{"title":"Living the Past: Augmented Reality and Archeology","authors":"Andrea Bernardini, C. Delogu, E. Pallotti, Luca Costantini","doi":"10.1109/ICMEW.2012.67","DOIUrl":"https://doi.org/10.1109/ICMEW.2012.67","url":null,"abstract":"Archeological remnants in urban areas tend to be included in the urban landscape or even remain hidden in subterranean locations which are not visible and, for these reasons, they are accessed with difficulty by visitors. In our previous experience, we developed a mobile application, which guided visitors in real time through various archaeological sites using texts, images, and videos. The results of an evaluation test which collected visitors' impressions and suggestions showed us that the mobile application allowed them to visit archeological remnants in a more participative way but that most visitors were unable to imagine what relation the archaeological remnants had with the ancient urban landscape. To solve this problem and improve the visitors' experience, we are now working at another application, which combines historical and archeological details with an immersive experience. The mobile application recognizes a cultural heritage element by image recognition or by positioning and it augments the interface with various layers of information. Furthermore, the application will provide not only information but it will offer to visitors an emotional experience.","PeriodicalId":385797,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo Workshops","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123174379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the texture-plus-depth format for 3D visual data, texture and depth maps of multiple viewpoints are coded and transmitted at the sender. At the receiver, the decoded texture and depth maps of two neighboring viewpoints are used to synthesize a desired intermediate view via depth-image-based rendering (DIBR). In this paper, to enable transmission of depth maps at low resolution for bit savings, we propose a novel super-resolution (SR) algorithm that increases the resolution of the received depth map at the decoder to match the corresponding high-resolution texture map for DIBR. Unlike previous depth map SR techniques, which use only the texture map of the same view 0 to interpolate missing depth pixels of view 0, we use texture maps of both the same and a neighboring viewpoint, 0 and 1, so that the error between the original texture map of view 1 and the synthesized image of view 1 (interpolated using the texture and depth maps of view 0) can be used as a regularization term during depth map SR of view 0. Further, piecewise smoothness of the reconstructed depth map is enforced by computing only the lowest-frequency coefficients in the graph-based transform (GBT) domain for each interpolated block. Experimental results show that our SR scheme outperforms a previous scheme by up to 1.7 dB in synthesized view PSNR.
{"title":"Depth Map Super-Resolution Using Synthesized View Matching for Depth-Image-Based Rendering","authors":"Wei Hu, Gene Cheung, Xin Li, O. Au","doi":"10.1109/ICMEW.2012.111","DOIUrl":"https://doi.org/10.1109/ICMEW.2012.111","url":null,"abstract":"In texture-plus-depth format of 3D visual data, texture and depth maps of multiple viewpoints are coded and transmitted at sender. At receiver, decoded texture and depth maps of two neighboring viewpoints are used to synthesize a desired intermediate view via depth-image-based rendering (DIBR). In this paper, to enable transmission of depth maps at low resolution for bit saving, we propose a novel super-resolution (SR) algorithm to increase the resolution of the received depth map at decoder to match the corresponding received high resolution texture map for DIBR. Unlike previous depth map SR techniques that only utilize the texture map of the same view 0 to interpolate missing depth pixels of view 0, we use texture maps of the same and neighboring viewpoints, 0 and 1, so that the error between the original texture map of view 1 and the synthesized image of view 1 (interpolated using texture and depth maps of view 0) can be used as a regularization term during depth map SR of view 0. Further, piecewise smoothness of the reconstructed depth map is enforced by computing only the lowest frequency coefficients in Graph based Transform (GBT) domain for each interpolated block. Experimental results show that our SR scheme out-performed a previous scheme by up to 1.7dB in synthesized view quality in PSNR.","PeriodicalId":385797,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo Workshops","volume":"235 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115282872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Liang Yin, Mingzhi Dong, Weihong Deng, Jun Guo, Bin Zhang
This paper proposes a real-time adult video detector, based on statistical color models, that filters adult content in video. A generic color model is constructed by statistical analysis of sample images containing adult pixels. We fully exploit the temporal continuity of video: the preceding and following N frames are considered in the classification. Experiments show that our method achieves satisfactory performance in detecting adult content. The remainder of the paper addresses the application of a real-time adult video filter that blocks adult content from children.
{"title":"Statistical Color Model Based Adult Video Filter","authors":"Liang Yin, Mingzhi Dong, Weihong Deng, Jun Guo, Bin Zhang","doi":"10.1109/ICMEW.2012.66","DOIUrl":"https://doi.org/10.1109/ICMEW.2012.66","url":null,"abstract":"This paper, guided by Statistical Color Models, proposes a real-time Adult Video detector to filter the adult content in the video. A generic color model is constructed by statistical analysis of the sample images containing adult pixels. We fully utilize the video continuity characteristics, i.e. preceding and following N frames considered in the classification. Our method, through experimental, displays a satisfactory performance for detecting adult content. The reminder of the paper addresses the application of real-time adult video filter that blocks adult content from kids.","PeriodicalId":385797,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo Workshops","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116748237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yunhui Shi, He Li, Jin Wang, Wenpeng Ding, Baocai Yin
This paper proposes a new method of inter prediction based on low-rank matrix completion. Through collection and rearrangement, image regions with high correlations can be used to generate a low-rank or approximately low-rank matrix. We view the prediction values as the missing part of an incomplete low-rank matrix and obtain the prediction by recovering that matrix. Taking advantage of exact recovery of incomplete matrices, the low-rank-based prediction exploits temporal correlation better. The proposed prediction offers higher accuracy with less side information, since motion vectors do not need to be encoded. Simulation results show that the bit-rate saving of the proposed scheme reaches up to 9.91% compared with H.264/AVC; the scheme also outperforms Template Matching Averaging (TMA) prediction by up to 8.06%.
{"title":"Inter Prediction Based on Low-rank Matrix Completion","authors":"Yunhui Shi, He Li, Jin Wang, Wenpeng Ding, Baocai Yin","doi":"10.1109/ICMEW.2012.98","DOIUrl":"https://doi.org/10.1109/ICMEW.2012.98","url":null,"abstract":"This paper proposes a new method of inter prediction based on low-rank matrix completion. By collection and rearrangement, image regions with high correlations can be used to generate a low-rank or approximately low-rank matrix. We view prediction values as the missing part in an incomplete low-rank matrix, and obtain the prediction by recovering the generated low-rank matrix. Taking advantage of exact recovery of incomplete matrix, the low-rank based prediction can exploit temporal correlation better. Our proposed prediction has the advantage of higher accuracy and less extra information, as the motion vector doesn't need to be encoded. Simulation results show that the bit-rate saving of the proposed scheme can reach up to 9.91% compared with H.264/AVC. Our scheme also outperforms the counterpart of the Template Matching Averaging (TMA) prediction by 8.06% at most.","PeriodicalId":385797,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo Workshops","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122627128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Ahmed
Allocating computing resources to the different tasks of a surveillance system has always been a major challenge. The problem becomes complicated when real-time computation and decision making are required, since the system cannot afford to process all sensory feeds and execute computationally expensive algorithms. In multi-modal surveillance systems, real-time event detection and understanding of a situation are crucial, so proper use of computing resources is necessary to control and manage an area of surveillance. This paper introduces a dynamic task scheduling technique that considers the available computing resources and real-time requirements according to the current surveillance context. The task scheduler determines the importance of each sensor with respect to its observation and surrounding context, and dynamically allocates CPU clock to the data streams of each sensor so as to minimize the time between an event's occurrence and its detection. Simulation results show that the task scheduler offers proper resource utilization, which is valuable for surveillance systems.
{"title":"Dynamic Resource Allocation for Event Processing in Surveillance Systems","authors":"D. Ahmed","doi":"10.1109/ICMEW.2012.74","DOIUrl":"https://doi.org/10.1109/ICMEW.2012.74","url":null,"abstract":"Allocating computing resources to different tasks of surveillance systems has always been a big challenge. The problem becomes complicated when it requires dealing with real-time computation and decision making as the system cannot afford of processing all sensory feeds and execute computationally expensive algorithms. In multi-modal surveillance systems, real-time event detection and understanding of a situation is crucial. So, the proper use of computing resources is necessary to control and manage an area of surveillance. This paper introduces a dynamic task scheduling technique considering available computing resources and real-time requirement according to the current surveillance context. The task scheduler determines the importance of each sensor with respect to its observation and surrounding context. The scheduler dynamically allocates CPU clock to data streams of each sensor so that it can minimize event detection time from the time of its occurrence. The simulation results reveal that the task scheduler can offer proper resource utilization which is valuable for surveillance systems.","PeriodicalId":385797,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo Workshops","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122733632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Mcgarry, Jesus Hernandez, R. Ferzli, V. Syrotiuk
We investigate the use of caching of packets containing video at intermediary routers to reduce the delay and energy consumption of Automatic Repeat reQuest (ARQ) error recovery. We formulate two mathematical programs that select the optimal set of routers to be given caching ability, one minimizing energy consumption and the other minimizing retransmission delay; both programs have identical structure. We then solve them with a dynamic programming algorithm whose execution time grows polynomially in the size of the input parameters. Our performance analysis indicates that the optimal solution significantly outperforms several heuristic solutions.
{"title":"Minimizing Video Retransmission Delay and Energy Consumption with Caching Routers","authors":"M. Mcgarry, Jesus Hernandez, R. Ferzli, V. Syrotiuk","doi":"10.1109/ICMEW.2012.25","DOIUrl":"https://doi.org/10.1109/ICMEW.2012.25","url":null,"abstract":"We investigated the use of caching of packets containing video at intermediary routers to reduce the delay and energy consumption of Automatic Repeat reQuest (ARQ) error recovery. We modeled the two mathematical programs that select the optimal set of routers to have caching ability, one to minimize energy consumption and the other to minimize retransmission delay. Both of these mathematical programs have identical structure. We then solve these mathematical programs with a dynamic programming solution whose execution time growth is polynomial in the size of the input parameters. Our performance analysis indicates that the optimal solution significantly outperforms several heuristic solutions.","PeriodicalId":385797,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo Workshops","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116219501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Rahman, M. Pickering, D. Kerr, C. Boushey, E. Delp
Poor diet is one of the key determinants of an individual's risk of developing chronic diseases. Assessing what people eat is fundamental to establishing the link between diet and disease. Food records are considered the best approach for assessing energy intake; however, paper-based food recording is cumbersome and often inaccurate. Researchers have begun to explore how mobile devices can reduce the burden of recording nutritional intake. The integrated camera in a mobile phone can capture images of the food consumed, and these images are then processed to automatically identify the food items for record-keeping purposes. The accurate classification of food items in these images is vital to the success of such a system. In this paper we present a new method for generating texture features from food images and demonstrate that this feature provides greater food classification accuracy for a mobile-phone-based dietary assessment system.
{"title":"A New Texture Feature for Improved Food Recognition Accuracy in a Mobile Phone Based Dietary Assessment System","authors":"M. Rahman, M. Pickering, D. Kerr, C. Boushey, E. Delp","doi":"10.1109/ICMEW.2012.79","DOIUrl":"https://doi.org/10.1109/ICMEW.2012.79","url":null,"abstract":"Poor diet is one of the key determinants of an individual's risk of developing chronic diseases. Assessing what people eat is fundamental to establishing the link between diet and disease. Food records are considered the best approach for assessing energy intake however paper-based food recording is cumbersome and often inaccurate. Researchers have begun to explore how mobile devices can be used to reduce the burden of recording nutritional intake. The integrated camera in a mobile phone can be used for capturing images of food consumed. These images are then processed to automatically identify the food items for record keeping purposes. In such systems, the accurate classification of food items in these images is vital to the success of such a system. In this paper we will present a new method for generating texture features from food images and demonstrate that this new feature provides greater food classification accuracy for a mobile phone based dietary assessment system.","PeriodicalId":385797,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo Workshops","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125033846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
L. Ling, I. Burnett, E. Cheng
Current approaches for 3D reconstruction from image feature points are classed as sparse or dense techniques. The sparse approaches are insufficient for surface reconstruction because only sparsely distributed feature points are produced, while existing dense reconstruction approaches require pre-calibrated camera orientations, which limits their applicability and flexibility. This paper proposes a one-stop 3D reconstruction solution that reconstructs a highly dense surface from an uncalibrated video sequence: the camera orientations and the surface are computed simultaneously from new dense point features, using an approach motivated by Structure from Motion (SfM) techniques. The result is a flexible, automatic method with the simple interface of 'videos to 3D model'. These improvements are essential for practical applications in 3D modeling and visualization. The reliability of the proposed algorithm has been tested on various data sets, and its accuracy and performance are compared with both sparse and dense reconstruction benchmark algorithms.
{"title":"A Dense 3D Reconstruction Approach from Uncalibrated Video Sequences","authors":"L. Ling, I. Burnett, E. Cheng","doi":"10.1109/ICMEW.2012.108","DOIUrl":"https://doi.org/10.1109/ICMEW.2012.108","url":null,"abstract":"Current approaches for 3D reconstruction from feature points of images are classed as sparse and dense techniques. However, the sparse approaches are insufficient for surface reconstruction since only sparsely distributed feature points are presented. Further, existing dense reconstruction approaches require pre-calibrated camera orientation, which limits the applicability and flexibility. This paper proposes a one-stop 3D reconstruction solution that reconstructs a highly dense surface from an uncalibrated video sequence, the camera orientations and surface reconstruction are simultaneously computed from new dense point features using an approach motivated by Structure from Motion (SfM) techniques. Further, this paper presents a flexible automatic method with the simple interface of 'videos to 3D model'. These improvements are essential to practical applications in 3D modeling and visualization. The reliability of the proposed algorithm has been tested on various data sets and the accuracy and performance are compared with both sparse and dense reconstruction benchmark algorithms.","PeriodicalId":385797,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo Workshops","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122951378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yi Wu, K. Seshadrinathan, Wei Sun, M. E. Choubassi, J. Ratcliff, I. Kozintsev
The popularity of mobile photography paves the way for new ways of viewing and interacting with personal media, and for enabling a user's creative expression. In this paper, we describe an instantaneous and automatic method to localize the camera and segment foreground objects, such as people, from an input image, assuming knowledge of the environment in which the image was taken. Camera localization is performed by comparing multiple views of the 3D environment against the uncalibrated input image. Following localization, selected views of the 3D environment are aligned, color-mapped, and compared against the input image to segment the foreground content. We demonstrate our proposed system in two illustrative applications: a virtual game played between multiple users involving virtual projectiles, and a group shot, created against a background of the users' choice, of multiple people who may not be available at the same time or place.
{"title":"Creative Transformations of Personal Photographs","authors":"Yi Wu, K. Seshadrinathan, Wei Sun, M. E. Choubassi, J. Ratcliff, I. Kozintsev","doi":"10.1109/ICMEW.2012.87","DOIUrl":"https://doi.org/10.1109/ICMEW.2012.87","url":null,"abstract":"The popularity of mobile photography paves the way to create new ways of viewing, interacting and enabling a user's creative expression with personal media. In this paper, we describe an instantaneous and automatic method to localize the camera and enable segmentation of foreground objects such as people from an input image, assuming knowledge of the environment in which the image was taken. Camera localization is performed by comparing multiple views of the 3D environment against the uncalibrated input image. Following localization, selected views of the 3D environment are aligned, color-mapped and compared against the input image to segment the foreground content. We demonstrate results using our proposed system in two illustrative applications: a virtual game played between multiple users involving virtual projectiles and a group shot of multiple people who may not be available simultaneously at the same time or place created against a background of their choice.","PeriodicalId":385797,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo Workshops","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121846949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Rzepecki, Jonathan Delcourt, Matthieu Perreira Da Silva, P. Callet
Science and technology progress quickly, yet the mouse and keyboard are still used to control multimedia devices. One of the factors limiting the adoption of gesture-based HCIs is detecting the user's intention to interact. This study takes a step in that direction using a consumer EEG headset, which records data in real time that can help identify the user's intention based on his or her emotional state. For each subject, EEG responses to different stimuli were recorded; these data allow us to assess the potential of EEG-based intention detection. The findings are promising and, with a proper implementation, should enable a new type of HCI device.
{"title":"Virtual interactions: Can EEG Help Make the Difference with Real Interaction?","authors":"J. Rzepecki, Jonathan Delcourt, Matthieu Perreira Da Silva, P. Callet","doi":"10.1109/ICMEW.2012.33","DOIUrl":"https://doi.org/10.1109/ICMEW.2012.33","url":null,"abstract":"Science and technology progress fast, but mouse and keyboard are still used to control multimedia devices. One of the limiting factors of gesture based HCIs adoption is the detection of the user's intention to interact. This study tries to make a step in that direction with use of consumer EEG sensor headset. EEG headset records in real-time data that can help to identify intention of the user based on his emotional state. For each subject EEG responses for different stimuli are recorded. Acquiring these data allows to determine the potential of EEG based intention detection. The findings are promising and with proper implementation should allow to building a new type of HCI devices.","PeriodicalId":385797,"journal":{"name":"2012 IEEE International Conference on Multimedia and Expo Workshops","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122795308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}