In this work, a perceptual quality-regulable H.264 video encoder system is developed. Exploiting the relationship between a reconstructed macroblock and its best predicted macroblock from mode decision, a novel quantization parameter prediction method is built and used to regulate video quality toward a target perceptual quality. An automatic quality refinement scheme is also developed to make better use of the bit budget. Moreover, with the aid of salient object detection, we further improve quality in the regions where viewers are likely to focus. The proposed algorithm achieves better bit allocation by changing quantization parameters at the macroblock level. Compared with the JM reference software using macroblock-layer rate control, it delivers better and more stable quality, with a higher average SSIM index and smaller SSIM variation.
Guan-Lin Wu, Yu-Jie Fu, Shao-Yi Chien, "System Design of Perceptual Quality-Regulable H.264 Video Encoder," 2012 IEEE International Conference on Multimedia and Expo, 9 July 2012. doi:10.1109/ICME.2012.180
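The abstract above describes steering QP per macroblock toward a target perceptual quality. As an illustration only (the paper's actual predictor derives QP from the reconstructed/predicted macroblock relationship, which is not reproduced here), a minimal SSIM-driven feedback rule might look like this; `regulate_qp` and its parameters are hypothetical names:

```python
def regulate_qp(qp, measured_ssim, target_ssim, qp_min=0, qp_max=51, step=1):
    """One feedback update: lower QP (finer quantization) when quality is
    below target, raise it when above, clamped to H.264's [0, 51] QP range."""
    if measured_ssim < target_ssim:
        qp -= step
    elif measured_ssim > target_ssim:
        qp += step
    return max(qp_min, min(qp_max, qp))
```

Applied once per macroblock, such a rule spends bits where measured SSIM lags the target and saves them where quality is already above it.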
The use of the hierarchical prediction structure in video coding introduces a special type of temporal noise, the temporal pumping artifact. It appears as severe quality fluctuations among adjacent pictures and is quite annoying, perceived as a pumping or stumbling effect. This paper analyzes the fundamental cause of the perceived temporal pumping artifact. The key factors influencing its perception, namely the amplitude, frequency, and phase of the quality fluctuations, are evaluated through subjective experiments. The analysis suggests how the artifact can be effectively alleviated, or even eliminated, by adjusting coding parameters.
Shuai Wan, Yanchao Gong, Fuzheng Yang, "Perception of Temporal Pumping Artifact in Video Coding with the Hierarchical Prediction Structure," 2012 IEEE International Conference on Multimedia and Expo, 9 July 2012. doi:10.1109/ICME.2012.149
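The fluctuation factors the study isolates can be quantified on an objective per-frame quality series. A minimal sketch, not taken from the paper, using peak-to-peak amplitude and direction reversals as a crude frequency proxy:

```python
def fluctuation_stats(q):
    """Summarize a per-frame quality series q: peak-to-peak amplitude,
    plus the count of direction reversals (a rough fluctuation-rate proxy)."""
    amplitude = max(q) - min(q)
    reversals = 0
    for i in range(1, len(q) - 1):
        # a reversal is a local peak or valley: slope changes sign
        if (q[i] - q[i - 1]) * (q[i + 1] - q[i]) < 0:
            reversals += 1
    return amplitude, reversals
```

A hierarchical-B quality series oscillating between temporal layers yields high values on both counts, while a flat series yields zero.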
In this paper, we present a novel automatic image slideshow system that explores a new medium between images and music, which can be regarded as a new criterion for image selection and slideshow composition. Following the idea of "hearing colors, seeing sounds" from the art of music visualization, equal importance is assigned to image features and audio properties for better synchronization, and we minimize the aesthetic energy distance between visual and audio features. Given a set of images, a subset is selected by correlating image features with the input audio properties; the selected images are then synchronized with the music subclips according to their audio-visual distance. An inductive image display approach is also introduced for common display devices.
Y. Xiang, M. Kankanhalli, "A Synaesthetic Approach for Image Slideshow Generation," 2012 IEEE International Conference on Multimedia and Expo, 9 July 2012. doi:10.1109/ICME.2012.75
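The synchronization step above, pairing images with music subclips by audio-visual distance, can be sketched as a greedy nearest-feature assignment. Euclidean distance stands in for the paper's aesthetic energy distance, and all names here are illustrative:

```python
def pair_images_to_clips(image_feats, clip_feats):
    """Greedily give each music subclip the unused image whose feature
    vector is closest (Euclidean) to the clip's feature vector."""
    used = set()
    pairs = []
    for c_idx, cf in enumerate(clip_feats):
        best, best_d = None, float("inf")
        for i_idx, imf in enumerate(image_feats):
            if i_idx in used:
                continue
            d = sum((a - b) ** 2 for a, b in zip(cf, imf)) ** 0.5
            if d < best_d:
                best, best_d = i_idx, d
        used.add(best)
        pairs.append((c_idx, best))
    return pairs
```

A globally optimal assignment (e.g. the Hungarian algorithm) would minimize total distance rather than each clip's distance in turn; the greedy form keeps the sketch short.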
In stereoscopic high-definition (HD) frame-compatible formats, the separate left and right views are reduced in resolution and packed to fit within the same video frame as a conventional two-dimensional HD signal. Because they require no additional transmission bandwidth and entail only limited changes to the existing broadcasting infrastructure, these formats have been suggested for 3DTV. However, the convenience of frame-compatible formats comes at the expense of lower picture quality of the 3D signal. In this study, we evaluated the picture-quality loss of two frame-compatible formats, 1080i Side-by-Side and 720p Top/Bottom, in a subjective assessment experiment.
F. Speranza, R. Renaud, A. Vincent, W. J. Tam, "Perceived Picture Quality of Frame-Compatible 3DTV Video Formats," 2012 IEEE International Conference on Multimedia and Expo, 9 July 2012. doi:10.1109/ICME.2012.42
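The packing described above is mechanical enough to illustrate directly. A toy sketch of the two formats on row-major pixel grids, using naive 2:1 decimation (real encoders low-pass filter before subsampling, which this sketch omits):

```python
def side_by_side(left, right):
    """Pack two views into one frame by keeping every other column
    (horizontal 2:1 decimation), as in the Side-by-Side format."""
    return [l[0::2] + r[0::2] for l, r in zip(left, right)]

def top_bottom(left, right):
    """Pack two views by keeping every other row (vertical 2:1
    decimation), as in the Top/Bottom format."""
    return left[0::2] + right[0::2]
```

Either way the packed frame has the same pixel count as one original view, which is exactly why half of each view's resolution, and some picture quality, is sacrificed.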
In this paper, we present parallelization design considerations for irregular video processing algorithms on GPUs. Rich parallelism can be exploited by scheduling the processing order, or by trading performance against parallelism, for irregular algorithms such as CAVLC and the deblocking filter. We implement a component-oriented CAVLC encoder and a direction-oriented deblocking filter on GPUs. The experimental results show that, compared with a CPU implementation, the optimized parallel methods achieve speedups of 63 for the deblocking filter and 44 for CAVLC. This indicates that rich parallelism is one of the most important factors in achieving high performance for irregular algorithms on GPUs. In addition, for some irregular kernels the number of streaming multiprocessors (SMs) appears to matter more to performance than raw compute capability.
Huayou Su, Jun Chai, M. Wen, Ju Ren, Chunyuan Zhang, "Parallelization Design of Irregular Algorithms of Video Processing on GPUs," 2012 IEEE International Conference on Multimedia and Expo, 9 July 2012. doi:10.1109/ICME.2012.147
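One common way to "schedule the processing order" for a deblocking filter is wavefront ordering: each macroblock depends on its left and top neighbours, so all blocks on the same anti-diagonal are mutually independent and can be filtered in parallel. A host-side sketch of that batching (the paper's actual GPU mapping is not reproduced here):

```python
def wavefront_batches(mb_cols, mb_rows):
    """Group macroblock coordinates into anti-diagonal batches: with
    left/top dependencies, all blocks sharing the same x + y are
    independent, so each batch can be one parallel launch."""
    batches = {}
    for y in range(mb_rows):
        for x in range(mb_cols):
            batches.setdefault(x + y, []).append((x, y))
    return [batches[k] for k in sorted(batches)]
```

The batch sizes grow then shrink across the frame, which is one reason such irregular kernels care more about how many SMs can be kept busy than about peak arithmetic throughput.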
We present a self-learning approach for single-image super-resolution (SR) that preserves high-frequency components, such as edges, in the resulting high-resolution (HR) images. Given a low-resolution (LR) input image, we construct its image pyramid and produce a super-pixel dataset. By extracting context information from the super-pixels, we deploy a context-specific contourlet transform on them to model the relationship (via support vector regression) between input patches and their associated directional high-frequency responses. The learned models are applied to predict the SR output with satisfactory quality. Unlike prior learning-based SR methods, our approach is self-learning and does not require self-similarity of image patches within or across image scales. More importantly, we do not need to collect LR/HR training image data in advance; a single LR input image suffices. Empirical results verify the effectiveness of our approach, which quantitatively and qualitatively outperforms existing interpolation-based and learning-based SR methods.
Min-Chun Yang, De-An Huang, Chih-Yun Tsai, Y. Wang, "Self-Learning of Edge-Preserving Single Image Super-Resolution via Contourlet Transform," 2012 IEEE International Conference on Multimedia and Expo, 9 July 2012. doi:10.1109/ICME.2012.169
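The self-learning pipeline, learn from the input's own pyramid and then predict the missing detail, can be caricatured on a 1-D signal. This sketch substitutes an exact-match example table for the paper's SVR-on-contourlet-bands model, with linear interpolation as the fallback, so it is illustrative only:

```python
def downsample(sig):
    # coarser pyramid level: keep every other sample
    return sig[0::2]

def self_learn_sr(lr):
    """Self-learning sketch: build a coarser level of the input's own
    pyramid, record (coarse neighbour pair -> fine in-between sample)
    examples from it, then reuse them to upscale the input itself."""
    coarse = downsample(lr)
    examples = {}
    for i in range(len(coarse) - 1):
        examples[(coarse[i], coarse[i + 1])] = lr[2 * i + 1]
    hr = []
    for i in range(len(lr) - 1):
        hr.append(lr[i])
        # predict the in-between sample; fall back to interpolation
        hr.append(examples.get((lr[i], lr[i + 1]), (lr[i] + lr[i + 1]) / 2))
    hr.append(lr[-1])
    return hr
```

The point of the sketch is the data flow (no external LR/HR training pairs, only the single input), not the predictor, which in the paper is a learned regression on directional high-frequency features rather than a lookup table.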
With the spread of Web 2.0, large-scale user-contributed images with tags are readily available on social websites. Aligning these social tags with image regions without additional human intervention is challenging but valuable, since the alignment provides more detailed image semantics and improves the accuracy of image retrieval. To this end, we propose a large-margin discriminative model for automatically assigning unaligned, and possibly noisy, image-level tags to their corresponding regions; the model is optimized using the concave-convex procedure (CCCP). In the model, each image is treated as a bag of segmented regions associated with a set of candidate labeling vectors, where each labeling vector encodes a possible label arrangement for the regions of an image. To keep the set of admissible labelings tractable, we adopt an effective strategy based on the consistency between visual similarity and semantic correlation to generate a more compact set of labeling vectors. Extensive experiments on the MSRC and SAIAPR TC-12 databases demonstrate the encouraging performance of our method compared with baseline methods.
Yang Liu, Jing Liu, Zechao Li, Hanqing Lu, "Noisy Tag Alignment with Image Regions," 2012 IEEE International Conference on Multimedia and Expo, 9 July 2012. doi:10.1109/ICME.2012.143
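The candidate labeling-vector set above can be sketched by brute force for tiny inputs. The paper's visual/semantic consistency pruning and CCCP optimization are replaced here by a plain score-and-keep-top-k filter, so treat every name as hypothetical:

```python
from itertools import product

def candidate_labelings(n_regions, tags, score, k=3):
    """Enumerate every assignment of image-level tags to regions (one
    labeling vector each) and keep the k highest-scoring ones."""
    scored = []
    for labeling in product(tags, repeat=n_regions):
        scored.append((score(labeling), labeling))
    scored.sort(reverse=True)
    return [lab for _, lab in scored[:k]]
```

The full set grows as |tags|^n_regions, which is exactly why a consistency-based pruning strategy, rather than enumeration, is needed at realistic scale.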
This paper presents the Motion Vectors Merging (MVM) heuristic, a method to reduce HEVC inter-prediction complexity by targeting the PU partition size decision. In the HM test model of the emerging HEVC standard, computational complexity is concentrated mostly in the inter-frame prediction step (up to 96% of total encoder execution time under common test conditions). The goal of this work is to avoid several Motion Estimation (ME) calls during the PU inter-prediction decision, reducing the execution time of the overall encoding process. The MVM algorithm merges NxN PU partitions to compose larger ones; after the best PU partition is decided, ME is called to produce the best possible rate-distortion results for the selected partitions. The proposed method was implemented in HM test model version 3.4 and provides an execution time reduction of up to 34% with negligible rate-distortion losses (a 0.08 dB drop and 1.9% bitrate increase in the worst case). To our knowledge, no prior work proposes PU-level decision optimizations. Compared with works that target CU-level fast decision methods, MVM is competitive, achieving comparable results.
F. Sampaio, S. Bampi, M. Grellert, L. Agostini, J. Mattos, "Motion Vectors Merging: Low Complexity Prediction Unit Decision Heuristic for the Inter-prediction of HEVC Encoders," 2012 IEEE International Conference on Multimedia and Expo, 9 July 2012. doi:10.1109/ICME.2012.37
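The core MVM idea, merge small partitions whose motion vectors agree so the larger partition needs only one ME call, might be sketched as follows; the threshold and averaging rule are assumptions for illustration, not the HM implementation:

```python
def merge_partitions(mvs, threshold=1):
    """Given the four NxN motion vectors of a 2Nx2N block, merge them into
    one 2Nx2N candidate when they are near-identical (component spread
    <= threshold); otherwise keep the NxN split."""
    xs = [mv[0] for mv in mvs]
    ys = [mv[1] for mv in mvs]
    if max(xs) - min(xs) <= threshold and max(ys) - min(ys) <= threshold:
        merged = (sum(xs) // len(xs), sum(ys) // len(ys))
        return [merged]      # one 2Nx2N partition: a single ME refinement
    return list(mvs)         # four NxN partitions: four ME refinements
```

Every merge replaces several ME refinements with one, which is where the reported execution-time savings come from.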
This time-machine expert talk describes the recent comeback of acoustic pattern matching algorithms such as dynamic time warping (DTW), which are particularly suited to applications where little or no transcribed training data is available.
X. Anguera, "Expert Talk for Time Machine Session: Dynamic Time Warping New Youth," 2012 IEEE International Conference on Multimedia and Expo, 9 July 2012. doi:10.1109/ICME.2012.108
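The DTW algorithm the talk revisits is compact enough to state in full. The classic dynamic-programming form, here on plain numeric sequences:

```python
def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping: minimal cumulative alignment cost between
    sequences a and b, allowing non-linear stretching along time."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            # extend the cheapest of: insertion, deletion, match
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Because it matches templates directly, nothing here requires transcribed training data, which is exactly the regime the talk highlights; in speech applications `dist` would compare feature frames (e.g. MFCC vectors) rather than scalars.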
This paper describes a method for acquiring full spherical high dynamic range (HDR) images with no missing areas, using two omnidirectional cameras mounted on the top and bottom of an unmanned airship. The full spherical HDR images are generated by combining multiple omnidirectional images captured with different shutter speeds. The generated images are intended for use in telepresence, augmented telepresence, and image-based lighting.
Fumio Okura, M. Kanbara, N. Yokoya, "Full Spherical High Dynamic Range Imaging from the Sky," 2012 IEEE International Conference on Multimedia and Expo, 9 July 2012. doi:10.1109/ICME.2012.120
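The exposure-combination step can be sketched with the standard weighted radiance average over aligned shots, using a hat weighting over 8-bit values; this is a generic HDR merge for illustration, not necessarily the paper's exact method:

```python
def merge_hdr(exposures, times):
    """Merge aligned exposures (same pixel grid, 8-bit values, flattened
    to lists) into a radiance estimate: each exposure votes z / t,
    weighted to trust mid-range pixels over dark or saturated ones."""
    def hat(z):
        return min(z, 255 - z) + 1  # +1 avoids a zero total weight
    out = []
    for pix in zip(*exposures):     # the same pixel across all exposures
        num = sum(hat(z) * (z / t) for z, t in zip(pix, times))
        den = sum(hat(z) for z in pix)
        out.append(num / den)
    return out
```

Pixels saturated in the long exposure are dominated by the short one and vice versa, which is how regions too bright or too dark for any single omnidirectional shot are still recovered.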