Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177385
Hanli Wang, Ming Ma, Tao Tian
With the increasing popularity of social networks, more and more people store and transmit information in visual formats such as images and videos. However, this convenience comes at a cost: it puts a heavy load on traditional video servers and exposes them to the risk of overloading. Among the huge number of online videos there are a large number of Near-Duplicate Videos (NDVs). Although many methods have been proposed to detect NDVs, little research has investigated compressing them more effectively than by independent compression. In this work, we exploit the data redundancy among NDVs and propose a video coding method to compress NDVs jointly. To apply the proposed video coding method, a number of pre-processing functions are designed to explore the correlation of visual information among NDVs and to meet the video coding requirements. Experimental results verify that the proposed video coding method effectively compresses NDVs and thus saves video storage.
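The underlying intuition, that a near-duplicate is cheaper to code as a residual against a reference than independently, can be sketched at byte level. This is a toy illustration, not the paper's codec: `zlib` stands in for a real video coder, and frames are flat byte arrays.

```python
import zlib

def independent_size(frame: bytes) -> int:
    # Baseline: compress the near-duplicate frame on its own.
    return len(zlib.compress(frame, 9))

def joint_size(ref: bytes, dup: bytes) -> int:
    # Joint coding: compress only the residual of the duplicate
    # against a reference frame taken from another NDV.
    residual = bytes((d - r) & 0xFF for r, d in zip(ref, dup))
    return len(zlib.compress(residual, 9))
```

When `dup` differs from `ref` in only a few bytes, the residual is almost all zeros and compresses far better than the frame compressed independently.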
Title: Effectively compressing Near-Duplicate Videos in a joint way
Venue: 2015 IEEE International Conference on Multimedia and Expo (ICME)
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177516
P. Ahammad, R. Gaunker, B. Kennedy, Mehrdad Reshadi, K. Kumar, A. K. Pathan, Hariharan Kolam
The advent of content-rich modern web applications, unreliable network connectivity and device heterogeneity demand flexible web content delivery platforms that can handle high variability along many dimensions, especially for the mobile web. Images account for more than 60% of the content delivered by present-day webpages and have a strong influence on perceived webpage latency and end-user experience. We present a flexible web delivery platform with a client-cloud architecture and content-aware optimizations to address the problem of delivering image-rich web applications. Our solution makes use of quantitative measures of image perceptual quality, machine learning algorithms, partial caching and opportunistic client-side choices to efficiently deliver images on the web. Using data from the WWW, we experimentally demonstrate that our approach yields significant improvements on various web performance criteria that are critical for maintaining a desirable end-user quality of experience (QoE) for image-rich web applications.
Title: A flexible platform for QoE-driven delivery of image-rich web applications
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177378
Katrin Tonndorf, Christian Handschigl, Julian Windscheid, H. Kosch, M. Granitzer
The growing number of elderly people, combined with financial cuts in the health care sector, leads to an increased demand for computer-supported medical services. New standards like HTML5 allow the creation of hypervideo training applications that run on a variety of end-user devices. In this paper, we evaluate an HTML5 player running an e-health hypervideo for the support of pelvic floor exercises. In an experimental setting we compared the hypervideo to a primarily linear version with regard to usability and utility for self-controlled training. Our results show that the hypervideo version leads to slightly more usability problems but facilitates more active and individual training.
Title: The effect of non-linear structures on the usage of hypervideo for physical training
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177478
E. Bozkurt, E. Erzin, Y. Yemez
Speech and hand gestures form a composite communicative signal that boosts the naturalness and affectiveness of communication. We present a multimodal framework for the joint analysis of continuous affect, speech prosody and hand gestures, towards the automatic synthesis of realistic hand gestures from spontaneous speech using hidden semi-Markov models (HSMMs). To the best of our knowledge, this is the first attempt to synthesize hand gestures using a continuous dimensional affect space, i.e., activation, valence, and dominance. We model the relationships between acoustic features describing speech prosody and hand gestures, with and without the continuous affect information, in speaker-independent configurations, and evaluate the multimodal analysis framework by generating hand gesture animations as well as through objective evaluations. Our experimental results are promising, conveying the role of affect in modeling the dynamics of the speech-gesture relationship.
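The paper's HSMM pipeline is not reproduced here; as a simplified stand-in, plain-HMM Viterbi decoding shows how a sequence of discrete prosody observations could be mapped to gesture states. An HSMM additionally models explicit state durations, and all states, symbols and probabilities below are hypothetical.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    # Plain-HMM Viterbi decoding: find the most likely gesture-state
    # sequence for a sequence of discrete prosody symbols.
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    path = {s: [s] for s in states}
    for o in obs[1:]:
        V.append({})
        new_path = {}
        for s in states:
            prob, prev = max(
                (V[-2][p] * trans_p[p][s] * emit_p[s][o], p) for p in states
            )
            V[-1][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    best = max(states, key=lambda s: V[-1][s])
    return path[best]
```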
Title: Affect-expressive hand gestures synthesis and animation
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177445
Shengxi Li, Mai Xu, Zulin Wang
In this paper, we propose a new method, namely the recursive Taylor expansion (RTE) method, for optimally allocating bits to each largest coding unit (LCU) in the R-λ rate control scheme of HEVC. Specifically, we first set up an optimization formulation of optimal bit allocation. Unfortunately, a closed-form solution to this formulation is intractable. We therefore propose an RTE solution that iteratively solves the formulation with fast convergence, so that an approximate closed-form solution can be obtained. This way, optimal bit allocation is achieved at little cost in encoding complexity. Finally, experimental results validate the effectiveness of our method in three aspects: compression distortion, bit-rate control error, and bit fluctuation.
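The paper's RTE derivation is not reproduced here; the sketch below only illustrates the shape of the problem, solving Lagrangian bit allocation for a hypothetical per-LCU distortion model D_i(b) = c_i / b and finding the multiplier by bisection rather than by recursive Taylor expansion.

```python
import math

def allocate_bits(c, budget, iters=100):
    # Hypothetical distortion model D_i(b) = c_i / b (NOT the paper's model).
    # Lagrangian optimality gives b_i = sqrt(c_i / lam); we find the
    # multiplier lam meeting sum(b_i) = budget by geometric bisection,
    # a simple stand-in for the paper's recursive Taylor expansion solver.
    def total(lam):
        return sum(math.sqrt(ci / lam) for ci in c)
    lo, hi = 1e-12, 1e12  # total(lam) decreases monotonically in lam
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if total(mid) > budget:
            lo = mid
        else:
            hi = mid
    lam = math.sqrt(lo * hi)
    return [math.sqrt(ci / lam) for ci in c]
```

For this model the allocation is proportional to sqrt(c_i), so an LCU with four times the complexity weight receives twice the bits.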
Title: A novel method on optimal bit allocation at LCU level for rate control in HEVC
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177515
D. Souza, A. Ilic, N. Roma, L. Sousa
To satisfy the growing demands on real-time video decoders at high frame resolutions, novel GPU parallel algorithms are proposed herein for fully compliant HEVC de-quantization, inverse transform and intra prediction. The proposed algorithms are designed to fully exploit the fine-grain parallelism within these computationally demanding and highly data-dependent modules. Moreover, the proposed approaches allow efficient utilization of GPU computational resources while carefully managing data accesses in the complex GPU memory hierarchy. Experimental results show that real-time processing is achieved for all tested sequences, even at the most demanding QP, delivering average frame rates of 118.6, 89.2 and 49.7 fps for Full HD, 2160p and Ultra HD 4K sequences, respectively.
Title: Towards GPU HEVC intra decoding: Seizing fine-grain parallelism
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177443
Ying Li, Anshul Sheopuri
This paper describes our latest work on assisting CPG (Consumer Packaged Goods) companies with their product packaging designs by providing color palettes that are visually appealing, novel and consistent with the desired marketing messages for a particular brand and product. Specifically, we start by mining large collections of images of different products and brands to learn the colors and color combinations that frequently appear among them. Meanwhile, a color-message graph is constructed to represent the messages conveyed by different colors and to capture the interrelationships among them. Knowledge from color psychology and from information sources such as thesauri is extensively exploited here. Given a particular product and brand whose packaging is to be designed, along with the company's desired marketing message, we apply a computational method to generate quintillions of novel color palettes that can be used for the design. This process leverages existing palettes used by the same product across different brands or by different products of the same brand, takes in optional color preferences from users, and identifies and utilizes the right colors to convey the desired marketing message. Finally, we rank the palettes based on an assessment of their visual aesthetics, their novelty and the way different messages of the same palette interact with each other, so as to guide human designers in choosing the right ones. Our initial demonstrations of this work to subject matter experts have received very positive feedback. We are now exploring opportunities to collaborate with them to validate this technology in a controlled experimental setting.
Title: Creative design of color palettes for product packaging
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177504
M. Soleymani, Anna Aljanaki, F. Wiering, R. Veltkamp
The cold start problem for new users or items is a great challenge for recommender systems. New items can be positioned among the existing items using a similarity metric to estimate their ratings. However, the calculation of similarity varies by domain and available resources. In this paper, we propose a content-based music recommender system based on a set of attributes derived from psychological studies of music preference. These five attributes, namely Mellow, Unpretentious, Sophisticated, Intense and Contemporary (MUSIC), describe the underlying factors of music preference better than music genre does. Using 249 songs and hundreds of ratings and attribute scores, we first develop acoustic content-based attribute detection using auditory modulation features and regression by sparse representation. We then use the estimated attributes in a cold start recommendation scenario. The proposed content-based recommendation significantly outperforms genre-based and user-based recommendation in terms of root-mean-square error. The results demonstrate the effectiveness of these attributes in music preference estimation. Such methods increase the chance that less popular but interesting songs in the long tail are listened to.
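The paper's sparse-representation regression is not shown here; the sketch below only illustrates the cold-start step, predicting a rating for a new item from its estimated MUSIC attributes by similarity-weighted averaging over already-rated items. This is a generic scheme, not the paper's exact model.

```python
import math

def cold_start_rating(new_attrs, catalog):
    # catalog: list of (attrs, mean_rating) pairs for items with rating
    # history; attrs are 5-dim MUSIC vectors (Mellow, Unpretentious,
    # Sophisticated, Intense, Contemporary). The new item's rating is a
    # cosine-similarity-weighted average over the catalog.
    def cos_sim(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    weights = [max(cos_sim(new_attrs, attrs), 0.0) for attrs, _ in catalog]
    total = sum(weights)
    if total == 0:
        return sum(r for _, r in catalog) / len(catalog)  # fallback: global mean
    return sum(w * r for w, (_, r) in zip(weights, catalog)) / total
```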
Title: Content-based music recommendation using underlying music preference structure
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177424
Jongyoo Kim, Junghwan Kim, Woojae Kim, Jisoo Lee, Sanghoon Lee
For high-bit-rate video, it is important to acquire content at high resolution; its quality may nevertheless be degraded by motion blur arising from the movement of objects or of the camera. However, conventional sharpness assessments are designed to detect focal blur caused either by defocusing or by compression distortion targeted at low bit rates. To overcome this limitation, we present a no-reference framework for visual sharpness assessment (VSA) of high-resolution video based on motion and scene classification. In the proposed framework, the accuracy of the sharpness estimate is improved via pooling weighted both by the visual perception of object and camera movements and by the strong influence of the region with the highest sharpness. Based on the characteristics of motion blur, the variance and the contrast over the spectral domain are used to quantify perceived sharpness. Moreover, for the VSA, we extract the most influential sharp regions and emphasize them using scene-adaptive pooling.
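As a hedged illustration of spectral-domain sharpness cues (not the paper's VSA metric): blur attenuates high spatial frequencies, so the fraction of spectral energy outside a low-frequency region drops for blurred frames.

```python
import numpy as np

def spectral_sharpness(img, radius_frac=0.25):
    # Illustrative no-reference sharpness cue: ratio of spectral energy
    # outside a centered low-frequency disc to total spectral energy.
    # Blurring removes high frequencies, lowering this ratio.
    f = np.fft.fftshift(np.fft.fft2(img.astype(float)))
    power = np.abs(f) ** 2
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h / 2, xx - w / 2)
    high = power[r > radius_frac * min(h, w)].sum()
    return high / power.sum()
```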
Title: Video sharpness prediction based on motion blur analysis
Pub Date: 2015-08-06 | DOI: 10.1109/ICME.2015.7177379
S. Milani, P. Zanuttigh
This paper proposes a novel scheme for the joint compression of photo collections framing the same object or scene. The proposed approach starts by locating corresponding features in the various images and then exploits a Structure from Motion algorithm to estimate the geometric relationships between the images and their viewpoints. It then uses the 3D information and warping to predict images from one another. Furthermore, graph algorithms are used to compute minimum-weight topologies and to identify the ordering of the input images that maximizes prediction efficiency. The resulting data are fed to a modified HEVC coder to perform the compression. Experimental results show that the proposed scheme outperforms competing solutions and can be efficiently employed for storing large image collections in the virtual exploration of architectural landmarks or on photo-sharing websites.
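The graph step can be sketched as follows: given hypothetical pairwise prediction costs (e.g. residual energy left after warping one image onto another), Prim's algorithm builds a minimum-weight prediction tree that fixes which image predicts which. This is a simplified stand-in for the paper's ordering computation.

```python
def prediction_order(cost):
    # cost[i][j]: hypothetical cost of predicting image j from image i.
    # Returns (parent, child) edges of a minimum-weight prediction tree:
    # each child is coded predictively from its parent; image 0 is the
    # root and would be intra-coded.
    n = len(cost)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        parent, child = min(
            ((i, j) for i in in_tree for j in range(n) if j not in in_tree),
            key=lambda e: cost[e[0]][e[1]],
        )
        in_tree.add(child)
        edges.append((parent, child))
    return edges
```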
Title: Compression of photo collections using geometrical information