Pub Date: 2008-11-05 | DOI: 10.1109/MMSP.2008.4665176
Arun Kumar, A. Makur
Compression of encrypted data is possible by using distributed source coding. In this paper, we consider encryption followed by lossless compression of gray scale and color images. We propose applying encryption to the prediction errors rather than directly to the images, and use distributed source coding to compress the resulting ciphertexts. Simulation results show that the proposed technique achieves comparable compression gains, with compression ratios ranging from 1.5 to 2.5, despite the encryption.
Title: Distributed source coding based encryption and lossless compression of gray scale and color images
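A minimal sketch of the pipeline this abstract describes: predict first, then encrypt the prediction errors. Everything concrete here is an assumption for illustration only; the toy code uses a left-neighbor predictor and a placeholder XOR key stream, whereas the paper's actual predictor, cipher, and distributed source code are not specified in the abstract.

```python
import numpy as np

def prediction_errors(img):
    """Left-neighbor predictor: residual = pixel - left neighbor (mod 256)."""
    img = img.astype(np.int16)
    pred = np.zeros_like(img)
    pred[:, 1:] = img[:, :-1]          # predict each pixel from its left neighbor
    return ((img - pred) % 256).astype(np.uint8)

def xor_encrypt(data, key_stream):
    """Placeholder stream cipher: XOR residuals with a pseudo-random key."""
    return np.bitwise_xor(data, key_stream)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 8), dtype=np.uint8)
res = prediction_errors(img)
key = rng.integers(0, 256, size=img.shape, dtype=np.uint8)
cipher = xor_encrypt(res, key)          # this is what would be compressed

# Decryption followed by inverse prediction recovers the image losslessly:
dec = np.bitwise_xor(cipher, key).astype(np.int16)
recon = np.zeros_like(dec)
recon[:, 0] = dec[:, 0]
for c in range(1, img.shape[1]):
    recon[:, c] = (recon[:, c - 1] + dec[:, c]) % 256
assert np.array_equal(recon.astype(np.uint8), img)
```

The point of predicting before encrypting is that residuals are far more compressible than raw pixels, and a distributed source coder can still exploit that redundancy after encryption.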
Pub Date: 2008-11-05 | DOI: 10.1109/MMSP.2008.4665120
Liangjun Wang, Xiaolin Wu, Guangming Shi
A new multiple description coding (MDC) approach is proposed based on the theory of compressive sensing (CS). CS theory allows a signal to be reconstructed from a small number of its random measurements if the signal is sparse in some space. An attractive property of CS for MDC applications is that the reconstruction error depends only on the number of transmitted measurements that are received, not on which ones. By treating each CS measurement as a description, we obtain a balanced MDC scheme with fine description granularity and low encoding complexity. Another advantage of the new MDC approach is that all signals can be coded identically but decoded in different spaces for better sparse reconstruction.
Title: A compressive sensing approach of multiple descriptions for network multimedia communication
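The key property claimed above (reconstruction depends only on how many measurements arrive) can be sketched in a few lines: each row of a random measurement matrix plays the role of one description, and a sparse recovery algorithm reconstructs from whichever rows were received. The dimensions, the Gaussian matrix, and the use of Orthogonal Matching Pursuit are all illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: recover a k-sparse x from y = A @ x."""
    residual, support = y.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))   # most correlated column
        if j not in support:
            support.append(j)
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ x_s
    x = np.zeros(A.shape[1])
    x[support] = x_s
    return x

rng = np.random.default_rng(1)
n, k = 64, 3
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

m = 32                                           # number of received descriptions
A = rng.standard_normal((m, n)) / np.sqrt(m)     # each row acts as one description
x_hat = omp(A, A @ x, k)
assert np.linalg.norm(x_hat - x) < 1e-6
```

Dropping any particular rows of `A` (losing particular descriptions) leaves a matrix of the same statistical character, which is why only the count of received measurements matters.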
Pub Date: 2008-11-05 | DOI: 10.1109/MMSP.2008.4665045
K. Müller, A. Smolic, K. Dix, P. Kauff, T. Wiegand
In this paper, a system for video rendering on multiscopic 3D displays is considered in which the data is represented as layered depth video (LDV). This representation consists of one full (central) video with associated per-pixel depth and additional residual layers. Thus, only one full view with additional residual data needs to be transmitted. The LDV data is used at the receiver to generate all intermediate views for the display. The paper presents the LDV layer extraction as well as the view synthesis, using a scene-reliability-driven approach. Here, unreliable image regions are detected and, in contrast to previous approaches, the residual data is enlarged to reduce artifacts in unreliable areas during rendering. To provide maximum data coverage, the residual data remains at its original positions and is not projected towards the central view. The view synthesis process also uses this reliability analysis to provide higher-quality intermediate views than previous approaches. As a final result, high-quality intermediate views for an existing 9-view auto-stereoscopic display are presented, which demonstrate the suitability of the LDV approach for advanced 3D video (3DV) systems.
Title: Reliability-based generation and view synthesis in layered depth video
Pub Date: 2008-11-05 | DOI: 10.1109/MMSP.2008.4665213
J. Zdánský, J. Chaloupka, J. Nouza
In this paper we present a comprehensive platform for automatic processing of Czech TV news programmes. Its audio processing module provides text transcription in the form of metadata containing information about spoken content, speaker identities, pronunciation used, word positions and intonation. The video processing module provides pictures representing individual video scenes and information about detected and, where possible, recognized human faces. The audio and video data are merged into single XML files that are indexed and stored in a searchable database. A simple Web-based search engine can be used to retrieve information from the database, which currently contains more than 1800 hours of transcribed programmes from the Czech CT24 station.
Title: Joint audio-visual processing, representation and indexing of TV news programmes
Pub Date: 2008-11-05 | DOI: 10.1109/MMSP.2008.4665128
D. Cernea, A. Munteanu, J. Cornelis, P. Schelkens
This paper investigates the novel concept of local error control in arbitrary mesh encoding, and proposes a new L-infinite mesh coding approach implementing this concept. In contrast to traditional mesh coding systems that use the mean-square error as distortion measure, the proposed approach employs the L-infinite distortion as target distortion metric. In this context, a novel wavelet-based L-infinite-constrained coding approach for meshes is proposed, which ensures that the maximum local error between the original and decoded meshes is lower than a given upper bound. Additionally, the proposed system achieves scalability in the L-infinite sense, that is, the L-infinite distortion upper bound can be accurately estimated when decoding any layer from the input stream. Moreover, a distortion estimation approach is proposed, expressing the L-infinite distortion in the spatial domain as a statistical estimate of quantization errors produced in the wavelet domain. An instantiation of the proposed L-infinite coding approach is demonstrated for MESHGRID, a scalable 3D object coding system that is part of MPEG-4 AFX. The proposed L-infinite coding approach guarantees that the maximum error is upper-bounded, enables a fast real-time implementation of rate allocation, and preserves all the scalability features and animation capabilities of the employed scalable mesh codec.
Title: Statistical L-infinite distortion estimation in scalable coding of meshes
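The motivation for an L-infinite bound over mean-square error is easy to demonstrate numerically: a single large local error that would be visually objectionable on a mesh barely registers in the MSE but dominates the L-infinite metric. The data below is synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
original = rng.standard_normal(1000)
# Small quantization-like noise everywhere, plus one large localized error:
decoded = original + rng.normal(scale=0.01, size=1000)
decoded[500] += 1.0

mse = float(np.mean((original - decoded) ** 2))          # averages the spike away
l_inf = float(np.max(np.abs(original - decoded)))        # exposes the worst case
assert l_inf > 0.9 and mse < 0.01
```

An L-infinite-constrained coder guarantees that `l_inf` stays below a chosen bound at every decoded layer, which an MSE target cannot promise.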
Pub Date: 2008-11-05 | DOI: 10.1109/MMSP.2008.4665187
V. Jones, R. J. I. Veld, T. Tönis, R. Bults, B. Beijnum, I. Widya, M. Vollenbroek-Hutten, H. Hermens
We are investigating the use of body area networks (BANs), wearable sensors and wireless communications for measuring, processing, transmission, interpretation and display of biosignals. The goal is to provide telemonitoring and teletreatment services for patients. The remote health professional can view a multimedia display which includes graphical and numerical representation of patients' biosignals. The addition of feedback control enables teletreatment services; teletreatment can be delivered to the patient via multiple modalities including tactile, text, auditory and visual. We describe the health BAN, a generic mobile health service platform, and two context-aware applications. The epilepsy application illustrates processing and interpretation of multi-source, multimedia BAN data. The chronic pain application illustrates multi-modal feedback and treatment, with patients able to view their own biosignals on their handheld device.
Title: Biosignal and context monitoring: Distributed multimedia applications of Body Area Networks in healthcare
Pub Date: 2008-11-05 | DOI: 10.1109/MMSP.2008.4665166
Duy-Dinh Le, S. Satoh, T. Ngo, D. Duong
Video shot boundary detection is one of the fundamental tasks of video indexing and retrieval applications. Although many methods have been proposed for this task, finding a general and robust shot boundary detection method that can handle the various transition types caused by photo flashes, rapid camera movement and object movement remains challenging. We present a novel approach for detecting video shot boundaries in which we cast shot boundary detection as a text segmentation problem from natural language processing. This is possible by treating each frame as a word, so that shot boundaries correspond to text segment boundaries (e.g. topics); text segmentation approaches from natural language processing can then be applied. Experimental results on various long video sequences demonstrate the effectiveness of our approach.
Title: A text segmentation based approach to video shot boundary detection
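To make the frame-as-word analogy concrete, here is a toy analogue of segment-boundary detection: represent each frame "word" by a gray-level histogram and flag a boundary wherever the cosine similarity between consecutive frames dips. This is only a caricature of the idea; the paper's actual frame representation and its chosen text segmentation algorithm are not given in the abstract.

```python
import numpy as np

def shot_boundaries(frames, threshold=0.8):
    """Flag a boundary where cosine similarity between consecutive
    frame 'words' (here: gray-level histograms) drops below threshold."""
    hists = [np.histogram(f, bins=16, range=(0, 256))[0].astype(float)
             for f in frames]
    bounds = []
    for i in range(1, len(hists)):
        a, b = hists[i - 1], hists[i]
        sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        if sim < threshold:
            bounds.append(i)
    return bounds

# Two synthetic "shots": dark frames followed by bright frames.
rng = np.random.default_rng(3)
shot1 = [np.full((8, 8), 40) + rng.integers(0, 5, (8, 8)) for _ in range(5)]
shot2 = [np.full((8, 8), 200) + rng.integers(0, 5, (8, 8)) for _ in range(5)]
assert shot_boundaries(shot1 + shot2) == [5]
```

A real text segmentation method would additionally compare windows of several "words" on each side of a candidate boundary, which is what gives robustness to flashes and fast motion.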
Pub Date: 2008-11-05 | DOI: 10.1109/MMSP.2008.4665105
D. Varodayan, David M. Chen, B. Girod
We consider a new problem in network image coding for multicast. In a multihop mesh network, structured as a directed graph, all nodes decode and display reconstructions of the image (at possibly different qualities). Each node may also perform transcoding before transmitting data downstream in the network. The problem is the design of the coding and transcoding schemes that deliver the best image quality over the network. For a network with a diamond topology, we show that multiple description coding combined with Wyner-Ziv transcoding is often superior to other methods. We argue further that the benefits are magnified for larger networks containing one or more diamond subnets. Our image coding experiments demonstrate that multiple description coding with Wyner-Ziv transcoding outperforms single description coding or multiple description coding with conventional transcoding, for both a diamond network and a two-hop mesh network with four branches.
Title: Network image coding for multicast
Pub Date: 2008-11-05 | DOI: 10.1109/MMSP.2008.4665071
Jie Xiang Yang, H. Wu
The H.264/AVC coding standard reduces blocking artifacts by applying a spatial loop filter in both the encoder and the decoder. However, the temporal fluctuation or flickering artifact is still noticeable between intra-coded frames, or between an intra (I) frame and the preceding or subsequent inter-predicted (P) frames. This paper proposes a non-linear temporal filter that reduces the flickering artifact while preserving the image sharpness of the reconstructed video, by using a robust prior model. Performance of the flickering reduction with the proposed filter is evaluated by a temporal metric, the sum of squared differences (SSD), and the traditional measure, the peak signal-to-noise ratio (PSNR).
Title: A non-linear post filtering method for flicker reduction in H.264/AVC coded video sequences
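The SSD metric mentioned above is straightforward: the sum of squared differences between consecutive reconstructed frames, where large values indicate temporal fluctuation. The sketch below shows the bare computation on synthetic frames; the paper's exact per-frame evaluation protocol may differ.

```python
import numpy as np

def flicker_ssd(prev, curr):
    """Sum of squared differences between consecutive reconstructed
    frames; large values signal temporal fluctuation (flicker)."""
    d = prev.astype(np.float64) - curr.astype(np.float64)
    return float(np.sum(d * d))

stable = np.full((4, 4), 100, dtype=np.uint8)
flickered = stable.copy()
flickered[0, 0] = 110              # a 10-level jump in a single pixel
assert flicker_ssd(stable, stable) == 0.0
assert flicker_ssd(stable, flickered) == 100.0
```

A good flicker-reduction filter drives this inter-frame SSD down without lowering the per-frame PSNR against the original video.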
Pub Date: 2008-11-05 | DOI: 10.1109/MMSP.2008.4665050
D. Farin, M. Haller, A. Krutz, T. Sikora
The composition of panoramic images has recently received considerable attention. While panoramic images were first used mainly as a flexible visualization technique, they have also found application in video coding, video enhancement, format conversion, and content analysis. The topic has grown and diverged into many specialized research directions, which makes it difficult to stay in touch with recent developments. This paper gives an overview of the current state of research, including recent developments. Two applications, sprite coding and global-motion estimation, are presented in more detail to provide insight into the system aspects.
Title: Recent developments in panoramic image generation and sprite coding