Lossless image compression with tree coding of magnitude levels
Pub Date: 2005-07-25 | DOI: 10.1109/ICME.2005.1521508
Hua Cai, Jiang Li
With the rapid development of digital technology in consumer electronics, the demand to preserve raw image data for further editing or repeated compression is increasing. Traditional lossless image coders usually consist of computationally intensive modeling and entropy coding phases, and therefore may not be suitable for mobile devices or scenarios with strict real-time requirements. This paper presents a new image coding algorithm based on a simple architecture in which the residual samples are easy to model and encode. In the proposed algorithm, each residual sample is separated into three parts: (1) a sign value, (2) a magnitude value, and (3) a magnitude level. A tree structure is then used to organize the magnitude levels. By simply coding the tree and the other two parts without any complicated modeling or entropy coding, good performance can be achieved at very low computational cost in the binary-uncoded mode. Moreover, with the aid of context-based arithmetic coding, the magnitude values are further compressed in the arithmetic-coded mode, giving performance close to that of JPEG-LS and JPEG2000.
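To make the three-part decomposition concrete, here is a minimal sketch, assuming the magnitude level is the bit length of |r| (as in JPEG-style magnitude categories); the paper's exact definition and tree layout may differ.

```python
# Sketch of splitting a residual into (sign, magnitude level, magnitude value),
# assuming level = bit length of |r|; within level L > 0 the magnitudes span
# [2^(L-1), 2^L - 1], so the in-level offset fits in L-1 bits.

def decompose_residual(r: int):
    sign = 0 if r >= 0 else 1
    mag = abs(r)
    level = mag.bit_length()            # 0 for r == 0
    value = mag - (1 << (level - 1)) if level > 0 else 0
    return sign, level, value

def recompose_residual(sign: int, level: int, value: int) -> int:
    mag = (1 << (level - 1)) + value if level > 0 else 0
    return -mag if sign else mag

# Round-trip check over a range of residuals.
assert all(recompose_residual(*decompose_residual(r)) == r for r in range(-300, 300))
```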
A new approach for real-time motion estimation using robust statistics and the MPEG domain, applied to mosaic image construction
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521444
Lluis Barceló, R. L. Felip, Xavier Binefa
Dominant motion estimation in video sequences is a task that must often be solved in computer vision, but it involves a high computational cost due to the overwhelming amount of data to be processed when working in the image domain. In this paper, we introduce a novel technique for motion analysis in video sequences that takes advantage of the motion information in MPEG streams and their structure, using imaginary line tracking and robust statistics to overcome the noise present in compressed-domain information. To demonstrate the reliability of our new approach, we also show the results of its application to the mosaic image construction problem.
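As an illustration of the robust-statistics ingredient, the sketch below estimates a dominant translation from MPEG macroblock motion vectors with a component-wise median, which rejects outlier vectors from foreground objects; the paper's imaginary-line-tracking method is more elaborate than this.

```python
# Robust dominant-motion estimate from compressed-domain motion vectors:
# the median ignores the minority of vectors belonging to moving foreground.
import statistics

def dominant_translation(motion_vectors):
    """motion_vectors: list of (dx, dy) macroblock vectors from an MPEG stream."""
    dxs = [dx for dx, _ in motion_vectors]
    dys = [dy for _, dy in motion_vectors]
    return statistics.median(dxs), statistics.median(dys)

# Background moving by (2, 1) plus a few foreground outliers:
mvs = [(2, 1)] * 20 + [(15, -8), (14, -9), (16, -7)]
print(dominant_translation(mvs))   # -> (2, 1)
```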
Enhancing curvature scale space features for robust shape classification
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521464
S. Kopf, T. Haenselmann, W. Effelsberg
The curvature scale space (CSS) technique, which is also part of the MPEG-7 standard, is a robust method for describing complex shapes. The central idea is to analyze the curvature of a shape and derive features from inflection points. A major drawback of the CSS method is its poor representation of convex segments: convex objects cannot be represented at all due to missing inflection points. We have extended the CSS approach to generate feature points for both concave and convex segments of a shape. This generic approach is applicable to arbitrary objects. As a comprehensive example, our experimental results evaluate the automatic recognition of characters in images and videos.
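The CSS building block can be sketched as follows: smooth the closed contour with a Gaussian at scale sigma, compute the curvature, and take its zero crossings as inflection points. The function names are ours, not an MPEG-7 API; note how a convex shape yields no zero crossings at all, which is exactly the drawback the authors address.

```python
# Curvature of a smoothed closed contour and its zero crossings (inflections).
import numpy as np
from scipy.ndimage import gaussian_filter1d

def curvature_zero_crossings(x, y, sigma):
    # Smooth the closed contour; wrap mode keeps it periodic.
    xs = gaussian_filter1d(x, sigma, mode="wrap")
    ys = gaussian_filter1d(y, sigma, mode="wrap")
    dx, dy = np.gradient(xs), np.gradient(ys)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    kappa = (dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5
    # Sign changes of curvature = inflection points at this scale.
    return np.where(np.diff(np.sign(kappa)) != 0)[0]

t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
ellipse = (2 * np.cos(t), np.sin(t))
print(len(curvature_zero_crossings(*ellipse, sigma=3)))  # 0: convex, no inflections
```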
Texture-Based Remote-Sensing Image Segmentation
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521710
Dihua Guo, V. Atluri, N. Adam
Typically, high-resolution remote sensing (HRRS) images contain a high level of noise and exhibit textures at different scales. As a result, existing image segmentation approaches are not suitable for HRRS imagery. In this paper, we present an unsupervised texture-based segmentation algorithm suitable for HRRS images, which extends local binary pattern texture features with a lossless wavelet transform. Our experimental results using USGS 1 ft orthoimagery show a significant improvement over the previously proposed LBP approach.
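For reference, a minimal sketch of the basic 3x3 local binary pattern (LBP) operator the paper builds on; the authors' multi-scale extension via the lossless wavelet transform is not reproduced here.

```python
# Basic 3x3 LBP: each pixel gets an 8-bit code, one bit per neighbor,
# set when the neighbor is at least as bright as the center.
import numpy as np

def lbp_3x3(img):
    """Return the 8-bit LBP code map for a 2D grayscale array."""
    img = np.asarray(img, dtype=np.int32)
    center = img[1:-1, 1:-1]
    codes = np.zeros_like(center)
    # Eight neighbors in a fixed clockwise order.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dr, dc) in enumerate(offsets):
        neigh = img[1 + dr:img.shape[0] - 1 + dr, 1 + dc:img.shape[1] - 1 + dc]
        codes |= (neigh >= center).astype(np.int32) << bit
    return codes
```

Texture descriptors are then typically histograms of these codes over local windows, which is what the segmentation compares.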
Combining Caption and Visual Features for Semantic Event Classification of Baseball Video
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521656
W. Lie, Sheng-Hsiung Shia
In a baseball game, an event is defined as the portion of a video clip between two pitches, and a play is defined as a batter finishing his plate appearance. A play is a concatenation of many events, and a baseball game is formed by a series of plays. In this paper, only the event that happens on the last pitch of a plate appearance is detected. It is then semantically classified to represent the corresponding play, using an algorithm that integrates caption rule-inference and visual feature analysis. Our proposed system is capable of classifying each baseball play into eleven semantic categories that are popular and familiar to most audiences. In an experiment with 260 test plays, the classification rate reaches 87%.
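A toy illustration of the caption rule-inference idea, with hypothetical caption fields and only a few of the eleven categories; the paper's actual rules and visual-feature analysis are more involved than this.

```python
# Toy rule inference over caption state captured before/after the last pitch.
# Fields are hypothetical: 'score_batting' (batting team's score), 'outs',
# 'runners' (count of runners on base).

def classify_play(before: dict, after: dict) -> str:
    runs = after["score_batting"] - before["score_batting"]
    if runs > 0 and after["runners"] == 0 and after["outs"] == before["outs"]:
        return "home run"       # bases cleared, no out recorded, runs scored
    if after["outs"] > before["outs"]:
        return "out"
    if after["runners"] > before["runners"]:
        return "hit or walk"    # visual features would disambiguate these
    return "unknown"

print(classify_play({"score_batting": 3, "outs": 1, "runners": 2},
                    {"score_batting": 6, "outs": 1, "runners": 0}))  # home run
```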
H.264/AVC interleaving for 3G wireless video streaming
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521561
T. Schierl, M. Kampmann, T. Wiegand
We present a streaming system that uses interleaved transmission for real-time H.264/AVC video in 3G wireless environments, with benefits shown especially in the presence of link outages. The 3GPP packet-switched streaming service Rel. 6 specifies H.264/AVC and its RTP payload format, which allows interleaved transmission of H.264/AVC NAL units. Our simulations also include audio in the interleaving framework and are conducted within a testbed that emulates a 3G network, including block error rates on the physical layer, a link-layer retransmission buffer for different error rates, and link outages. The experimental results demonstrate the superior performance of interleaving for typical link outage settings.
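The reordering principle behind interleaved transmission can be sketched as follows; the actual interleaved packetization of the H.264/AVC RTP payload format (RFC 3984) carries decoding-order numbers and more machinery, which this sketch omits.

```python
# Stride-interleave access units so that a burst loss (e.g. a link outage)
# hits frames that are far apart in display order, which the decoder can
# conceal more easily than a run of adjacent frames.

def interleave(units, depth):
    return [units[i] for s in range(depth) for i in range(s, len(units), depth)]

frames = list(range(12))            # display order 0..11
sent = interleave(frames, depth=4)
print(sent)                         # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
# Losing 3 consecutive packets now removes frames spaced 4 apart
# instead of 3 adjacent frames.
```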
Gridmedia: A Multi-Sender Based Peer-to-Peer Multicast System for Video Streaming
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521498
Meng Zhang, Yun Tang, Li Zhao, Jian-Guang Luo, Shiqiang Yang
We present a novel single-source peer-to-peer multicast architecture called GridMedia, which mainly consists of (1) a multi-sender based overlay multicast protocol (MSOMP) and (2) a multi-sender based redundancy-retransmitting algorithm (MSRRA). The MSOMP deploys a mesh-based two-layer structure and groups all peers into clusters, with multiple distinct paths from the source root to each peer. To address the problem of long burst packet loss, the MSRRA is used at the sender peers to patch lost packets by predicting the receiver peer's loss pattern. Consequently, GridMedia provides a scalable and reliable video streaming system for a large and highly dynamic population of end hosts, and ensures quality of service in terms of continuous playback, bandwidth demand, and low latency. A real experimental system based on the GridMedia architecture has been deployed over CERNET and has been broadcasting TV programs for seven months. It has attracted more than 140,000 end users, with almost 600 simultaneously online during the Athens Olympic Games in August 2004.
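A hedged sketch of the redundancy-retransmission idea as we read it: on a loss report, a sender assumes a burst and proactively resends the following packets as well. The fixed burst-length guess below is hypothetical; MSRRA predicts it from the receiver peer's observed loss pattern.

```python
# On a loss report for one packet, retransmit a window of subsequent packets
# too, on the assumption that losses arrive in bursts.

PREDICTED_BURST = 4  # hypothetical burst-length estimate

def packets_to_retransmit(reported_lost: int, highest_sent: int) -> list[int]:
    end = min(reported_lost + PREDICTED_BURST, highest_sent + 1)
    return list(range(reported_lost, end))

print(packets_to_retransmit(100, 120))  # [100, 101, 102, 103]
```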
Multiple Objective Frame Rate Up Conversion
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521408
T. Chong, O. Au, Wing-San Chau, Tai-Wai Chan
In this paper, we propose a multiple objective frame rate up conversion algorithm (MOFRUC) that utilizes two different models. The first is a constant-velocity model that assumes an object's position is a linear function of time. The second exploits the spatial correlation between neighboring blocks and assumes pixel intensity is highly correlated within a small local area; in this model, the perceptual quality of the interpolated frame is also taken into account and the blocking artifact is minimized. Our proposed MOFRUC estimates the motion trajectory with the first model and interpolates the frame along that trajectory. At the same time, the algorithm refines the motion trajectory by maximizing a spatial correlation measure defined in the second model and interpolates the frame with minimal blocking artifacts. Simulation results show that our proposed MOFRUC outperforms other existing algorithms and produces high-quality interpolated frames.
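A minimal sketch of interpolation along a motion trajectory under the constant-velocity model: a block that moves by (dx, dy) between the previous and next frames is predicted from both, displaced by half the vector, and the two predictions are averaged. The spatial-correlation refinement and blocking-artifact minimization are omitted, as are bounds and half-pel handling.

```python
# Bidirectional motion-compensated interpolation of the middle frame:
# pixel at p came from p - v/2 in the previous frame and lands at
# p + v/2 in the next frame under constant velocity v = (dx, dy).
import numpy as np

def interpolate_block(prev, nxt, x, y, dx, dy, B=8):
    """Interpolate the BxB block at top-left (x, y) of the middle frame."""
    hx, hy = dx // 2, dy // 2
    fwd = prev[y - hy:y - hy + B, x - hx:x - hx + B]   # from previous frame
    bwd = nxt[y + hy:y + hy + B, x + hx:x + hx + B]    # from next frame
    return ((fwd.astype(np.uint16) + bwd) // 2).astype(prev.dtype)
```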
Relevance Feedback Methods in Content Based Retrieval and Video Summarization
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521602
Micha Haas, Ard A. J. Oerlemans, M. Lew
In the current state of the art in multimedia content analysis (MCA), the fundamental techniques are typically derived from core pattern recognition and computer vision algorithms. It is well known that completely automatic pattern recognition and computer vision approaches have not succeeded in being both robust and domain independent, so we should not expect more from MCA algorithms. The exception would naturally be methods that are human-interactive, i.e., not automatic. In this paper, we describe some of our recent work in multimedia content analysis across multiple domains, where the fundamental technique is founded on interactive search. Our novel algorithm integrates our previous work on wavelet-based salient points and genetic algorithms, and shows that the main contribution and improvement come from the user feedback provided by interactive search.
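The paper's method couples wavelet-based salient points with a genetic algorithm; as a generic stand-in, the sketch below shows the classic Rocchio-style relevance-feedback update, which moves the query feature vector toward images the user marked relevant and away from those marked irrelevant.

```python
# Rocchio-style query update for relevance feedback (a stand-in technique,
# not the authors' genetic-algorithm method).
import numpy as np

def rocchio_update(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    q = alpha * np.asarray(query, dtype=float)
    if len(relevant):
        q += beta * np.mean(np.asarray(relevant, dtype=float), axis=0)
    if len(irrelevant):
        q -= gamma * np.mean(np.asarray(irrelevant, dtype=float), axis=0)
    return q

# Query drifts toward the relevant cluster:
print(rocchio_update([0.0, 0.0], relevant=[[1.0, 1.0], [1.2, 0.8]],
                     irrelevant=[[-1.0, 0.0]]))
```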
Proxy-based reference picture selection for real-time video transmission over mobile networks
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521422
Wei Tu, E. Steinbach
We propose a framework for error-robust real-time video transmission over wireless networks. In our approach, we cope with packet loss on the downlink by retransmitting lost packets from the base station (BS) to the receiver for error recovery. Retransmissions are enabled by using fixed-distance reference picture selection during encoding, with a prediction distance that corresponds to the round-trip time between the BS and the receiver. We deal with transmission errors on the uplink by sending acknowledgements and predicting the next frame from the most recent frame that has been positively acknowledged by the BS. We show that these two separate approaches for uplink and downlink fit together nicely. We compare our approach to state-of-the-art error resilience approaches that employ random intra updates of macroblocks and FEC across packets. At the same bit rate and packet loss rate, we observe improvements of up to 4.5 dB for our scheme.
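The two reference-selection rules can be sketched directly; the distance d and the acknowledgement bookkeeping shown here are simplified assumptions, not the paper's exact protocol.

```python
# Downlink: always predict from a fixed distance d (matched to the
# BS-receiver round-trip time) so the BS can retransmit lost packets in time.
# Uplink: predict from the newest frame the BS has positively acknowledged.

def downlink_reference(current_frame: int, d: int) -> int:
    return current_frame - d

def uplink_reference(acked_frames: set[int], current_frame: int) -> int:
    return max(f for f in acked_frames if f < current_frame)

print(downlink_reference(100, d=3))          # 97
print(uplink_reference({90, 95, 97}, 100))   # 97
```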