Spherical Harmonics Descriptor for 2D-Image Retrieval
Atul Sajjanhar, Guojun Lu, Dengsheng Zhang
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521371
In this paper, spherical harmonics are proposed as shape descriptors for 2D images. We introduce the concept of connectivity; 2D images are decomposed using connectivity, and a 3D model is then constructed. Spherical harmonics are obtained for the 3D models and used as descriptors for the underlying 2D shapes. The difference between two images is computed as the Euclidean distance between their spherical-harmonics descriptors. Experiments are performed to test the effectiveness of spherical harmonics for the retrieval of 2D images. Item S8 of the MPEG-7 still-images content set, a dataset of 3621 still images, is used for the experiments. Experimental results show that the proposed descriptors for 2D images are effective.
{"title":"Spherical Harmonics Descriptor for 2D-Image Retrieval","authors":"Atul Sajjanhar, Guojun Lu, Dengsheng Zhang","doi":"10.1109/ICME.2005.1521371","DOIUrl":"https://doi.org/10.1109/ICME.2005.1521371","url":null,"abstract":"In this paper, spherical harmonics are proposed as shape descriptors for 2D images. We introduce the concept of connectivity; 2D images are decomposed using connectivity, which is followed by 3D model construction. Spherical harmonics are obtained for 3D models and used as descriptors for the underlying 2D shapes. Difference between two images is computed as the Euclidean distance between their spherical harmonics descriptors. Experiments are performed to test the effectiveness of spherical harmonics for retrieval of 2D images. Item S8 within the MPEG-7 still images content set is used for performing experiments; this dataset consists of 3621 still images. Experimental results show that the proposed descriptors for 2D images are effective","PeriodicalId":244360,"journal":{"name":"2005 IEEE International Conference on Multimedia and Expo","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123343917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Content-Free Image Retrieval Based on Relations Exploited from User Feedbacks
Shingo Uchihashi, T. Kanade
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521682
We propose a new "content-free" image retrieval method that attempts to exploit common tendencies in people's interpretation of images, extracted from user feedback. The system simply accumulates records of user feedback and recycles them in the form of collaborative filtering. We discuss various issues in image retrieval, argue for the content-free idea, and present experimental results. The results indicate that the performance of content-free image retrieval improves with the number of accumulated feedback records, outperforming a basic but typical conventional content-based image retrieval system.
{"title":"Content-Free Image Retrieval Based on Relations Exploited from User Feedbacks","authors":"Shingo Uchihashi, T. Kanade","doi":"10.1109/ICME.2005.1521682","DOIUrl":"https://doi.org/10.1109/ICME.2005.1521682","url":null,"abstract":"We propose a new \"content-free\" image retrieval method which attempts to exploit certain common tendencies that exist among people's interpretation of images from user feedbacks. The system simply accumulates records of user feedback and recycles them in the form of collaborative filtering. We discuss various issues of image retrieval, argue for the idea of content-free, and present results of experiment. The results indicate that the performance of content-free image retrieval improves with the number of accumulated feedbacks, outperforming a basic but typical conventional content-based image retrieval system","PeriodicalId":244360,"journal":{"name":"2005 IEEE International Conference on Multimedia and Expo","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116566214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhance speaker segmentation by elaborating utterance detection
Min Yang, Zhaohui Wu, Yingchun Yang
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521419
In this paper, we introduce an elaborate utterance detection algorithm to enhance speaker segmentation. A silence detector, a further divider, and an audio-type classifier are employed in this utterance detection stage, making the algorithm adaptive to both silent and noisy environments. Open-set verification tests were carried out on the Hub4-NE broadcast database. The experimental results show that this enhanced segmentation method can provide better information for speaker models.
{"title":"Enhance speaker segmentation by elaborating utterance detection","authors":"Min Yang, Zhaohui Wu, Yingchun Yang","doi":"10.1109/ICME.2005.1521419","DOIUrl":"https://doi.org/10.1109/ICME.2005.1521419","url":null,"abstract":"In this paper, we introduce an elaborate utterance detection algorithm to enhance speaker segmentation. Silence detector, further divider and audio type classifier are employed in this elaborate utterance detection, to make this algorithm adaptive for both silent and noisy environments. Open-set verification testing has taken on the Hub4-NE broadcasts database. The experiment results show that this enhanced segmentation method can provide better information for speaker models.","PeriodicalId":244360,"journal":{"name":"2005 IEEE International Conference on Multimedia and Expo","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126361708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data prefetching for smooth navigation of large scale JPEG 2000 images
A. Descampe, J. Ou, P. Chevalier, B. Macq
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521571
Remote access to large-scale images is attracting growing interest in fields such as medical imaging and remote sensing. This raises the need for algorithms that guarantee smooth navigation while minimizing the network resources used. In this paper, we present a model that takes advantage of JPEG 2000 scalability combined with a prefetching policy. The model uses the last user action to manage the cache efficiently and to prefetch the data most likely to be used next. Three different network configurations are considered. In each case, comparison with two more classical policies shows the improvement brought by our approach.
{"title":"Data prefetching for smooth navigation of large scale JPEG 2000 images","authors":"A. Descampe, J. Ou, P. Chevalier, B. Macq","doi":"10.1109/ICME.2005.1521571","DOIUrl":"https://doi.org/10.1109/ICME.2005.1521571","url":null,"abstract":"Remote access to large scale images arouses a growing interest in fields such as medical imagery or remote sensing. This raises the need for algorithms guaranteeing navigation smoothness while minimizing the network resources used. In this paper, we present a model taking advantage of the JPEG 2000 scalability combined with a prefetching policy. The model uses the last user action to efficiently manage the cache and to prefetch the most probable data to be used next. Three different network configurations are considered. In each case, comparison with two more classic policies shows the improvement brought by our approach.","PeriodicalId":244360,"journal":{"name":"2005 IEEE International Conference on Multimedia and Expo","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127974115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hash-based Counter Scheme for Digital Rights Management
Mikko Löytynoja, T. Seppänen
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521375
This paper describes a counter scheme that uses hash functions to count how many times the user is allowed to play protected content in a DRM-enabled player. The proposed basic scheme can be used in scenarios where the user cannot be assumed to have an online connection. We discuss the weaknesses of the basic scheme and present an alternative that increases the security of the counter.
{"title":"Hash-based Counter Scheme for Digital Rights Management","authors":"Mikko Löytynoja, T. Seppänen","doi":"10.1109/ICME.2005.1521375","DOIUrl":"https://doi.org/10.1109/ICME.2005.1521375","url":null,"abstract":"This paper describes a counter scheme that uses hash functions to count how many times the user is allowed to play protected content in a DRM-enabled player. The proposed basic scheme can be used in scenarios where the user cannot be assumed to have online connection. We discuss the weaknesses of the proposed scheme and present alternative to the basic scheme, which increases the security of the counter","PeriodicalId":244360,"journal":{"name":"2005 IEEE International Conference on Multimedia and Expo","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121464621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discriminative techniques for keyframe selection
Matthew L. Cooper, J. Foote
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521470
A convenient representation of a video segment is a single "keyframe". Keyframes are widely used in applications such as non-linear browsing and video editing. With existing methods of keyframe selection, similar video segments result in very similar keyframes, with the drawback that actual differences between the segments may be obscured. We present methods for keyframe selection based on two criteria: capturing the similarity to the represented segment, and preserving the differences from other segments' keyframes, so that different segments have visually distinct representations. We present two discriminative keyframe selection methods and an example of experimental results.
{"title":"Discriminative techniques for keyframe selection","authors":"Matthew L. Cooper, J. Foote","doi":"10.1109/ICME.2005.1521470","DOIUrl":"https://doi.org/10.1109/ICME.2005.1521470","url":null,"abstract":"A convenient representation of a video segment is a single \"keyframe\". Keyframes are widely used in applications such as non-linear browsing and video editing. With existing methods of keyframe selection, similar video segments result in very similar keyframes, with the drawback that actual differences between the segments may be obscured. We present methods for keyframe selection based on two criteria: capturing the similarity to the represented segment, and preserving the differences from other segment keyframes, so that different segments will have visually distinct representations. We present two discriminative keyframe selection methods, and an example of experimental results.","PeriodicalId":244360,"journal":{"name":"2005 IEEE International Conference on Multimedia and Expo","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121780654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maximizing the profit for cache replacement in a transcoding proxy
Hao-Ping Hung, Ming-Syan Chen
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521633
Recent technological advances in multimedia communication have ushered in a new era of personal communication. Users can ubiquitously access the Internet via various mobile devices. For mobile devices with lower-bandwidth network connectivity, transcoding can be used to reduce the object size by lowering the quality of a multimedia object. In this paper, we focus on the cache replacement policy in a transcoding proxy, a proxy server responsible for transcoding objects and reducing network traffic. Based on the architecture of prior work, we propose a maximum-profit replacement algorithm, abbreviated as MPR. MPR performs cache replacement according to the contents of the caching candidate set, which is generated using dynamic programming. Experimental results show that the proposed MPR outperforms the prior scheme in terms of cache hit ratio.
{"title":"Maximizing the profit for cache replacement in a transcoding proxy","authors":"Hao-Ping Hung, Ming-Syan Chen","doi":"10.1109/ICME.2005.1521633","DOIUrl":"https://doi.org/10.1109/ICME.2005.1521633","url":null,"abstract":"Recent technology advances in multimedia communication have ushered in a new era of personal communication. Users can ubiquitously access the Internet via various mobile devices. For the mobile devices featured with lower-bandwidth network connectivity, transcoding can be used to reduce the object size by lowering the quality of a multimedia object. In this paper, we focus on the cache replacement policy in a transcoding proxy, which is a proxy server responsible for transcoding the object and reducing the network traffic. Based on the architecture in prior works, we propose a maximum profit replacement algorithm, abbreviated as MPR. MPR performs cache replacement according to the content in the caching candidate set, which is generated by the concept of dynamic programming. Experimental results show that the the proposed MPR outperforms the prior scheme in terms of the cache hit ratio.","PeriodicalId":244360,"journal":{"name":"2005 IEEE International Conference on Multimedia and Expo","volume":"30 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113937785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image Authentication Under Geometric Attacks Via Structure Matching
V. Monga, Divyanshu Vats, B. Evans
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521402
Surviving geometric attacks in image authentication is considered to be of great importance, because classical watermarking and digital-signature-based schemes are vulnerable to geometric image manipulations, particularly local geometric attacks. In this paper, we present a general framework for image content authentication using salient feature points. We first develop an iterative feature detector based on an explicit model of the human visual system. Then, we compare features from two images using a generalized Hausdorff distance measure. The use of such a distance measure is crucial to the robustness of the scheme: it accounts for feature-detector failure and occlusion, which previously proposed methods do not address. The proposed algorithm withstands standard benchmark (e.g., Stirmark) attacks, including compression, common signal processing operations, global as well as local geometric transformations, and even hard-to-model distortions such as print-and-scan. Content-changing (malicious) manipulations of image data are also accurately detected.
{"title":"Image Authentication Under Geometric Attacks Via Structure Matching","authors":"V. Monga, Divyanshu Vats, B. Evans","doi":"10.1109/ICME.2005.1521402","DOIUrl":"https://doi.org/10.1109/ICME.2005.1521402","url":null,"abstract":"Surviving geometric attacks in image authentication is considered to be of great importance. This is because of the vulnerability of classical watermarking and digital signature based schemes to geometric image manipulations, particularly local geometric attacks. In this paper, we present a general framework for image content authentication using salient feature points. We first develop an iterative feature detector based on an explicit modeling of the human visual system. Then, we compare features from two images by developing a generalized Hausdorff distance measure. The use of such a distance measure is crucial to the robustness of the scheme, and accounts for feature detector failure or occlusion, which previously proposed methods do not address. The proposed algorithm withstands standard benchmark (e.g. Stirmark) attacks including compression, common signal processing operations, global as well as local geometric transformations, and even hard to model distortions such as print and scan. Content changing (malicious) manipulations of image data are also accurately detected","PeriodicalId":244360,"journal":{"name":"2005 IEEE International Conference on Multimedia and Expo","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130274127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hidden Markov Model Based Weighted Likelihood Discriminant for Minimum Error Shape Classification
N. Thakoor, Sungyong Jung, Jean X. Gao
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521430
The goal of this communication is to present a weighted likelihood discriminant for minimum-error shape classification. Unlike traditional maximum-likelihood (ML) methods, in which classification is carried out based on probabilities from independent individual class models, as is the case for general hidden Markov model (HMM) methods, our proposed method utilizes information from all classes to minimize classification error. The proposed approach uses a hidden Markov model as a curvature-feature-based 2D shape descriptor. In this contribution, we present a generalized probabilistic descent (GPD) method for weighting the curvature likelihoods to obtain a discriminant function with minimum classification error. In contrast with other approaches, a weighted likelihood discriminant function is introduced. We believe this theoretically grounded combination of hidden Markov models and generalized probabilistic descent reduces classification error. We show comparative results, in terms of classification accuracy, obtained with our approach and with classical maximum-likelihood classification on fighter-plane shapes.
{"title":"Hidden Markov Model Based Weighted Likelihood Discriminant for Minimum Error Shape Classification","authors":"N. Thakoor, Sungyong Jung, Jean X. Gao","doi":"10.1109/ICME.2005.1521430","DOIUrl":"https://doi.org/10.1109/ICME.2005.1521430","url":null,"abstract":"The goal of this communication is to present a weighted likelihood discriminant for minimum error shape classification. Different from traditional maximum likelihood (ML) methods in which classification is carried out based on probabilities from independent individual class models as is the case for general hidden Markov model (HMM) methods, our proposed method utilizes information from all classes to minimize classification error. Proposed approach uses a hidden Markov model as a curvature feature based 2D shape descriptor. In this contribution we present a generalized probabilistic descent (GPD) method to weight the curvature likelihoods to achieve a discriminant function with minimum classification error. In contrast with other approaches, a weighted likelihood discriminant function is introduced. We believe that our sound theory based implementation reduces classification error by combining hidden Markov model with generalized probabilistic descent theory. We show comparative results obtained with our approach and classic maximum-likelihood calculation for fighter planes in terms of classification accuracies","PeriodicalId":244360,"journal":{"name":"2005 IEEE International Conference on Multimedia and Expo","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131054580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fast camera calibration for the analysis of sport sequences
D. Farin, J. Han, P. D. With
Pub Date: 2005-07-06 | DOI: 10.1109/ICME.2005.1521465
Semantic analysis of sport sequences requires camera calibration to obtain player and ball positions in real-world coordinates. For court sports like tennis, the marker lines on the field can be used to determine the calibration parameters. We propose a real-time calibration algorithm that can be applied to all court sports simply by exchanging the court model. The algorithm is based on (1) a specialized court-line detector, (2) RANSAC-based line parameter estimation, (3) a combinatorial optimization step that localizes the court within the set of detected line segments, and (4) an iterative court-model tracking step. Our results show real-time calibration of, e.g., tennis and soccer sequences, with a computation time of only about 6 ms per frame.
{"title":"Fast camera calibration for the analysis of sport sequences","authors":"D. Farin, J. Han, P. D. With","doi":"10.1109/ICME.2005.1521465","DOIUrl":"https://doi.org/10.1109/ICME.2005.1521465","url":null,"abstract":"Semantic analysis of sport sequences requires camera calibration to obtain player and ball positions in real-world coordinates. For court sports like tennis, the marker lines on the field can be used to determine the calibration parameters. We propose a real-time calibration algorithm that can be applied to all court sports simply by exchanging the court model. The algorithm is based on (1) a specialized court-line detector, (2) a RANSAC-based line parameter estimation, (3) a combinatorial optimization step to localize the court within the set of detected line segments, and (4) an iterative court-model tracking step. Our results show real-time calibration of, e.g., tennis and soccer sequences with a computation time of only about 6 ms per frame.","PeriodicalId":244360,"journal":{"name":"2005 IEEE International Conference on Multimedia and Expo","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130744082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}