Embedded coding of optical flow fields for scalable video compression
Pub Date: 2014-11-20 | DOI: 10.1109/MMSP.2014.6958817
Sean I. Young, R. Mathew, D. Taubman
An embedded coding scheme for dense motion (optical flow) fields is proposed. Such a scheme is particularly useful in scalable video compression, where one must compensate for inter-frame motion at various visual qualities and resolutions; however, the high cost of coding such fields has often made this option prohibitive. Using our previously developed 'breakpoint'-adaptive wavelet transform, we show that it is possible to code dense motion fields efficiently while simultaneously endowing the coded motion representation with embedded resolution and quality scalability attributes. Performance comparisons with the traditional non-scalable block-based model are made using a modified H.264/AVC JM reference encoder.
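As a rough illustration of how a wavelet decomposition yields resolution scalability for motion, the sketch below applies one level of a plain (non-adaptive) 2D Haar transform to a dense motion field; the LL band then serves as a half-resolution field. This is only a minimal stand-in: the paper's breakpoint-adaptive transform additionally adapts its filters around coded motion discontinuities, which this sketch omits.

```python
# Minimal sketch: one analysis level of a plain 2D Haar transform applied to
# each component of a dense motion field (not the paper's adaptive transform).
import numpy as np

def haar2d(band):
    """One orthonormal Haar analysis level: rows, then columns."""
    s = (band[:, 0::2] + band[:, 1::2]) / np.sqrt(2)   # row low-pass
    d = (band[:, 0::2] - band[:, 1::2]) / np.sqrt(2)   # row high-pass
    rows = np.hstack([s, d])
    s = (rows[0::2, :] + rows[1::2, :]) / np.sqrt(2)   # column low-pass
    d = (rows[0::2, :] - rows[1::2, :]) / np.sqrt(2)   # column high-pass
    return np.vstack([s, d])                           # [[LL, HL], [LH, HH]]

# A dense motion field: two components (dx, dy) per pixel (synthetic stand-in).
flow = np.random.randn(64, 64, 2).astype(np.float32)
coeffs = np.stack([haar2d(flow[..., c]) for c in range(2)], axis=-1)
# For orthonormal Haar, LL equals 2x the local mean, so divide by 2 to read it
# as a half-resolution field (vector magnitudes would also be halved at half
# the spatial resolution).
low_res_flow = coeffs[:32, :32, :] / 2.0
```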
{"title":"Embedded coding of optical flow fields for scalable video compression","authors":"Sean I. Young, R. Mathew, D. Taubman","doi":"10.1109/MMSP.2014.6958817","DOIUrl":"https://doi.org/10.1109/MMSP.2014.6958817","url":null,"abstract":"An embedded coding scheme for dense motion (optical flow) fields is proposed. Such a scheme is particularly useful in scalable video compression where one must compensate for inter-frame motion at various visual qualities and resolutions. However, the high cost of coding such fields has often made this option prohibitive. Using our previously developed `breakpoint'-adaptive wavelet transform, we show that it is possible to code dense motion fields efficiently while simultaneously endowing the coded motion representation with embedded resolution and quality scalability attributes. Performance comparisons with the traditional non-scalable block-based model are also made and presented with the aid of a modified H.264/AVC JM reference encoder.","PeriodicalId":164858,"journal":{"name":"2014 IEEE 16th International Workshop on Multimedia Signal Processing (MMSP)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115076356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A new error-mapping scheme for scalable audio coding
Pub Date: 2014-11-20 | DOI: 10.1109/MMSP.2014.6958815
Haibin Huang, S. Rahardja
In scalable audio coders such as MPEG-4 SLS, error mapping converts the quantization errors of the core coder into an error signal before bit-plane coding. In this paper, we propose a new error-mapping scheme derived from the statistical properties of the error signal. Compared with the error mapping in SLS, the proposed scheme improves coding efficiency and reduces the computational complexity of the coder. In subjective listening tests, the proposed scheme achieved an average improvement of 9 points in MUSHRA score. The proposed error mapping adds a useful new tool to the existing toolset for constructing next-generation scalable audio coders.
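A minimal sketch of the general error-mapping idea follows, not the MPEG-4 SLS mapping or the paper's scheme: the core coder's quantization error is formed as an integer residual whose magnitude bit-planes are peeled off most-significant first, so the enhancement stream can be truncated at any quality.

```python
# Sketch of error mapping + bit-plane extraction in a two-layer coder.
# The quantizer and signal here are placeholders.
import numpy as np

x = np.random.randint(-1000, 1000, size=16)       # "original" integer spectrum
step = 64
core = np.round(x / step).astype(int) * step      # core-coder reconstruction
residual = x - core                               # mapped error signal

planes = []
mag = np.abs(residual)
for b in range(int(mag.max()).bit_length() - 1, -1, -1):
    planes.append((mag >> b) & 1)                 # one magnitude bit-plane, MSB first
# Sign bits would be coded once per coefficient, when it first becomes significant.
```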
{"title":"A new error-mapping scheme for scalable audio coding","authors":"Haibin Huang, S. Rahardja","doi":"10.1109/MMSP.2014.6958815","DOIUrl":"https://doi.org/10.1109/MMSP.2014.6958815","url":null,"abstract":"In scalable audio coders, such as the MPEG-4 SLS, error-mapping is used to map quantization errors in the core coder to an error signal before passing through bit-plane coding. In this paper, we propose a new error-mapping scheme that is derived by observing statistical properties of the error signal. Compared with the error-mapping in SLS, the proposed scheme improves coding efficiency as well as computational complexity of the coder. An average improvement of 9 points in MUSHRA score has been achieved by the proposed scheme in subjective listening tests. The proposed error-mapping adds a useful new tool to the existing toolset for constructing next-generation scalable audio coders.","PeriodicalId":164858,"journal":{"name":"2014 IEEE 16th International Workshop on Multimedia Signal Processing (MMSP)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134009572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-view action recognition by cross-domain learning
Pub Date: 2014-11-20 | DOI: 10.1109/MMSP.2014.6958811
Weizhi Nie, Anan Liu, Jing Yu, Yuting Su, L. Chaisorn, Yongkang Wang, M. Kankanhalli
This paper proposes a novel multi-view human action recognition method that discovers and shares common knowledge among video sets captured from multiple viewpoints. To our knowledge, we are the first to treat a specific view as the target domain and the others as source domains, thereby formulating multi-view action recognition as a cross-domain learning problem. First, the classic bag-of-visual-words framework is used for visual feature extraction in each viewpoint. Then, we propose a cross-domain learning method with a block-wise weighted kernel function matrix to highlight salient components and thereby augment the discriminative ability of the model. Extensive experiments are conducted on IXMAS, the popular multi-view action dataset. The experimental results demonstrate that the proposed method consistently outperforms the state of the art.
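The bag-of-visual-words step can be sketched as below; the descriptors and codebook size are placeholders, and the paper's block-wise weighted kernel for cross-domain learning is not shown.

```python
# Minimal bag-of-visual-words sketch: cluster local descriptors into a codebook,
# then represent each clip as a normalized histogram of visual words.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(5000, 64))         # stand-in for local descriptors
codebook = KMeans(n_clusters=128, n_init=4, random_state=0).fit(descriptors)

def bovw_histogram(clip_descriptors, codebook):
    """Quantize a clip's descriptors against the codebook and L1-normalize."""
    words = codebook.predict(clip_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

clip_feature = bovw_histogram(rng.normal(size=(300, 64)), codebook)
```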
{"title":"Multi-view action recognition by cross-domain learning","authors":"Weizhi Nie, Anan Liu, Jing Yu, Yuting Su, L. Chaisorn, Yongkang Wang, M. Kankanhalli","doi":"10.1109/MMSP.2014.6958811","DOIUrl":"https://doi.org/10.1109/MMSP.2014.6958811","url":null,"abstract":"This paper proposes a novel multi-view human action recognition method by discovering and sharing common knowledge among different video sets captured in multiple viewpoints. To our knowledge, we are the first to treat a specific view as target domain and the others as source domains and consequently formulate the multi-view action recognition into the cross-domain learning framework. First, the classic bag-of-visual word framework is implemented for visual feature extraction in individual viewpoints. Then, we propose a cross-domain learning method with block-wise weighted kernel function matrix to highlight the saliency components and consequently augment the discriminative ability of the model. Extensive experiments are implemented on IXMAS, the popular multi-view action dataset. The experimental results demonstrate that the proposed method can consistently outperform the state of the arts.","PeriodicalId":164858,"journal":{"name":"2014 IEEE 16th International Workshop on Multimedia Signal Processing (MMSP)","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132200140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bidirectional hierarchical anchoring of motion fields for scalable video coding
Pub Date: 2014-11-20 | DOI: 10.1109/MMSP.2014.6958816
Dominic Rüfenacht, R. Mathew, D. Taubman
The ability to predict motion fields at finer temporal scales from coarser ones is a highly desirable property for temporal scalability. This is very difficult in current state-of-the-art video codecs (e.g., H.264, HEVC), where motion fields are anchored at the frame to be predicted (the target frame). In this paper, we propose to anchor motion fields at the reference frames instead. We show how, from a single fully coded motion field at the coarsest temporal level together with breakpoints that signal discontinuities in the motion field, we can reliably predict the motion fields used at finer temporal levels. This significantly reduces the cost of coding the motion fields. Results on synthetic data show improved rate-distortion (R-D) performance and superior scalability compared to the traditional anchoring of motion fields.
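A minimal sketch of the underlying prediction idea, assuming constant-velocity motion: a motion field anchored at a reference frame and spanning a coarse temporal interval is scaled to predict the field for the midpoint frame, so only the prediction residual needs coding. The paper's breakpoint machinery for preserving motion discontinuities is omitted.

```python
# Sketch: predict a finer-temporal-level motion field from a coarser one.
import numpy as np

def predict_midpoint_field(flow_a_to_b):
    """Scale a reference-anchored field to predict motion toward the midpoint frame."""
    return 0.5 * flow_a_to_b  # constant-velocity (linear motion) assumption

# Coarsest-level field from reference frame a to frame b (synthetic stand-in).
flow_a_to_b = np.random.randn(64, 64, 2).astype(np.float32)
flow_a_to_mid = predict_midpoint_field(flow_a_to_b)
# An encoder would code only the (typically small) residual between this
# prediction and the true finer-level field.
```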
{"title":"Bidirectional hierarchical anchoring of motion fields for scalable video coding","authors":"Dominic Rüfenacht, R. Mathew, D. Taubman","doi":"10.1109/MMSP.2014.6958816","DOIUrl":"https://doi.org/10.1109/MMSP.2014.6958816","url":null,"abstract":"The ability to predict motion fields at finer temporal scales from coarser ones is a very desirable property for temporal scalability. This is at best very difficult in current state-of-the-art video codecs (i.e., H.264, HEVC), where motion fields are anchored in the frame that is to be predicted (target frame). In this paper, we propose to anchor motion fields in the reference frames. We show how from only one fully coded motion field at the coarsest temporal level as well as breakpoints which signal discontinuities in the motion field, we are able to reliably predict motion fields used at finer temporal levels. This significantly reduces the cost for coding the motion fields. Results on synthetic data show improved rate-distortion (R-D) performance and superior scalability, when compared to the traditional way of anchoring motion fields.","PeriodicalId":164858,"journal":{"name":"2014 IEEE 16th International Workshop on Multimedia Signal Processing (MMSP)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132489848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel video coding scheme using a scene adaptive non-parametric background model
Pub Date: 2014-11-14 | DOI: 10.1109/MMSP.2014.6958823
Subrata Chakraborty, M. Paul, M. Murshed, Mortuza Ali
Video coding techniques that use background frames provide better rate-distortion performance than the latest video coding standard by exploiting coding efficiency in uncovered background areas. Parametric approaches, such as mixture-of-Gaussians (MoG) background modeling, have been widely used; however, they require prior knowledge about the test videos for parameter estimation. Recently introduced non-parametric (NP) background modeling techniques have improved video coding performance through an HEVC-integrated coding scheme. By its nature, the NP technique outperforms the MoG-based technique in dynamic background scenarios without a priori knowledge of the video data distribution. Although NP-based coding schemes have shown promising coding performance, they face a number of key challenges: (a) determining the optimal subset of training frames for generating a suitable background that can be used as a reference frame during coding; (b) incorporating dynamic changes in the background effectively after the initial background frame is generated; (c) managing frequent scene changes that lead to performance degradation; and (d) optimizing the coding-quality ratio between an I-frame and other frames under bit-rate constraints. In this study, we develop a new scene-adaptive coding scheme based on the NP technique that addresses these challenges by incorporating a continuously updating background generation process. Extensive experimental results validate the effectiveness of the new scheme.
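A minimal sketch of one simple non-parametric, continuously updated background model, a per-pixel median over a sliding window of recent frames, is given below; the paper's model, its training-frame selection, and its HEVC integration are considerably more elaborate.

```python
# Sketch: continuously updated non-parametric background via sliding-window median.
import numpy as np
from collections import deque

class MedianBackground:
    def __init__(self, window=30):
        self.frames = deque(maxlen=window)   # recent frames; oldest dropped first

    def update(self, frame):
        self.frames.append(frame.astype(np.float32))
        # The per-pixel median over the window is robust to transient foreground.
        return np.median(np.stack(self.frames), axis=0)

model = MedianBackground(window=30)
for _ in range(40):                          # feed a synthetic video stream
    background = model.update(np.random.randint(0, 256, (72, 88), dtype=np.uint8))
# "background" could then serve as an extra reference frame for coding.
```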
{"title":"A novel video coding scheme using a scene adaptive non-parametric background model","authors":"Subrata Chakraborty, M. Paul, M. Murshed, Mortuza Ali","doi":"10.1109/MMSP.2014.6958823","DOIUrl":"https://doi.org/10.1109/MMSP.2014.6958823","url":null,"abstract":"Video coding techniques utilising background frames, provide better rate distortion performance by exploiting coding efficiency in uncovered background areas compared to the latest video coding standard. Parametric approaches such as the mixture of Gaussian (MoG) based background modeling has been widely used however they require prior knowledge about the test videos for parameter estimation. Recently introduced non-parametric (NP) based background modeling techniques successfully improved video coding performance through a HEVC integrated coding scheme. The inherent nature of the NP technique naturally exhibits superior performance in dynamic background scenarios compared to the MoG based technique without a priori knowledge of video data distribution. Although NP based coding schemes showed promising coding performances, they suffer from a number of key challenges - (a) determination of the optimal subset of training frames for generating a suitable background that can be used as a reference frame during coding, (b) incorporating dynamic changes in the background effectively after the initial background frame is generated, (c) managing frequent scene change leading to performance degradation, and (d) optimizing coding quality ratio between an I-frame and other frames under bit rate constraints. In this study we develop a new scene adaptive coding scheme using the NP based technique, capable of solving the current challenges by incorporating a new continuously updating background generation process. Extensive experimental results are also provided to validate the effectiveness of the new scheme.","PeriodicalId":164858,"journal":{"name":"2014 IEEE 16th International Workshop on Multimedia Signal Processing (MMSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127010588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Social image search exploiting joint visual-textual information within a fuzzy hypergraph framework
Pub Date: 2014-09-01 | DOI: 10.1109/MMSP.2014.6958809
Konstantinos Pliakos, Constantine Kotropoulos
The unremitting growth of social media popularity is manifested by the vast volume of images uploaded to the web. Despite extensive research efforts, accurate and efficient image search remains an open problem. The majority of existing image search methods treat the visual content of an image and the semantic information captured by its social tags separately or sequentially. Here, a novel and efficient method is proposed that exploits visual and textual information simultaneously. The joint visual-textual information is captured by a fuzzy hypergraph powered by the term frequency-inverse document frequency (tf-idf) weighting scheme. Experimental results on two datasets substantiate the merits of the proposed method. Indicatively, an average precision of 77% is measured at 1% recall for image-based queries.
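The tf-idf weighting underlying the textual side can be sketched directly; the tag lists below are hypothetical, and the fuzzy hypergraph construction itself is not shown.

```python
# Sketch: tf-idf weights for image tags (each image is a "document" of tags).
import numpy as np

docs = [["beach", "sunset", "sea"], ["sea", "boat"], ["city", "night", "sunset"]]
vocab = sorted({t for d in docs for t in d})
idx = {t: i for i, t in enumerate(vocab)}

tf = np.zeros((len(docs), len(vocab)))
for r, d in enumerate(docs):
    for t in d:
        tf[r, idx[t]] += 1
tf /= tf.sum(axis=1, keepdims=True)           # term frequency per image
df = np.count_nonzero(tf > 0, axis=0)         # document frequency per tag
idf = np.log(len(docs) / df)                  # inverse document frequency
tfidf = tf * idf                              # weight of each tag for each image
```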
{"title":"Social image search exploiting joint visual-textual information within a fuzzy hypergraph framework","authors":"Konstantinos Pliakos, Constantine Kotropoulos","doi":"10.1109/MMSP.2014.6958809","DOIUrl":"https://doi.org/10.1109/MMSP.2014.6958809","url":null,"abstract":"The unremitting growth of social media popularity is manifested by the vast volume of images uploaded to the web. Despite the extensive research efforts, there are still open problems in accurate or efficient image search methods. The majority of existing methods, dedicated to image search, treat the image visual content and the semantic information captured by the social image tags, separately or in a sequential manner. Here, a novel and efficient method is proposed, exploiting visual and textual information simultaneously. The joint visual-textual information is captured by a fuzzy hypergraph powered by the term-frequency and inverse-document-frequency (tf-idf) weighting scheme. Experimental results conducted on two datasets substantiate the merits of the proposed method. Indicatively, an average precision of 77% is measured at 1% recall for image-based queries.","PeriodicalId":164858,"journal":{"name":"2014 IEEE 16th International Workshop on Multimedia Signal Processing (MMSP)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115146030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SVM is not always confident: Telling whether the output from multiclass SVM is true or false by analysing its confidence values
Pub Date: 2014-09-01 | DOI: 10.1109/MMSP.2014.6958800
T. Yamasaki, Takaki Maeda, K. Aizawa
This paper presents an algorithm for judging whether the output label yielded by a multiclass support vector machine (SVM) is true or false without knowing the ground truth. The judgment is made solely by analysing confidence values, based on pre-training/testing with the training data. Such true/false judgment is useful for refining the output labels. We experimentally demonstrate that the difference between the decision values of the top and second candidates is a good measure, and that a proper threshold can be determined by pre-training/testing using only the training data. Experimental results on three standard image datasets demonstrate that the proposed algorithm improves the Matthews correlation coefficient (MCC) far more than simply thresholding the decision value of the top candidate.
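The proposed confidence measure, the gap between the top and second decision values, together with threshold selection by maximizing MCC on the training data, can be sketched as follows; the dataset and split are placeholders rather than those used in the paper.

```python
# Sketch: accept/reject multiclass SVM outputs via the top-2 decision-value gap.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import matthews_corrcoef

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf", gamma="scale", decision_function_shape="ovr").fit(X_tr, y_tr)

def top2_gap(dec):
    """Difference between the largest and second-largest decision values."""
    srt = np.sort(dec, axis=1)
    return srt[:, -1] - srt[:, -2]

# Choose the threshold on training data, where "true" = prediction was correct.
gap_tr = top2_gap(clf.decision_function(X_tr))
correct_tr = clf.predict(X_tr) == y_tr
best_t = max(np.unique(gap_tr),
             key=lambda t: matthews_corrcoef(correct_tr, gap_tr >= t))

# At test time, flag predictions whose gap falls below the chosen threshold.
trusted = top2_gap(clf.decision_function(X_te)) >= best_t
```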
{"title":"SVM is not always confident: Telling whether the output from multiclass SVM is true or false by analysing its confidence values","authors":"T. Yamasaki, Takaki Maeda, K. Aizawa","doi":"10.1109/MMSP.2014.6958800","DOIUrl":"https://doi.org/10.1109/MMSP.2014.6958800","url":null,"abstract":"This paper presents an algorithm to distinguish whether the output label that is yielded from multiclass support vector machine (SVM) is true or false without knowing the answer. Such judgment is done only by the confidence analysis based on the pre-training/testing using the training data. Such true/false judgment is useful for refining the output labels. We experimentally demonstrate that the decision value difference between the top candidate and the second candidate is a good measure. In addition, a proper threshold can be determined by the pre-training/testing using only the training data. Experimental results using three standard image datasets demonstrate that our proposed algorithm can improve Matthews correlation coefficient (MCC) much better than simply thresholding the decision value for the top candidate.","PeriodicalId":164858,"journal":{"name":"2014 IEEE 16th International Workshop on Multimedia Signal Processing (MMSP)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123509610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Highly optimized implementation of HEVC decoder for general processors
Pub Date: 2014-09-01 | DOI: 10.1109/MMSP.2014.6958819
Shengbin Meng, Y. Duan, Jun Sun, Zongming Guo
In this paper, we propose a novel design and optimized implementation of an HEVC decoder. First, a decoder prototype with a refined decoding workflow and efficient memory management is designed. On this basis, a series of single-instruction-multiple-data (SIMD) algorithms is used to speed up several time-consuming modules of HEVC decoding. Finally, a frame-based parallel framework is applied to exploit multi-threading on multicore processors. With the highly optimized decoder, our experiments achieve decoding speeds of 246 fps for 1080p videos on an Intel i7-2400 3.4 GHz quad-core processor and 52 fps for 720p videos on an ARM Cortex-A9 1.2 GHz dual-core processor.
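The SIMD kernels are intrinsics-level C/C++ and have no direct Python analogue, but the frame-based parallel framework can be loosely sketched as below with a stand-in decode function; a real decoder would additionally gate dispatch on inter-frame reference dependencies.

```python
# Sketch of frame-based parallelism: decode independent frames on a thread pool.
from concurrent.futures import ThreadPoolExecutor

def decode_frame(bitstream_chunk):
    # Stand-in for entropy decoding + reconstruction of one frame.
    return sum(bitstream_chunk) % 256

chunks = [[i, i + 1, i + 2] for i in range(0, 30, 3)]   # hypothetical per-frame data
with ThreadPoolExecutor(max_workers=4) as pool:
    frames = list(pool.map(decode_frame, chunks))       # results kept in frame order
```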
{"title":"Highly optimized implementation of HEVC decoder for general processors","authors":"Shengbin Meng, Y. Duan, Jun Sun, Zongming Guo","doi":"10.1109/MMSP.2014.6958819","DOIUrl":"https://doi.org/10.1109/MMSP.2014.6958819","url":null,"abstract":"In this paper, we propose a novel design and optimized implementation of the HEVC decoder. First, a novel decoder prototype with refined decoding workflow and efficient memory management is designed. Then on this basis, a series of single-instruction-multiple-data (SIMD) based algorithms are used to speed up several time-consuming modules in HEVC decoding. Finally, a frame-based parallel framework is applied to exploit the multi-threading technology on multicore processors. With the highly optimized HEVC decoder, decoding speed of 246fps on Intel i7-2400 3.4GHz quad-core processor for 1080p videos and 52fps on ARM Cortex-A9 1.2GHz dual-core processor for 720p videos can be achieved in our experiments.","PeriodicalId":164858,"journal":{"name":"2014 IEEE 16th International Workshop on Multimedia Signal Processing (MMSP)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125269595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Free-viewpoint video sequences: A new challenge for objective quality metrics
Pub Date: 2014-09-01 | DOI: 10.1109/MMSP.2014.6958832
Philippe Hanhart, Emilie Bosc, P. Callet, T. Ebrahimi
Free-viewpoint television is expected to create a more natural and interactive viewing experience by allowing the viewer to change the viewpoint of a 3D scene at will. To render new virtual viewpoints, free-viewpoint systems rely on view synthesis. However, most objective metrics are known to fail at predicting the perceived quality of synthesized views, so it is legitimate to question the reliability of commonly used objective metrics for assessing the quality of free-viewpoint video (FVV) sequences. In this paper, we analyze the performance of several commonly used objective quality metrics on FVV sequences synthesized from decompressed depth data, using subjective scores as ground truth. Statistical analyses show that commonly used metrics are not reliable predictors of perceived image quality when different contents and distortions are considered. However, the correlation improves when individual conditions are considered, which indicates that the artifacts produced by some view synthesis algorithms may not be correctly handled by current metrics.
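The evaluation methodology, computing an objective score per sequence and correlating it with subjective ground truth, can be sketched as follows; PSNR stands in for the metrics studied, and the images and scores below are synthetic placeholders.

```python
# Sketch: correlate an objective metric (PSNR) with subjective MOS scores.
import numpy as np
from scipy.stats import pearsonr, spearmanr

def psnr(ref, test, peak=255.0):
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
refs = [rng.integers(0, 256, (72, 88)) for _ in range(8)]
tests = [np.clip(r + rng.normal(0, s, r.shape), 0, 255)
         for s, r in zip(range(2, 18, 2), refs)]        # increasing distortion
objective = np.array([psnr(r, t) for r, t in zip(refs, tests)])
mos = rng.uniform(1, 5, size=8)                         # stand-in subjective scores

plcc, _ = pearsonr(objective, mos)                      # linear correlation
srocc, _ = spearmanr(objective, mos)                    # rank-order correlation
```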
{"title":"Free-viewpoint video sequences: A new challenge for objective quality metrics","authors":"Philippe Hanhart, Emilie Bosc, P. Callet, T. Ebrahimi","doi":"10.1109/MMSP.2014.6958832","DOIUrl":"https://doi.org/10.1109/MMSP.2014.6958832","url":null,"abstract":"Free-viewpoint television is expected to create a more natural and interactive viewing experience by providing the ability to interactively change the viewpoint to enjoy a 3D scene. To render new virtual viewpoints, free-viewpoint systems rely on view synthesis. However, it is known that most objective metrics fail at predicting perceived quality of synthesized views. Therefore, it is legitimate to question the reliability of commonly used objective metrics to assess the quality of free-viewpoint video (FVV) sequences. In this paper, we analyze the performance of several commonly used objective quality metrics on FVV sequences, which were synthesized from decompressed depth data, using subjective scores as ground truth. Statistical analyses showed that commonly used metrics were not reliable predictors of perceived image quality when different contents and distortions were considered. However, the correlation improved when considering individual conditions, which indicates that the artifacts produced by some view synthesis algorithms might not be correctly handled by current metrics.","PeriodicalId":164858,"journal":{"name":"2014 IEEE 16th International Workshop on Multimedia Signal Processing (MMSP)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134429591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vision-based tracking in large image database for real-time mobile augmented reality
Pub Date: 2014-09-01 | DOI: 10.1109/MMSP.2014.6958790
Madjid Maidi, M. Preda, Yassine Lehiani, T. Lavric
This paper presents an approach for tracking natural objects in augmented reality applications. The targets are detected and identified with a markerless approach relying on the extraction of salient image features and descriptors. The method handles large image databases using a novel strategy for feature retrieval and pairwise matching. Furthermore, it integrates a real-time solution for 3D pose estimation using an analytical technique based on camera perspective transformations. The algorithm associates 2D feature samples from the identification stage with 3D mapped points of the object space; a sampling scheme then orders the correspondences to establish the 2D/3D projective relationship. The tracker performs localization using the feature images and 3D models and computes the camera motion parameters to enhance the scene view with overlaid graphics. The modules of this architecture are deployed on a mobile platform to provide an intuitive interface for interacting with the surrounding real world. The system is evaluated on a challenging large-scale image dataset, and the results demonstrate the effectiveness of the approach for versatile augmented reality applications.
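A minimal sketch of a comparable identification-plus-pose chain follows, using ORB matching and RANSAC-based PnP from OpenCV rather than the paper's analytical pose technique; the file paths, the 2D-to-3D mapping, and the camera matrix are placeholders.

```python
# Sketch: markerless identification (feature matching) then pose from 2D/3D points.
import numpy as np
import cv2

orb = cv2.ORB_create(nfeatures=500)
query = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)    # hypothetical image paths
target = cv2.imread("target.jpg", cv2.IMREAD_GRAYSCALE)
kq, dq = orb.detectAndCompute(query, None)
kt, dt = orb.detectAndCompute(target, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(dq, dt), key=lambda m: m.distance)[:50]

# Assume each target keypoint has a known 3D position on the object (here a
# planar target at z = 0, a simplifying assumption); recover the camera pose.
pts_2d = np.float32([kq[m.queryIdx].pt for m in matches])
pts_3d = np.float32([[*kt[m.trainIdx].pt, 0.0] for m in matches])
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], np.float64)
ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts_3d, pts_2d, K, None)
```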
{"title":"Vision-based tracking in large image database for real-time mobile augmented reality","authors":"Madjid Maidi, M. Preda, Yassine Lehiani, T. Lavric","doi":"10.1109/MMSP.2014.6958790","DOIUrl":"https://doi.org/10.1109/MMSP.2014.6958790","url":null,"abstract":"This paper presents an approach for tracking natural objects in augmented reality applications. The targets are detected and identified using a markerless approach relying upon the extraction of image salient features and descriptors. The method deals with large image databases using a novel strategy for feature retrieval and pairwise matching. Further-more, the developed method integrates a real-time solution for 3D pose estimation using an analytical technique based on camera perspective transformations. The algorithm associates 2D feature samples coming from the identification part with 3D mapped points of the object space. Next, a sampling scheme for ordering correspondences is carried out to establishing the 2D/3D projective relationship. The tracker performs localization using the feature images and 3D models to enhance the scene view with overlaid graphics by computing the camera motion parameters. The modules built within this architecture are deployed on a mobile platform to provide an intuitive interface for interacting with the surrounding real world. The system is experimented and evaluated on challenging scalable image dataset and the obtained results demonstrate the effectiveness of the approach towards versatile augmented reality applications.","PeriodicalId":164858,"journal":{"name":"2014 IEEE 16th International Workshop on Multimedia Signal Processing (MMSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131297251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}