Pub Date : 2017-03-05DOI: 10.1109/ICASSP.2017.7952668
Walid Masoudimansour, N. Bouguila
In this paper, a novel and effective method to reduce the dimensionality of labeled proportional data is introduced. Most well-known linear dimensionality reduction methods rely on solving the generalized eigenvalue problem, which fails in certain cases such as sparse data. The proposed algorithm is a linear method that takes a novel approach to dimensionality reduction, avoiding this problem while yielding higher classification rates. Data is assumed to come from two different classes, each of which is matched to a mixture of generalized Dirichlet distributions after projection. The Jeffrey divergence is then used as a dissimilarity measure between the projected classes to increase the inter-class variance. To find the optimal projection, i.e., the one that yields the largest mutual information, a genetic algorithm is used. The method is designed primarily as a preprocessing step for binary classification; however, it handles multi-modal data effectively thanks to the use of mixture models and can therefore be applied to multi-class problems as well.
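The Jeffrey divergence used here as the dissimilarity measure is the symmetrised Kullback-Leibler divergence. A minimal sketch on discrete distributions (the paper applies it to mixtures of generalized Dirichlet distributions, which this toy example does not model):

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence between two discrete distributions
    with strictly positive probabilities on a shared support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

def jeffrey_divergence(p, q):
    """Symmetrised KL: J(p, q) = KL(p||q) + KL(q||p)."""
    return kl(p, q) + kl(q, p)

# Two toy class-conditional distributions over the same support.
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])
j = jeffrey_divergence(p, q)
assert j > 0.0 and jeffrey_divergence(p, p) == 0.0
```

Unlike plain KL, the Jeffrey divergence is symmetric in its arguments, which makes it better behaved as an inter-class separation objective.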
Title: Generalized Dirichlet mixture matching projection for supervised linear dimensionality reduction of proportional data
Published in: 2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP)
Pub Date : 2016-09-23DOI: 10.1109/MMSP.2016.7813395
Leonardo Favario, M. Siekkinen, E. Masala
Live video streaming from mobile devices is quickly becoming popular through services such as Periscope, Meerkat, and Facebook Live. Little is known, however, about how such services tackle the challenges of the live mobile streaming scenario. This work addresses this gap by investigating the characteristics of the Periscope service in detail. A large number of publicly available streams have been captured and analyzed in depth, in particular studying the characteristics of the encoded streams and how the communication evolves over time. This investigation provides insight into key performance parameters such as bandwidth, latency, buffer levels, and freezes, as well as the limits and strategies adopted by Periscope to deal with this challenging application scenario.
Title: Mobile live streaming: Insights from the Periscope service
Pub Date : 2016-09-01DOI: 10.1109/MMSP.2016.7813382
K. M. Alam, Mohammed Bin Hariz, Seyed Vahid Hosseinioun, M. Saini, Abdulmotaleb El Saddik
The Vehicular Cyber-Physical System (VCPS) is a new trend in research on intelligent transport systems (ITS). In a VCPS, vehicles work as hubs of sensors that collect interior and exterior information about the vehicle. Vehicles can use ad-hoc networking or 3G/LTE communication technology to share useful information with neighboring vehicles or with the infrastructure to accomplish user safety, comfort, and entertainment tasks. To facilitate efficient sensor-service fusion in VCPS applications, real-life vehicular sensory datasets are needed. While many datasets contain vehicle mobility traces, there is hardly any that contains the sensory information to be shared on the network. In this paper, we present a scenario-specific modular dataset architecture along with several multi-sensory dataset modules. One module provides time-synchronized multi-vehicle data including multi-view video, multi-directional sound, GPS, accelerometer, gyroscope, and magnetic field sensors. Each of the three vehicles recorded front, back, left, and right videos while moving closely together in suburban areas, enabling the exploration of vehicular cooperative applications. Another module provides the tools and data needed to identify vehicular events such as acceleration, deceleration, turn, and no-turn events. We also present the development details of a safety application built on the presented datasets, along with a list of other possible applications.
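A turn event of the kind the event module labels can be detected from a gyroscope trace with a simple sustained-threshold rule. A hedged sketch, with hypothetical threshold values (the dataset ships its own labelling tools, which are not reproduced here):

```python
import numpy as np

def detect_turns(yaw_rate, threshold=0.3, min_len=5):
    """Flag turn events where |yaw rate| stays above a threshold for at
    least min_len consecutive samples. Returns (start, end) index pairs."""
    active = np.abs(yaw_rate) > threshold
    events, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i                      # a run of high yaw rate begins
        elif not a and start is not None:
            if i - start >= min_len:       # keep only sustained runs
                events.append((start, i))
            start = None
    if start is not None and len(active) - start >= min_len:
        events.append((start, len(active)))
    return events

# Synthetic trace: straight driving, one 8-sample turn, straight again.
yaw = np.concatenate([np.zeros(10), 0.5 * np.ones(8), np.zeros(10)])
assert detect_turns(yaw) == [(10, 18)]
```

Acceleration and deceleration events could be flagged the same way from the longitudinal accelerometer axis, with different thresholds.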
Title: MUDVA: A multi-sensory dataset for the vehicular CPS applications
Pub Date : 2016-09-01DOI: 10.1109/MMSP.2016.7813347
Jürgen Seiler, André Kaup
Image signals are typically defined on a rectangular two-dimensional grid. However, there exist scenarios where this does not hold and the image information is only available for a non-regular subset of pixel positions. For processing, transmitting, or displaying such an image signal, re-sampling to a regular grid is required. Recently, Frequency Selective Reconstruction (FSR) has been proposed as a very effective sparsity-based algorithm for solving this under-determined problem. FSR iteratively generates a model of the signal in the Fourier domain. In this context, a fixed frequency prior inspired by the optical transfer function is used to favor low-frequency content. However, this fixed prior is often too strict and may reduce reconstruction quality. To resolve this weakness, this paper proposes an adaptive frequency prior that takes the local density of the available samples into account. The proposed adaptive prior allows for a very high reconstruction quality, yielding gains of up to 0.6 dB PSNR over the fixed prior, independently of the density of the available samples. Compared to other state-of-the-art algorithms, visually noticeable gains of several dB are possible.
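The idea of a density-adapted low-pass prior can be illustrated with a toy frequency weighting whose decay rate depends on the local sample density. This is a hypothetical parameterisation for illustration only, not the paper's actual prior:

```python
import numpy as np

def frequency_prior(freqs, density, rho_dense=0.7, rho_sparse=0.5):
    """Exponentially decaying weight over frequency bins; hypothetical
    rule: dense regions (density near 1) get a gentler decay, so more
    high-frequency content is admitted into the signal model."""
    rho = rho_sparse + (rho_dense - rho_sparse) * density  # density in [0, 1]
    return rho ** np.abs(freqs)

freqs = np.fft.fftfreq(16, d=1 / 16)          # integer frequency bins -8..7
w_dense = frequency_prior(freqs, density=1.0)
w_sparse = frequency_prior(freqs, density=0.0)
# Both priors favour low frequencies; the sparse one is stricter at high ones.
assert w_dense[0] == 1.0 and np.all(w_dense >= w_sparse)
```

In FSR such a weighting would scale the candidate Fourier basis functions during the iterative model generation, biasing selection toward low frequencies more strongly where few samples are available.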
Title: Adaptive frequency prior for frequency selective reconstruction of images from non-regular subsampling
Pub Date : 2016-09-01DOI: 10.1109/MMSP.2016.7813386
Wei-Jen Ko, Y. Wang, Shao-Yi Chien
With the goal of increasing the resolution of face images, recent face hallucination methods advance learning techniques that observe low- and high-resolution training patches to recover the output image of interest. Since most existing patch-based face hallucination approaches do not consider the location information of the patches to be hallucinated, their performance can be limited. In this paper, we propose an anchored patch-based hallucination method that is able to identify and exploit image patches exhibiting structurally and spatially similar information. With these representative anchors, improved performance and computational efficiency can be achieved. Experimental results demonstrate that our proposed method achieves satisfactory performance and performs favorably against recent face hallucination approaches.
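The anchored-patch idea can be sketched as a nearest-anchor lookup: each anchor represents a cluster of similar low-resolution patches and stores a mapping to high-resolution patch space. Everything below (anchor count, patch sizes, random mappings) is a made-up stand-in for the learned quantities in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical learned state: 8 anchor LR patches (4x4, flattened) and one
# linear LR->HR mapping per anchor (8x8 HR patch, flattened).
anchors = rng.standard_normal((8, 16))
mappings = [rng.standard_normal((64, 16)) for _ in range(8)]

def hallucinate_patch(lr_patch):
    """Assign the LR patch to its nearest anchor in Euclidean distance,
    then apply that anchor's precomputed LR->HR mapping."""
    i = int(np.argmin(np.linalg.norm(anchors - lr_patch, axis=1)))
    return mappings[i] @ lr_patch

hr = hallucinate_patch(rng.standard_normal(16))
assert hr.shape == (64,)
```

Because the per-anchor mappings are precomputed offline, the online cost per patch reduces to one nearest-neighbour search plus one matrix-vector product, which is where the claimed efficiency gain would come from.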
Title: Learning patch-based anchors for face hallucination
Pub Date : 2016-09-01DOI: 10.1109/MMSP.2016.7813397
Ioannis Mademlis, A. Tefas, N. Nikolaidis, I. Pitas
Automatic shot selection is an important aspect of movie summarization that is helpful both to producers and to audiences, e.g., for market promotion or browsing purposes. However, most related research has focused either on shot selection based on low-level video content, which disregards semantic information, or on narrative properties extracted from text, which requires the movie script to be available. In this work, semantic shot selection based on the narrative prominence of movie characters in both the visual and the audio modalities is investigated, without the need for additional data such as a script. The output is a movie summary that contains only video frames from selected movie shots. Selection is controlled by a user-provided shot retention parameter that removes key-frames/key-segments from the skim based on actor face appearances and speech instances. This novel process (Multimodal Shot Pruning, or MSP) is algebraically modelled as a multimodal matrix Column Subset Selection Problem, which is solved using an evolutionary computing approach.
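The Column Subset Selection Problem asks for the k columns of a matrix whose span best approximates all columns. The paper solves its multimodal variant with an evolutionary algorithm; as a hedged baseline sketch, a greedy selector minimising the Frobenius-norm residual:

```python
import numpy as np

def greedy_css(A, k):
    """Greedy column subset selection: at each step, add the column whose
    inclusion most reduces the residual ||A - C C^+ A||_F, where C holds
    the chosen columns and C^+ is its pseudo-inverse."""
    n = A.shape[1]
    chosen = []
    for _ in range(k):
        best_j, best_res = None, np.inf
        for j in range(n):
            if j in chosen:
                continue
            C = A[:, chosen + [j]]
            # Residual of projecting all of A onto span(C).
            res = np.linalg.norm(A - C @ np.linalg.pinv(C) @ A)
            if res < best_res:
                best_j, best_res = j, res
        chosen.append(best_j)
    return chosen

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5))   # toy stand-in for a shot-feature matrix
cols = greedy_css(A, 2)
assert len(cols) == 2 and len(set(cols)) == 2
```

In the shot-selection setting, each column would correspond to a shot's multimodal feature vector, and the retained columns are the shots kept in the skim; the evolutionary solver in the paper searches the same objective globally instead of greedily.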
Title: Movie shot selection preserving narrative properties
Pub Date : 2016-09-01DOI: 10.1109/MMSP.2016.7813361
Kodai Kikuchi, T. Kajiyama, Kei Ogura, E. Miyashita
We propose a color space transform for 4:4:4 video coding. Uncorrelated noise contained in a specific color component deteriorates compression efficiency. The proposed transform limits the propagation of a color component's uncorrelated noise to the other color components by introducing zero coefficients into the transform matrix. Simulation results show that the proposed transform improves image quality after compression with extended JPEG on certain natural images. To achieve high image quality for any kind of image, an adaptive selection between the conventional and the proposed color space transforms is performed by calculating the independence among color-difference components.
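The zero-coefficient mechanism can be illustrated with a toy 3x3 transform (not the paper's actual matrix) in which the third input component, assumed noisy, contributes only to its own output, so its noise cannot leak into the other two transformed components:

```python
import numpy as np

# Hypothetical transform: zeros in the last column of the first two rows
# isolate the third input component from the other two outputs.
T = np.array([
    [0.5,  0.5, 0.0],
    [0.5, -0.5, 0.0],
    [0.0,  0.0, 1.0],
])

rgb = np.array([100.0, 80.0, 60.0])
out = T @ rgb

# Add noise confined to component 3: the first two outputs are unchanged.
noisy = rgb + np.array([0.0, 0.0, 5.0])
assert np.allclose((T @ noisy)[:2], out[:2])
```

A dense decorrelating transform (e.g., an RGB-to-YCbCr matrix) would instead mix the noisy component into every output, spending bits on noise in all three channels.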
Title: Adaptive color space transforms for 4:4:4 video coding considering uncorrelated noise among color components
Pub Date : 2016-09-01DOI: 10.1109/MMSP.2016.7813371
I. Elfitri, M. Sobirin, Fadhlur Rahman, R. Kurnia
MPEG Surround (MPS) is widely known both as an efficient technique for encoding multi-channel audio signals and as a feature-rich audio standard. However, generating the residual signal on the basis of a single module in the MPS encoder is not optimal for compensating the error introduced by the down-mixing process. In this paper, an improved residual coding method is proposed to ensure that the down-mixing error is optimally minimised, particularly for MPS operation at high bit-rates. The distortion introduced during MPS encoding and decoding is first studied, motivating an approach that determines the residual signals more accurately for better compensation of the distortion. A subjective test demonstrates that MPS with the improved residual coding is competitive with multi-channel Advanced Audio Coding (AAC) for encoding 5-channel audio signals at bit-rates of 256 and 320 kb/s.
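The role of a residual signal can be shown on a toy channel pair (this is a generic down-mix/residual sketch, not the MPS bitstream syntax): the encoder predicts each channel back from the down-mix and transmits the prediction error, so the decoder can cancel the down-mix loss exactly.

```python
import numpy as np

# Toy channel pair and its down-mix.
l = np.array([1.0, 2.0, 3.0])
r = np.array([0.5, 1.0, 4.0])
m = 0.5 * (l + r)                 # down-mix

# Least-squares up-mix gain for the left channel, plus its residual.
g_l = (l @ m) / (m @ m)
res_l = l - g_l * m               # residual = what the gain model misses

# Decoder side: gain prediction plus residual restores the channel exactly.
l_rec = g_l * m + res_l
assert np.allclose(l_rec, l)
```

Without the residual, the decoder would only recover `g_l * m`, and any component of `l` orthogonal to the down-mix would be lost; the paper's contribution is, roughly, computing such residuals against the full encode/decode chain rather than a single module.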
Title: Advanced residual coding for MPEG surround encoder
Pub Date : 2016-09-01DOI: 10.1109/MMSP.2016.7813393
J. Ramírez, J. Górriz, Francisco J. Martínez-Murcia, F. Segovia, D. Salas-González
This paper presents a magnetic resonance image (MRI) classification technique based on nonnegative matrix factorization (NNMF) and ensemble tree learning methods. The system consists of a feature extraction process that applies NNMF to gray matter (GM) MRI first-order statistics of a number of sub-cortical structures, and a learning process for an ensemble of decision trees. The ensembles are trained by means of boosting and bagging, and their performance is compared in terms of classification error and the receiver operating characteristic (ROC) curve using k-fold cross validation. The results show that NNMF is well suited for reducing the dimensionality of the input data without penalizing the performance of the ensembles. The best performance was obtained by bagging in terms of convergence rate and minimum residual loss, especially for high-complexity classification tasks (i.e., NC vs. MCI and MCI vs. AD).
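The NNMF-then-bagging pipeline can be sketched with scikit-learn on a public nonnegative dataset (the paper uses GM MRI first-order statistics, which are not available here; dataset, component count, and ensemble size below are stand-ins):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import NMF
from sklearn.ensemble import BaggingClassifier   # default base: decision tree
from sklearn.model_selection import cross_val_score

# Nonnegative feature matrix, required by NMF.
X, y = load_breast_cancer(return_X_y=True)

# Dimensionality reduction: factor X ~ W H and keep W as the feature matrix.
W = NMF(n_components=5, init="nndsvda", max_iter=500,
        random_state=0).fit_transform(X)

# Bagged decision trees evaluated with k-fold cross validation (k = 5).
clf = BaggingClassifier(n_estimators=50, random_state=0)
scores = cross_val_score(clf, W, y, cv=5)
print(scores.mean())
```

Swapping `BaggingClassifier` for `AdaBoostClassifier` would give the boosting arm of the paper's comparison under the same cross-validation protocol.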
Title: Magnetic resonance image classification using nonnegative matrix factorization and ensemble tree learning techniques
Pub Date : 2016-09-01DOI: 10.1109/MMSP.2016.7813399
Qiang Yao, Hiroshi Sankoh, Keisuke Nonaka, S. Naito
In recent years, the demand for immersive experiences has triggered a great revolution in multimedia applications and formats. In particular, immersive navigation of free-viewpoint sports video has become increasingly popular: viewers would like to actively select different viewpoints when watching sports video to enhance the ultra-realistic experience. In the practical realization of immersive navigation of free-viewpoint video, camera calibration is of vital importance. Automatic camera calibration is especially significant for real-time implementation, and the accuracy of the camera parameters directly determines the final experience of free-viewpoint navigation. In this paper, we propose an automatic camera self-calibration method based on a field model for free-viewpoint navigation in sports events. The proposed method is composed of three parts: extraction of field lines in the camera image, calculation of their crossing points, and determination of the optimal camera parameters. Experimental results show that the camera parameters can be automatically estimated by the proposed method for fixed, dynamic, and multi-view cameras with high accuracy. Furthermore, immersive free-viewpoint navigation in sports events can be fully realized based on the camera parameters estimated by the proposed method.
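The calibration core of such field-model methods is typically a homography fitted between detected line crossings in the image and the corresponding known points of the field model. A hedged sketch of the standard direct linear transform (DLT) on synthetic correspondences (the paper's full parameter estimation is not reproduced here):

```python
import numpy as np

def estimate_homography(src, dst):
    """DLT estimate of the 3x3 homography mapping src -> dst points.
    Each correspondence contributes two linear constraints on the nine
    entries of H; the solution is the right null vector of the system."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]             # normalise the arbitrary scale

# Four synthetic crossings related by a known scale + translation.
src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(2 * x + 3, 2 * y + 1) for x, y in src]
H = estimate_homography(src, dst)
p = H @ np.array([1.0, 1.0, 1.0])
assert np.allclose(p[:2] / p[2], [5.0, 3.0], atol=1e-6)
```

Four non-collinear crossing points are the minimum; with more detections, the same SVD solves the system in a least-squares sense, which is what makes dynamic-camera tracking robust to noisy line extraction.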
Title: Automatic camera self-calibration for immersive navigation of free viewpoint sports video