Constant False Alarm Rate for Online One-Class SVM Learning
Pub Date: 2018-04-24 | DOI: 10.1109/ICASSP.2018.8462022
Yongjian Xue, P. Beauseroy
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2821-2825
Many one-class SVM applications require online learning techniques when time series data are encountered. Most existing methods for online SVM learning are based on C-SVM and do not adapt the constraint parameter dynamically as the number of training samples increases. In such cases, the false alarm rate of the one-class SVM gradually decreases while the miss alarm rate increases. In most applications a relatively stable performance is preferred, especially in terms of the false alarm rate. To solve this problem, we propose an online version of v-OeSVM. Experiments on toy and real datasets show that v-OeSVM is a good means of targeting a given false alarm rate, while the AUC increases slowly as the number of new samples increases.
Low-Overhead Receiver-Side Channel Tracking for mmWave MIMO
Pub Date: 2018-04-24 | DOI: 10.1109/ICASSP.2018.8461320
Karthik Upadhya, S. Vorobyov, R. Heath
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3859-3863
Millimeter wave (mmWave) multiple-input multiple-output (MIMO) transceivers employ narrow beams to obtain a large array gain, which makes them sensitive to changes in the angles of arrival and departure of the paths. Since the singular vectors that span the channel subspace are used to design the precoder and combiner, we propose a method to track the receiver-side channel subspace during data transmission using a separate radio frequency (RF) chain dedicated to channel tracking. Under certain conditions on the transmit precoder, we show that the receiver-side channel subspace can be estimated during data transmission without knowing the structure of the precoder or the transmitted data. The performance of the proposed method is evaluated through simulations.
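Why receiver-side estimation can work without knowing the precoder: if the tracking chain observes Y = H F S + N and the effective channel H F has the same column space as H, then the dominant left singular vectors of Y span the receiver-side channel subspace, and neither F nor S is needed. The numpy sketch below checks this numerically. It is a batch SVD under assumed conditions (a random rank-L channel, a random precoder with as many streams as paths, high SNR; all names are hypothetical), not the authors' tracking algorithm or their exact precoder conditions.

```python
import numpy as np

def estimate_rx_subspace(Y, r):
    # Dominant r left singular vectors of the received block Y (Nr x T).
    # Only Y is used: the precoder F and the data S are never accessed.
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return U[:, :r]

def subspace_distance(U1, U2):
    # Chordal distance ||U1 U1^H - U2 U2^H||_F between two r-dimensional subspaces.
    return np.linalg.norm(U1 @ U1.conj().T - U2 @ U2.conj().T)

def crandn(rng, *shape):
    # Circularly-symmetric complex Gaussian samples with unit variance.
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

rng = np.random.default_rng(0)
Nr, Nt, L, T = 16, 16, 3, 400                  # rx antennas, tx antennas, paths, snapshots
H = crandn(rng, Nr, L) @ crandn(rng, L, Nt)    # assumed rank-L channel model
F, _ = np.linalg.qr(crandn(rng, Nt, L))        # unknown precoder with L streams (assumption)
S = crandn(rng, L, T)                          # unknown transmitted symbols
Y = H @ F @ S + 1e-3 * crandn(rng, Nr, T)      # samples seen by the tracking RF chain

U_true = np.linalg.svd(H)[0][:, :L]            # true receiver-side channel subspace
U_hat = estimate_rx_subspace(Y, L)
print(f"subspace distance: {subspace_distance(U_hat, U_true):.2e}")
```

In a tracking setting the same estimate would be refreshed as new blocks of Y arrive; how the paper does that update is not reproduced here.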
On Compressive Sensing of Sparse Covariance Matrices Using Deterministic Sensing Matrices
Pub Date: 2018-04-24 | DOI: 10.1109/ICASSP.2018.8461312
Alihan Kaplan, V. Pohl, Dae Gwan Lee
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4019-4023
This paper considers the problem of determining the sparse covariance matrix $\mathbf{X}$ of an unknown data vector $\pmb{x}$ by observing the covariance matrix $\mathbf{Y}$ of a compressive measurement vector $\pmb{y}=\mathbf{A}\pmb{x}$. We construct deterministic sensing matrices $\mathbf{A}$ for which the recovery of a $k$-sparse covariance matrix $\mathbf{X}$ from $m$ values of $\mathbf{Y}$ is guaranteed with high probability. In particular, we show that the number of measurements $m$ scales linearly with the sparsity $k$.
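The covariance problem is linear in $\mathbf{X}$: with $\pmb{y}=\mathbf{A}\pmb{x}$ one has $\mathbf{Y}=\mathbf{A}\mathbf{X}\mathbf{A}^T$, and column-major vectorization turns this into the standard compressive sensing model $\mathrm{vec}(\mathbf{Y}) = (\mathbf{A}\otimes\mathbf{A})\,\mathrm{vec}(\mathbf{X})$. The sketch below only checks that identity numerically. It assumes a random Gaussian $\mathbf{A}$ and a diagonal $k$-sparse $\mathbf{X}$ purely for illustration; the paper's deterministic construction of $\mathbf{A}$ and its recovery algorithm are not reproduced.

```python
import numpy as np

def vec(M):
    # Column-major (Fortran-order) vectorization, the convention under which
    # vec(A X B) = (B^T kron A) vec(X).
    return M.reshape(-1, order="F")

rng = np.random.default_rng(1)
n, m, k = 8, 5, 3
A = rng.standard_normal((m, n))          # hypothetical random m x n sensing matrix

# Hypothetical k-sparse covariance: uncorrelated sources, k of them active.
X = np.zeros((n, n))
support = rng.choice(n, size=k, replace=False)
X[support, support] = rng.uniform(0.5, 2.0, size=k)

Y = A @ X @ A.T                          # covariance of the measurements y = A x
lhs = vec(Y)
rhs = np.kron(A, A) @ vec(X)             # linear measurement model acting on vec(X)
print(np.allclose(lhs, rhs))             # True
```

Because $\mathbf{Y}$ is symmetric, only $m(m+1)/2$ of its entries are distinct, which is one reason the measurement count here is stated in terms of values of $\mathbf{Y}$ rather than the length of $\pmb{y}$.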
Robust and Effective Hyperspectral Pansharpening Using Spatio-Spectral Total Variation
Pub Date: 2018-04-24 | DOI: 10.1109/ICASSP.2018.8462464
Saori Takeyama, Shunsuke Ono, I. Kumazawa
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1603-1607
Acquiring high-resolution hyperspectral (HS) images is a very challenging task. To this end, hyperspectral pansharpening techniques have been widely studied; they estimate an HS image of high spatial and spectral resolution (high HS image) from a pair consisting of an HS image of high spectral but low spatial resolution (low HS image) and a panchromatic (PAN) image of high spatial resolution. However, since these methods do not fully exploit the piecewise smoothness of the spectral information in HS images during estimation, they tend to produce spectral distortion when the low HS image contains noise. To tackle this issue, we propose a new hyperspectral pansharpening method using a spatio-spectral regularization. Our method not only effectively exploits the observed information but also properly promotes the spatio-spectral piecewise smoothness of the resulting high HS image, leading to high-quality and robust estimation. The proposed method reduces to a nonsmooth convex optimization problem, which is efficiently solved by a primal-dual splitting method. Our experiments demonstrate the advantages of our method over existing hyperspectral pansharpening methods.
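To make "spatio-spectral piecewise smoothness" concrete, the sketch below evaluates one common form of spatio-spectral total variation, $\|\mathbf{D}\mathbf{D}_v\mathbf{u}\|_1$: first take differences between adjacent bands, then take spatial differences of those, and sum absolute values. This exact form (anisotropic $\ell_1$, non-periodic boundaries) is an assumption; the paper's regularizer may differ in norm, weighting, or boundary handling, and the primal-dual solver is not shown.

```python
import numpy as np

def sstv(u):
    """Anisotropic spatio-spectral TV ||D D_v u||_1 of a cube shaped (rows, cols, bands)."""
    dv = np.diff(u, axis=2)          # D_v u: differences between adjacent bands
    d_rows = np.diff(dv, axis=0)     # vertical spatial differences of D_v u
    d_cols = np.diff(dv, axis=1)     # horizontal spatial differences of D_v u
    return np.abs(d_rows).sum() + np.abs(d_cols).sum()

rows, cols, bands = 4, 5, 3
i = np.arange(rows)[:, None, None]
j = np.arange(cols)[None, :, None]
b = np.arange(bands)[None, None, :]

# Spatial edges whose spectral shape is identical at every pixel are not penalized.
same_spectrum = 2.0 * i + 3.0 * j + 0.5 * b
# The spectral slope changes from row to row, so this cube is penalized.
varying_spectrum = np.broadcast_to(i * b, (rows, cols, bands))

print(sstv(same_spectrum))       # 0.0
print(sstv(varying_spectrum))    # 30
```

The first cube has strong spatial edges yet zero penalty, which is the property that lets such a regularizer suppress spectral noise without blurring edges that the PAN image supports.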
Comparison of Speech Tasks for Automatic Classification of Patients with Amyotrophic Lateral Sclerosis and Healthy Subjects
Pub Date: 2018-04-24 | DOI: 10.1109/ICASSP.2018.8461836
Aravind Illa, Deep Patel, B. Yamini, Meera ss, N. Shivashankar, P. Veeramani, Seena vengalii, Kiran Polavarapui, S. Nashi, A. Nalini, P. Ghosh
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6014-6018
In this work, we consider the task of automatically classifying Amyotrophic Lateral Sclerosis (ALS) patients and healthy subjects from speech tasks using acoustic and articulatory features. In particular, we compare the roles of different types of speech tasks for this purpose, namely rehearsed speech, spontaneous speech, and repeated words. Simultaneous articulatory and speech data were recorded from 8 healthy controls and 8 ALS patients using an AG501 articulograph for the classification experiments. In addition to typical acoustic and articulatory features, new articulatory features are proposed for classification. Both Deep Neural Networks (DNN) and Support Vector Machines (SVM) are examined as classifiers. Classification experiments reveal that the proposed articulatory features outperform the other acoustic and articulatory features with both the DNN and the SVM classifiers; with the proposed features, however, the SVM performs better than the DNN. Among the three speech tasks considered, rehearsed speech was found to provide the highest F-score of 1, followed by an F-score of 0.92 when both repeated words and spontaneous speech are used for classification.
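For reference, the F-score quoted above is the harmonic mean of precision and recall, so a value of 1 means no false positives and no false negatives. The sketch below computes F1 on hypothetical subject-level labels (8 patients, 8 controls) with ALS as the positive class; the abstract does not state the positive class, the evaluation unit, or any averaging, so all of these are assumptions.

```python
def f_score(y_true, y_pred, positive=1):
    """F1-score (harmonic mean of precision and recall) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: 8 ALS patients (1) and 8 healthy controls (0).
y_true = [1] * 8 + [0] * 8
perfect = list(y_true)
one_missed = [1] * 7 + [0] + [0] * 8     # one patient classified as healthy

print(f_score(y_true, perfect))                 # 1.0
print(round(f_score(y_true, one_missed), 4))    # 0.9333
```

With only 16 subjects, a single misclassification already moves F1 from 1 to about 0.93, which is worth keeping in mind when comparing the reported scores.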
Faster and Still Safe: Combining Screening Techniques and Structured Dictionaries to Accelerate the Lasso
Pub Date: 2018-04-24 | DOI: 10.1109/ICASSP.2018.8461514
C. Dantas, R. Gribonval
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4069-4073
Accelerating the solution of the Lasso problem becomes crucial when scaling to very high dimensional data. In this paper, we propose a way to combine two existing acceleration techniques: safe screening tests, which simplify the problem by eliminating useless dictionary atoms; and the use of structured dictionaries which are faster to operate with. A structured approximation of the true dictionary is used at the initial stage of the optimization, and we show how to define screening tests which are still safe despite the approximation error. In particular, we extend a state-of-the-art screening test, the GAP SAFE sphere test, to this new setting. The practical interest of the proposed methodology is demonstrated by considerable reductions in simulation time.
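The baseline that this work extends is the standard GAP SAFE sphere test for the Lasso $\min_x \tfrac12\|y-Dx\|_2^2+\lambda\|x\|_1$: rescale the residual into a dual-feasible point $\theta$, compute the duality gap $G$, and discard atom $j$ whenever $|d_j^T\theta| + \tfrac{\sqrt{2G}}{\lambda}\|d_j\|_2 < 1$, since the optimal dual point is guaranteed to lie in that sphere. The numpy sketch below implements this exact-dictionary version with a plain ISTA solver as a stand-in. It does not include the paper's contribution (keeping the test safe when a structured approximation of $D$ is used); the solver, problem sizes, and names are assumptions.

```python
import numpy as np

def gap_safe_sphere_screen(D, y, x, lam):
    """Boolean mask of atoms the GAP SAFE sphere test proves inactive at the optimum.

    Primal: P(x)     = 0.5 * ||y - D x||^2 + lam * ||x||_1
    Dual:   Q(theta) = 0.5 * ||y||^2 - 0.5 * lam^2 * ||theta - y / lam||^2
    """
    residual = y - D @ x
    # Rescale the residual into a dual-feasible point: ||D^T theta||_inf <= 1.
    theta = residual / max(lam, np.max(np.abs(D.T @ residual)))
    primal = 0.5 * residual @ residual + lam * np.abs(x).sum()
    dual = 0.5 * y @ y - 0.5 * lam**2 * np.sum((theta - y / lam) ** 2)
    radius = np.sqrt(2.0 * max(primal - dual, 0.0)) / lam   # safe sphere around theta
    # Atom j is discarded when every point of the sphere satisfies |d_j^T theta| < 1.
    return np.abs(D.T @ theta) + radius * np.linalg.norm(D, axis=0) < 1.0

def ista(D, y, lam, n_iter=300):
    # Plain proximal gradient (ISTA) for the Lasso, used here only as a stand-in solver.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = x - step * (D.T @ (D @ x - y))
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return x

rng = np.random.default_rng(0)
D = rng.standard_normal((50, 200))
D /= np.linalg.norm(D, axis=0)               # unit-norm atoms
y = rng.standard_normal(50)
lam_max = np.max(np.abs(D.T @ y))            # smallest lam for which the solution is 0
lam = 0.5 * lam_max

x = ista(D, y, lam)
mask = gap_safe_sphere_screen(D, y, x, lam)
print(f"{mask.sum()} of {D.shape[1]} atoms screened out")
```

The test gets sharper as the gap shrinks, so in practice it is re-run periodically during optimization and the screened columns are dropped from $D$; the cheaper each product with $D$ is, the more this pays off, which is where structured dictionaries come in.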
An Investigation of Subband WaveNet Vocoder Covering Entire Audible Frequency Range with Limited Acoustic Features
Pub Date: 2018-04-24 | DOI: 10.1109/ICASSP.2018.8462237
T. Okamoto, Kentaro Tachibana, T. Toda, Y. Shiga, H. Kawai
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5654-5658
Although a WaveNet vocoder can synthesize more natural-sounding speech waveforms than conventional vocoders at sampling frequencies of 16 and 24 kHz, it is difficult to directly extend the sampling frequency to 48 kHz, which would cover the entire human audible frequency range for higher-quality synthesis, because the model becomes too large to train on a consumer GPU. To realize a 48 kHz WaveNet vocoder on a consumer GPU, this paper introduces a subband WaveNet architecture into a speaker-dependent WaveNet vocoder and proposes a subband WaveNet vocoder. In experiments, each conditional subband WaveNet, with a sampling frequency of 8 kHz, was successfully trained on a consumer GPU. The results of subjective evaluations on a Japanese male speech corpus indicate that the proposed subband WaveNet vocoder with 36-dimensional simple acoustic features significantly outperformed conventional source-filter model-based vocoders, including STRAIGHT with 86-dimensional features.
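Assuming a uniform, maximally decimated $M$-band filter bank (the abstract does not specify the filter bank), the two sampling rates quoted above imply the following split, where $f_s$ is the full-band rate and $B$ the bandwidth covered by each subband:

```latex
M = \frac{f_s}{f_s^{(\mathrm{sub})}} = \frac{48\,\mathrm{kHz}}{8\,\mathrm{kHz}} = 6,
\qquad
B = \frac{f_s/2}{M} = \frac{24\,\mathrm{kHz}}{6} = 4\,\mathrm{kHz}
```

Under this assumption each subband WaveNet generates one sixth as many samples per second as a full-band 48 kHz model, which is what keeps each model small enough for a consumer GPU.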
Online Education Evaluation for Signal Processing Course Through Student Learning Pathways
Pub Date: 2018-04-24 | DOI: 10.1109/ICASSP.2018.8461464
K. H. Ng, S. Tatinati, Andy W. H. Khong
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6458-6462
This work studies the impact of online learning sequences on forecasting course outcomes for an undergraduate digital signal processing (DSP) course. A multi-modal learning schema based on deep-learning techniques is developed, with learning sequences, psychometric measures, and personality traits as input features. The aim is to identify underlying patterns in the learning sequences and subsequently forecast the learning outcomes. Experiments are conducted on data acquired from the DSP course, taught over 13 teaching weeks, to evaluate the forecasting efficacy of various deep-learning models. Results show that the proposed multi-modal schema yields better forecasting performance than frequency-based methods in the existing literature. It is further observed that the psychometric measures incorporated in the proposed multi-modal schema enhance the ability to distinguish nuances in the input sequences when the forecasting task is highly dependent on human behavior.
Epileptic State Segmentation with Temporal-Constrained Clustering
Pub Date: 2018-04-24 | DOI: 10.1109/ICASSP.2018.8462070
Kang Lin, Yu Qi, Shaozhe Feng, Qi Lian, Gang Pan, Yueming Wang
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 881-885
Automatic seizure identification plays an important role in epilepsy evaluation. Most existing methods treat seizure identification as a classification problem and rely on a labelled training set. However, because labelling seizure onsets is very expensive and seizure data for each individual are especially limited, classifier-based methods are usually impractical. Clustering methods can learn useful information from unlabelled data, but they may give unstable results on highly noisy epileptic signals. In this paper, we propose a Gaussian temporal-constrained k-medoids method for seizure state segmentation. By using temporal information, noise can be effectively suppressed and robust clustering performance is achieved. In addition, a new criterion called signed total variation (STV), which describes temporal integrity and consistency, is proposed for evaluating temporal-constrained clustering. Experimental results show that, compared with existing methods, the k-medoids method with the Gaussian temporal constraint achieves the best results in both F1-score and STV.
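One plausible reading of a "Gaussian temporal constraint" is sketched below: in the assignment step of k-medoids, each sample's distance to each medoid is replaced by a Gaussian-weighted average of its temporal neighbours' distances, so an isolated noisy sample cannot flip its segment's label. This formulation, the farthest-first initialization, and all names are assumptions for illustration; the paper's exact constraint and its STV criterion are not reproduced.

```python
import numpy as np

def gaussian_smoother(T, sigma):
    # Row-normalized Gaussian kernel over time indices (T x T).
    t = np.arange(T)
    G = np.exp(-((t[:, None] - t[None, :]) ** 2) / (2.0 * sigma**2))
    return G / G.sum(axis=1, keepdims=True)

def temporal_kmedoids(X, k, sigma=3.0, n_iter=50, seed=0):
    """k-medoids whose assignment step uses Gaussian-smoothed costs over time (assumed form)."""
    T = X.shape[0]
    pairwise = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # T x T distances
    G = gaussian_smoother(T, sigma)
    # Farthest-first initialization from a random starting sample.
    medoids = [int(np.random.default_rng(seed).integers(T))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(pairwise[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        # Temporal constraint: smooth each sample's medoid distances over its neighbours.
        labels = np.argmin(G @ pairwise[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                within = pairwise[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(G @ pairwise[:, medoids], axis=1)
    return labels, medoids

def n_changes(labels):
    # Number of label switches along time (1 for a clean two-state segmentation).
    return int(np.count_nonzero(np.diff(labels)))

# Hypothetical 1-D feature: a normal state followed by a seizure state, plus two spikes.
rng = np.random.default_rng(0)
x = np.r_[np.zeros(60), 5.0 * np.ones(60)] + 0.5 * rng.standard_normal(120)
x[20], x[90] = 5.0, 0.0
X = x[:, None]

smooth_labels, _ = temporal_kmedoids(X, k=2, sigma=3.0)
plain_labels, _ = temporal_kmedoids(X, k=2, sigma=1e-3)   # G is effectively the identity
print(n_changes(smooth_labels), n_changes(plain_labels))
```

With a vanishing `sigma` the method reduces to ordinary k-medoids and the two spikes each create spurious state switches; with `sigma=3.0` they are absorbed into the surrounding segments.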
A Deep Reinforcement Learning Framework for Identifying Funny Scenes in Movies
Pub Date: 2018-04-24 | DOI: 10.1109/ICASSP.2018.8462686
Haoqi Li, Naveen Kumar, Ruxin Chen, P. Georgiou
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3116-3120
This paper presents a novel deep Reinforcement Learning (RL) framework for classifying movie scenes based on affect, using the face images detected in the video stream as input. Extracting affective information from video is a challenging task that involves modeling complex visual and temporal representations intertwined with complex aspects of human perception and information integration. This also makes it difficult to collect a large annotated corpus, which restricts the use of supervised learning methods. We present an alternative learning framework based on RL that is tolerant to label sparsity and can easily make use of any available ground truth in an online fashion. We employ this modified RL model for the binary classification of whether or not a scene is funny, on a dataset of movie scene clips. The results show that our model predicts correctly 72.95% of the time on 2–3 minute movie scenes, while on shorter scenes it reaches an accuracy of 84.13%.