Pub Date: 2016-03-20 | DOI: 10.1109/ICASSP.2016.7471639
Title: Music emotion recognition with adaptive aggregation of Gaussian process regressors
Authors: Satoru Fukayama, Masataka Goto
Venue: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Abstract: This paper describes a novel method for estimating the emotions elicited by a piece of music from its acoustic signals. Previous research in this field has centered on finding effective acoustic features and regression methods that relate features to emotions. The state-of-the-art method is based on multi-stage regression, which aggregates the outputs of different regressors trained on the training data. However, once training is complete, the aggregation is fixed and cannot adapt to acoustic signals with different musical properties. We propose a method that adapts the aggregation to each new acoustic signal input. Since the emotions elicited by a new input cannot be known beforehand, we need a way of adapting the aggregation weights that does not rely on them. We do so by exploiting the deviation observed in the training data using Gaussian process regressions. An experiment comparing different aggregation approaches confirmed that our adaptive aggregation improves recognition accuracy.
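The adaptive aggregation idea in this abstract can be illustrated with a small sketch. It assumes, as a simplification rather than the authors' exact rule, that each regressor is a one-dimensional Gaussian process and that a regressor's weight for a new input is proportional to the inverse of its predictive variance at that input, so a regressor queried far from its training data is trusted less. The kernel, noise level, helper names, and toy data are all hypothetical.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel on scalar features (unit signal variance)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(K):
    """Lower-triangular L with K = L L^T (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i, row in enumerate(L):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x

def solve_upper_t(L, b):
    """Solve L^T x = b by back substitution."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(xs, ys, x_new, noise=0.01, length=1.0):
    """Posterior mean and variance of a zero-mean GP regressor at x_new."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j], length) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, ys))   # alpha = K^{-1} y
    k = [rbf(x_new, x, length) for x in xs]
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    v = solve_lower(L, k)
    var = rbf(x_new, x_new, length) - sum(vi * vi for vi in v)
    return mean, max(var, 1e-12)                    # guard against round-off

def aggregate(predictions):
    """Weight each (mean, variance) prediction by its precision 1/variance."""
    precisions = [1.0 / var for _, var in predictions]
    total = sum(precisions)
    weights = [p / total for p in precisions]
    return sum(w * m for w, (m, _) in zip(weights, predictions)), weights

# Toy usage with made-up numbers: two regressors, each trained on its own scalar feature.
xs, ys_a, ys_b = [0.0, 1.0, 2.0], [0.1, 0.5, 0.9], [0.2, 0.4, 0.6]
pred_a = gp_predict(xs, ys_a, x_new=1.0)    # new clip's feature A: inside A's training range
pred_b = gp_predict(xs, ys_b, x_new=10.0)   # new clip's feature B: far outside B's range
combined, weights = aggregate([pred_a, pred_b])
```

Because the weights are recomputed from each new input's predictive variances, the aggregation changes from clip to clip instead of being fixed after training.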
Pub Date: 2016-03-20 | DOI: 10.1109/ICASSP.2016.7472298
Title: Unmixing multitemporal hyperspectral images with variability: An online algorithm
Authors: Pierre-Antoine Thouvenin, N. Dobigeon, J. Tourneret
Venue: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Abstract: Hyperspectral unmixing consists of determining the reference spectral signatures composing a hyperspectral image and their relative abundance fractions in each pixel. In practice, the identified signatures may be affected by significant spectral variability resulting, for instance, from the temporal evolution of the imaged scene. This phenomenon can be accounted for by using a perturbed linear mixing model. This paper studies an online algorithm for estimating the parameters of this extended linear mixing model. Such an algorithm is of interest in practical applications where the size of the hyperspectral images precludes the use of batch procedures. The performance of the proposed method is evaluated on synthetic data.
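One common way to write a perturbed linear mixing model is y_n = (M + dM_n) a_n + b_n, where the columns of M are the reference signatures, dM_n is a pixel-dependent perturbation, the abundances a_n are nonnegative and sum to one, and b_n is noise. The sketch below draws synthetic pixels from that model. The Gaussian perturbations, the flat Dirichlet abundances, and the hypothetical `image_stream` helper (which hands over one image per time instant, as an online estimator would consume them) are illustrative assumptions, not the paper's simulation protocol.

```python
import random

def sample_abundances(R, rng):
    """Abundance vector on the probability simplex (flat Dirichlet via gamma draws)."""
    g = [rng.gammavariate(1.0, 1.0) for _ in range(R)]
    total = sum(g)
    return [x / total for x in g]

def plmm_pixel(M, a, var_std, noise_std, rng):
    """y_l = sum_r a_r * (M[l][r] + dM[l][r]) + b_l with Gaussian dM and b (assumed)."""
    y = []
    for row in M:                                   # one row of M per spectral band
        val = sum(a_r * (m_lr + rng.gauss(0.0, var_std)) for a_r, m_lr in zip(a, row))
        y.append(val + rng.gauss(0.0, noise_std))
    return y

def synth_image(M, n_pixels, var_std=0.01, noise_std=0.001, seed=0):
    """Synthetic pixels and their true abundances; dM is drawn afresh for every pixel."""
    rng = random.Random(seed)
    A = [sample_abundances(len(M[0]), rng) for _ in range(n_pixels)]
    Y = [plmm_pixel(M, a, var_std, noise_std, rng) for a in A]
    return Y, A

def image_stream(M, n_images, n_pixels, **kw):
    """Yield one synthetic image per time instant, in acquisition order."""
    for t in range(n_images):
        yield synth_image(M, n_pixels, seed=t, **kw)

# Hypothetical 3-band, 2-endmember signature matrix (rows = bands, columns = endmembers).
M = [[0.1, 0.5], [0.2, 0.6], [0.3, 0.7]]
Y, A = synth_image(M, 4, var_std=0.0, noise_std=0.0)   # no variability: plain linear mixing
```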
Pub Date: 2016-03-20 | DOI: 10.1109/ICASSP.2016.7472168
Title: Multimodal human action recognition in assistive human-robot interaction
Authors: I. Rodomagoulakis, N. Kardaris, Vassilis Pitsikalis, E. Mavroudi, Athanasios Katsamanis, A. Tsiami, P. Maragos
Venue: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Abstract: In the context of assistive robotics, we develop an intelligent interface that provides multimodal sensory processing capabilities for human action recognition. Human action is considered in multimodal terms, with inputs such as audio from microphone arrays and visual input from high-definition and depth cameras. Building on state-of-the-art approaches from automatic speech recognition and visual action recognition, we recognize actions and commands multimodally. By fusing the unimodal information streams, we obtain the optimal multimodal hypothesis, which is to be further exploited by the active mobility assistance robot within the framework of the MOBOT EU research project. Recognition experiments show that integrating multiple sensors and modalities increases multimodal recognition performance on a newly acquired, challenging dataset of elderly people interacting with the assistive robot.
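Fusing unimodal streams into a single hypothesis can be sketched as late fusion, assuming each modality stream outputs a calibrated probability per candidate command and that the streams are combined by a weighted sum of log-probabilities. The labels, probabilities, and stream weights below are hypothetical, and the fusion scheme actually used for MOBOT may differ.

```python
import math

def fuse(stream_scores, weights):
    """Late fusion of per-command probabilities from several modality streams.

    stream_scores: modality -> {label: probability}; weights: modality -> stream weight.
    Returns the label with the highest weighted sum of log-probabilities, plus all fused scores.
    """
    labels = set.intersection(*(set(scores) for scores in stream_scores.values()))
    fused = {
        label: sum(weights[m] * math.log(max(scores[label], 1e-12))   # floor avoids log(0)
                   for m, scores in stream_scores.items())
        for label in labels
    }
    return max(fused, key=fused.get), fused

# Hypothetical stream outputs for one time segment.
streams = {
    "audio": {"help": 0.6, "sit": 0.3, "stop": 0.1},
    "visual": {"help": 0.2, "sit": 0.7, "stop": 0.1},
}
best_equal, _ = fuse(streams, {"audio": 0.5, "visual": 0.5})
best_audio, _ = fuse(streams, {"audio": 0.9, "visual": 0.1})
```

With equal weights the visual stream's confident "sit" wins; shifting the weight toward audio flips the decision to "help", which is why stream weights are usually tuned on held-out data.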
Pub Date: 2016-03-20 | DOI: 10.1109/ICASSP.2016.7472580
Title: Extensions of semidefinite programming methods for atomic decomposition
Authors: Hsiao-Han Chao, L. Vandenberghe
Venue: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Abstract: We present an extension of recent semidefinite programming formulations for atomic decomposition over continuous dictionaries, with applications to continuous or 'gridless' compressed sensing. The dictionary considered in this paper is defined in terms of a general matrix pencil and is parameterized by a complex variable that varies over a segment of a line or circle in the complex plane. The main result of the paper is a formulation of the atomic decomposition problem as a convex semidefinite optimization problem, together with a simple constructive proof of the equivalence. The techniques are illustrated with a direction-of-arrival estimation problem and an example of low-rank structured matrix decomposition.
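For orientation, the best-known special case of such semidefinite formulations uses the line-spectral dictionary, in which the parameter z ranges over the whole unit circle; the matrix-pencil dictionaries over line segments and circular arcs described above generalize this setting. The block below recalls that standard characterization as background only, not the paper's extended formulation. Here T(u) denotes the Hermitian Toeplitz matrix with first column u.

```latex
% Line-spectral atoms and atomic norm (complex coefficients c_k absorb the phases)
a(z) = \begin{bmatrix} 1 & z & \cdots & z^{n-1} \end{bmatrix}^{T}, \qquad
\|x\|_{\mathcal{A}} = \inf\Bigl\{ \textstyle\sum_k |c_k| \;:\; x = \sum_k c_k\, a(z_k),\ |z_k| = 1 \Bigr\}

% Equivalent semidefinite characterization
\|x\|_{\mathcal{A}}
  = \inf_{u \in \mathbb{C}^{n},\; t \in \mathbb{R}}
    \left\{ \frac{1}{2n}\operatorname{tr} T(u) + \frac{t}{2}
    \;:\;
    \begin{bmatrix} T(u) & x \\ x^{H} & t \end{bmatrix} \succeq 0 \right\}

% Sanity check for one atom x = c\,a(z_0): T(u) = |c|\,a(z_0)\,a(z_0)^{H} (Toeplitz since |z_0| = 1)
% and t = |c| are feasible, with objective |c|\,n/(2n) + |c|/2 = |c|.
```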
Pub Date: 2016-03-20 | DOI: 10.1109/ICASSP.2016.7471749
Title: Fast and easy crowdsourced perceptual audio evaluation
Authors: M. Cartwright, Bryan Pardo, G. Mysore, M. Hoffman
Venue: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Abstract: Automated objective methods of audio evaluation are fast, cheap, and require little effort from the investigator. However, objective evaluation methods do not exist for the output of every audio processing algorithm, often produce output that correlates poorly with human quality assessments, and require ground-truth data in their calculation. Subjective human ratings of audio quality are the gold standard for many tasks, but they are expensive and slow, and recruiting subjects and running listening tests requires a great deal of effort. Moving listening tests from the lab to the micro-task labor market of Amazon Mechanical Turk speeds up data collection and reduces investigator effort. However, it also reduces the control investigators have over the testing environment, adding new variability and potential biases to the data. In this work, we compare multiple-stimulus listening tests performed in a lab environment with multiple-stimulus listening tests performed in a web environment on a population drawn from Mechanical Turk.
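One simple way to compare the two environments is to average the ratings of each stimulus within each environment and correlate the two sets of means. The sketch below does this with a Pearson coefficient; the data layout, the helper names, and the choice of statistic are assumptions, not necessarily the analysis reported in the paper.

```python
import math
from statistics import mean

def stimulus_means(ratings):
    """ratings: dict stimulus_id -> list of listener scores (e.g. on a 0-100 scale)."""
    return {stim: mean(scores) for stim, scores in ratings.items()}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def compare_environments(lab, web):
    """Correlate per-stimulus mean ratings from the lab test and the web test."""
    common = sorted(set(lab) & set(web))          # stimuli rated in both environments
    lab_means, web_means = stimulus_means(lab), stimulus_means(web)
    return pearson([lab_means[s] for s in common], [web_means[s] for s in common])

# Made-up ratings for three stimuli, only to show the call.
lab = {"a": [10, 20], "b": [40, 60], "c": [80, 90]}
web = {"a": [20, 30], "b": [55, 65], "c": [90, 100]}
r = compare_environments(lab, web)
```

In this toy case the web means are the lab means shifted by 10 points, so the correlation is perfect even though the absolute scores differ; a correlation measures agreement in ranking and spacing, not in absolute level.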
Pub Date: 2016-03-20 | DOI: 10.1109/TVT.2017.2676244
Title: Robust artificial-noise aided transmit design for multi-user MISO systems with integrated services
Authors: Weidong Mei, Zhi Chen, Lingxiang Li, Jun Fang, Shaoqian Li
Venue: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Abstract: This paper considers an optimal artificial noise (AN)-aided transmit design for multi-user MISO systems from the perspective of service integration. Specifically, two types of services are combined and served simultaneously: one multicast message intended for all receivers and one confidential message intended for only one receiver. The confidential message is kept perfectly secure from all unauthorized receivers. We consider the general case of imperfect channel state information (CSI), aiming at a joint and robust design of the input covariances for the multicast message, the confidential message, and the AN, such that the worst-case secrecy rate region is maximized subject to a sum power constraint. To this end, we reveal the hidden convexity of the problem and transform the original worst-case robust secrecy rate maximization (SRM) problem into a sequence of semidefinite programs. Numerical results are presented to show the efficacy of our proposed method.
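To make the quantities in this abstract concrete, the sketch below evaluates, for fixed input covariances and perfectly known channels, the multicast rate and the secrecy rate of the confidential message. It assumes that every receiver first decodes the multicast message (treating the confidential signal and the AN as noise) and cancels it, and that receiver `intended` is the confidential user while all others are potential eavesdroppers. The paper's worst-case robust optimization over imperfect CSI is not reproduced; `quad` and `rates` are hypothetical helpers.

```python
import math

def quad(h, Q):
    """Real part of h^H Q h for a complex vector h and a Hermitian matrix Q (nested lists)."""
    total = 0j
    for i, a in enumerate(h):
        for j, b in enumerate(h):
            total += a.conjugate() * Q[i][j] * b
    return total.real

def rates(H, Qm, Qc, Qa, noise=1.0, intended=0):
    """(multicast rate, secrecy rate) in bit/s/Hz for fixed covariances and perfect CSI.

    H[k] is receiver k's channel vector (at least two receivers). Assumed decoding
    order: multicast first, with the confidential signal and AN as noise, then SIC.
    """
    # The multicast message must be decodable by every receiver, so the weakest one limits it.
    r_multicast = min(
        math.log2(1 + quad(h, Qm) / (noise + quad(h, Qc) + quad(h, Qa))) for h in H
    )

    def confidential_rate(h):
        # After cancelling the multicast signal, only the AN adds to the noise.
        return math.log2(1 + quad(h, Qc) / (noise + quad(h, Qa)))

    r_legit = confidential_rate(H[intended])
    r_eve = max(confidential_rate(h) for k, h in enumerate(H) if k != intended)
    return r_multicast, max(0.0, r_legit - r_eve)

# Toy 2-antenna example: confidential beam toward receiver 0, AN toward receiver 1.
H = [[1, 0], [0, 1]]
Qm = [[0, 0], [0, 0]]
Qc = [[1, 0], [0, 0]]
Qa = [[0, 0], [0, 1]]
r_m, r_s = rates(H, Qm, Qc, Qa)
```

Pointing the AN at the unintended receiver while keeping it off the intended one is what lets the secrecy rate reach the legitimate rate in this toy case; with imperfect CSI that separation is only approximate, which is what a robust design has to guard against.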
Pub Date: 2016-03-20 | DOI: 10.1109/ICASSP.2016.7472240
Title: Deterministic maximum likelihood method for direction-of-arrival estimation of strictly noncircular signals
Authors: Yunmei Shi, X. Mao, Mingyang Cao, Yongtan Liu
Venue: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Abstract: In this paper, we devise a noncircular deterministic maximum likelihood (NC-DML) estimator for direction-of-arrival estimation of strictly noncircular (NC) signals. Unlike the conventional DML solution for arbitrary signals, the NC-DML estimator exploits the NC properties of the sources by reconstructing the parameter set, which significantly reduces the number of parameters to be considered. To compute the NC-DML estimate, we present a novel NC alternating projection (NC-AP) approach. The NC-AP solution is based on an augmented virtual array structure. Moreover, it takes the impact of the initial phase shift of the NC signals into account. Simulation results are included to illustrate the superiority of the proposed method.
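Strict noncircularity means each source can be written as s(t) = e^{jφ} s_R(t) with s_R(t) real and φ a phase shift, so its unknown waveform contributes one real parameter per snapshot instead of two. Stacking the array output with its conjugate, [x; x*], then yields a virtual array of twice the size whose steering vector depends on φ. The sketch below builds that augmented steering vector for a hypothetical half-wavelength uniform linear array and computes the sample noncircularity coefficient E[s²]/E[|s|²]; it illustrates the signal model only, not the NC-DML estimator or the NC-AP iterations.

```python
import cmath
import math

def ula_steering(theta, m, spacing=0.5):
    """Steering vector of an m-element uniform linear array (spacing in wavelengths)."""
    return [cmath.exp(-2j * math.pi * spacing * k * math.sin(theta)) for k in range(m)]

def augmented_steering(theta, phi, m):
    """Virtual-array steering for [x; conj(x)] when the source is s = e^{j*phi} * s_real."""
    a = ula_steering(theta, m)
    top = [ak * cmath.exp(1j * phi) for ak in a]
    return top + [v.conjugate() for v in top]      # second half mirrors the first

def noncircularity(samples):
    """Sample noncircularity coefficient E[s^2] / E[|s|^2]; modulus 1 for strictly NC signals."""
    return sum(s * s for s in samples) / sum(abs(s) ** 2 for s in samples)

# Toy check: a BPSK stream rotated by phi = 0.7 rad versus a (circular) QPSK stream.
bpsk = [cmath.exp(0.7j) * b for b in (1, -1, -1, 1, 1, -1, 1, 1)]
qpsk = [cmath.exp(1j * math.pi * (2 * k + 1) / 4) for k in range(8)]
v = augmented_steering(0.3, 0.7, 4)
```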
Pub Date: 2016-03-20 | DOI: 10.1109/ICASSP.2016.7472785
Title: Predicting humor response in dialogues from TV sitcoms
Authors: D. Bertero, Pascale Fung
Venue: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Abstract: We propose a method to predict humor response in dialogue using acoustic and language features. We use data from two popular TV sitcoms, "The Big Bang Theory" and "Seinfeld", to predict how the audience responds to humor. Because humor responses in dialogue are sequential, we use a Conditional Random Field as the classifier/predictor. Our method is relatively effective, with a maximum precision of 72.1% on "The Big Bang Theory" and 60.2% on "Seinfeld". Experiments show that audio, speed, word-length, and sentence-length features are the most effective. Beyond humor, this work can be applied to developing appropriate machine responses that are empathetic to emotion in dialogue.
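Decoding with a linear-chain Conditional Random Field amounts to finding the label sequence that maximizes the sum of per-utterance (emission) scores and label-to-label (transition) scores, which the Viterbi algorithm does exactly. The sketch below assumes the emission scores (weights times acoustic and language features) have already been computed; the labels "laugh"/"no_laugh", the scores, and the transition weights are hypothetical, and training is not shown.

```python
def viterbi(emissions, transitions, labels):
    """MAP label sequence for a linear-chain CRF.

    emissions: one dict per utterance, label -> score.
    transitions: dict (previous_label, label) -> score.
    """
    best = [{y: emissions[0][y] for y in labels}]   # best path score ending in each label
    back = []                                       # back-pointers per position
    for t in range(1, len(emissions)):
        scores, ptr = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[(p, y)])
            scores[y] = best[-1][prev] + transitions[(prev, y)] + emissions[t][y]
            ptr[y] = prev
        best.append(scores)
        back.append(ptr)
    path = [max(labels, key=lambda y: best[-1][y])]
    for ptr in reversed(back):                      # follow back-pointers to the start
        path.append(ptr[path[-1]])
    return list(reversed(path))

labels = ["laugh", "no_laugh"]
emissions = [{"laugh": 2.0, "no_laugh": 0.0},
             {"laugh": 0.4, "no_laugh": 0.5},
             {"laugh": 0.0, "no_laugh": 1.0}]
flat = {(p, y): 0.0 for p in labels for y in labels}
sticky = {**flat, ("laugh", "laugh"): 1.2}          # laughs tend to follow laughs
path_flat = viterbi(emissions, flat, labels)
path_sticky = viterbi(emissions, sticky, labels)
```

With zero transition scores the decoder simply picks the best label per utterance; a positive laugh-to-laugh weight flips the later utterances to "laugh", the kind of sequential dependency a per-utterance classifier cannot express.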
Pub Date: 2016-03-20 | DOI: 10.1109/ICASSP.2016.7472596
Title: What to do about noisy consensus?
Authors: V. Solo, M. J. Piggott
Venue: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Abstract: There is an Achilles heel underlying the consensus literature. It has been known for some time that measurement noise causes explosive growth of the consensus mode. Here we state the noise problem, characterise the behaviour of the noisy consensus system, including its drift to ∞, critique the existing remedies, and develop a new remedy.
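The noise problem can be reproduced with a short simulation. The sketch below assumes a ring of n nodes, a fixed step size ε, and independent Gaussian noise of standard deviation σ on every received neighbour value. Because the noise-free update conserves the network average, the noise enters the average directly: for this setup its variance after k steps is 2ε²σ²k/n, so the consensus mode performs a random walk whose spread grows without bound. The graph, parameters, and function names are illustrative assumptions.

```python
import random
from statistics import pvariance

def noisy_consensus_step(x, eps, sigma, rng):
    """One step on a ring graph: each node sees its two neighbours through additive noise."""
    n = len(x)
    new = []
    for i in range(n):
        update = 0.0
        for j in ((i - 1) % n, (i + 1) % n):
            noisy_xj = x[j] + rng.gauss(0.0, sigma)   # measurement noise on link j -> i
            update += noisy_xj - x[i]
        new.append(x[i] + eps * update)
    return new

def average_spread(n=10, eps=0.25, sigma=1.0, steps=400, trials=200,
                   checkpoints=(50, 400), seed=1):
    """Across independent runs, variance of the network average at each checkpoint."""
    rng = random.Random(seed)
    at = {c: [] for c in checkpoints}
    for _ in range(trials):
        x = [0.0] * n                                 # start exactly at consensus
        for k in range(1, steps + 1):
            x = noisy_consensus_step(x, eps, sigma, rng)
            if k in at:
                at[k].append(sum(x) / n)
    return {k: pvariance(v) for k, v in at.items()}

spread = average_spread()
```

With the defaults (n = 10, ε = 0.25, σ = 1) the formula predicts a variance of about 0.625 after 50 steps and 5.0 after 400, which the Monte Carlo estimates should approach.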
Pub Date: 2016-03-20 | DOI: 10.1109/ICASSP.2016.7472372
Title: Blind CFO estimation for multiuser OFDM uplink with large number of receive antennas
Authors: Weile Zhang, F. Gao, Bobin Yao
Venue: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Abstract: In this paper, we propose a new blind carrier frequency offset (CFO) estimation method for multiuser orthogonal frequency division multiplexing (OFDM) uplink transmissions. The considered model supports spatial multiplexing, which allows the subcarriers to be occupied simultaneously by multiple users. We propose assigning different null subcarriers to different users and design an algorithm that performs blind CFO estimation for each individual user with the aid of a large number of receive antennas, which removes the need for a multidimensional search. Numerical results are provided to corroborate the proposed method.
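The null-subcarrier principle behind this kind of blind CFO estimation can be shown for a single user: a CFO of ε subcarrier spacings leaks data energy into subcarriers that were transmitted empty, so the estimate is the trial offset that best re-empties them. The sketch below assumes one user whose signal has already been isolated from the others (for example by the large receive array mentioned in the abstract), a single noiseless OFDM symbol without cyclic prefix, and a hypothetical null set. It is not the paper's multiuser algorithm, only an instance of the one-dimensional per-user search.

```python
import cmath
import math
import random

def dft_bin(x, k):
    """k-th DFT coefficient of the length-N sequence x."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))

def idft(X):
    """Inverse DFT: subcarrier symbols -> time-domain OFDM samples."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def null_energy(y, eps, nulls):
    """Energy on the null subcarriers after removing a trial CFO eps (in subcarrier spacings)."""
    N = len(y)
    z = [y[n] * cmath.exp(-2j * math.pi * eps * n / N) for n in range(N)]
    return sum(abs(dft_bin(z, k)) ** 2 for k in nulls)

def estimate_cfo(y, nulls, grid=201):
    """1-D grid search over [-0.5, 0.5] for the CFO that best re-empties the null subcarriers."""
    candidates = [-0.5 + i / (grid - 1) for i in range(grid)]
    return min(candidates, key=lambda e: null_energy(y, e, nulls))

# Toy noiseless example: 16 subcarriers, a hypothetical null set, QPSK data elsewhere.
rng = random.Random(0)
N, nulls, true_eps = 16, [0, 5, 9, 12], 0.23
X = [0j if k in nulls else complex(rng.choice([-1, 1]), rng.choice([-1, 1])) for k in range(N)]
x = idft(X)
y = [x[n] * cmath.exp(2j * math.pi * true_eps * n / N) for n in range(N)]
est = estimate_cfo(y, nulls)
```

In practice the grid search would be refined locally or replaced by a closed-form step, and noise makes the minimum nonzero; the grid here only keeps the example easy to follow.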