Laughter synthesis: A comparison between Variational autoencoder and Autoencoder
Nadia Mansouri, Z. Lachiri
2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), 2020-09-01
DOI: 10.1109/ATSIP49331.2020.9231607 (https://doi.org/10.1109/ATSIP49331.2020.9231607)
Citations: 5
Abstract
Laughter is one of the most well-known nonverbal sounds that humans produce from birth, and it conveys messages about our emotional state. These characteristics make it an important sound to study in order to improve human-machine interaction. In this paper, we investigate the generation of audio laughter from its acoustic features. The proposed process is treated as an analysis-transformation-synthesis benchmark based on two unsupervised dimensionality reduction techniques: the standard autoencoder (AE) and the variational autoencoder (VAE). The laughter synthesis methodology consists of transforming the extracted high-dimensional log-magnitude spectrogram into a low-dimensional latent vector. This latent vector retains the most valuable information, which is used to reconstruct a synthetic magnitude spectrogram that is then passed through a vocoder to generate the laughter waveform. We systematically exploit the VAE to create a new sound (speech-laugh) through an interpolation process. To evaluate the performance of these models, we conducted both objective and subjective evaluations.
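The abstract gives no implementation details, so the following is only a minimal sketch of the latent-space operations it alludes to. It assumes the usual VAE formulation (a Gaussian encoder with a standard-normal prior) and a plain linear blend for the interpolation step. The function names, the 4-dimensional latent size, and the numeric values are hypothetical and are not taken from the paper.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element-wise.

    This is the step that separates a VAE from a standard AE, whose
    encoder would return a single deterministic latent vector instead.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def interpolate(z_a, z_b, alpha):
    """Linear blend of two latent vectors (assumed interpolation scheme).

    alpha = 0 returns z_a, alpha = 1 returns z_b.
    """
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]


# Hypothetical latent codes for one speech frame and one laughter frame.
z_speech = [0.5, -1.0, 0.0, 2.0]
z_laugh = [-0.5, 1.0, 1.0, 0.0]

# A midpoint code; a trained decoder (not shown) would map it to a
# magnitude-spectrogram frame for the vocoder.
z_mix = interpolate(z_speech, z_laugh, 0.5)
```

In a full pipeline, each interpolated latent vector would be decoded into a magnitude-spectrogram frame and the sequence of frames handed to the vocoder. The encoder, decoder, and vocoder are omitted here because the abstract does not describe them.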