Laughter synthesis: A comparison between Variational autoencoder and Autoencoder

Nadia Mansouri, Z. Lachiri
{"title":"笑声合成:变分自编码器与自编码器的比较","authors":"Nadia Mansouri, Z. Lachiri","doi":"10.1109/ATSIP49331.2020.9231607","DOIUrl":null,"url":null,"abstract":"Laughter is one of the most famous non verbal sounds that human produce since birth, it conveys messages about our emotional state. These characteristics make it an important sound that should be studied in order to improve the human-machine interactions. In this paper we investigate the audio laughter generation process from its acoustic features. This suggested process is considered as an analysis-transformation synthesis benchmark based on unsupervised dimensionality reduction techniques: The standard autoencoder (AE) and the variational autoencoder (VAE). Therefore, the laughter synthesis methodology consists of transforming the extracted high-dimensional log magnitude spectrogram into a low-dimensional latent vector. This latent vector contains the most valuable information used to reconstruct a synthetic magnitude spectrogram that will be passed through a specific vocoder to generate the laughter waveform. We systematically, exploit the VAE to create new sound (speech-laugh) based on the interpolation process. To evaluate the performance of these models two evaluation metrics were conducted: objective and subjective evaluations.","PeriodicalId":384018,"journal":{"name":"2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Laughter synthesis: A comparison between Variational autoencoder and Autoencoder\",\"authors\":\"Nadia Mansouri, Z. Lachiri\",\"doi\":\"10.1109/ATSIP49331.2020.9231607\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Laughter is one of the most famous non verbal sounds that human produce since birth, it conveys messages about our emotional state. These characteristics make it an important sound that should be studied in order to improve the human-machine interactions. In this paper we investigate the audio laughter generation process from its acoustic features. This suggested process is considered as an analysis-transformation synthesis benchmark based on unsupervised dimensionality reduction techniques: The standard autoencoder (AE) and the variational autoencoder (VAE). Therefore, the laughter synthesis methodology consists of transforming the extracted high-dimensional log magnitude spectrogram into a low-dimensional latent vector. This latent vector contains the most valuable information used to reconstruct a synthetic magnitude spectrogram that will be passed through a specific vocoder to generate the laughter waveform. We systematically, exploit the VAE to create new sound (speech-laugh) based on the interpolation process. 
To evaluate the performance of these models two evaluation metrics were conducted: objective and subjective evaluations.\",\"PeriodicalId\":384018,\"journal\":{\"name\":\"2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP)\",\"volume\":\"51 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ATSIP49331.2020.9231607\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ATSIP49331.2020.9231607","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Laughter is one of the most recognizable non-verbal sounds that humans produce from birth, and it conveys information about our emotional state. These characteristics make it an important sound to study in order to improve human-machine interaction. In this paper we investigate the audio laughter generation process from its acoustic features. The proposed process is treated as an analysis-transformation-synthesis benchmark based on two unsupervised dimensionality reduction techniques: the standard autoencoder (AE) and the variational autoencoder (VAE). The laughter synthesis methodology therefore consists of transforming the extracted high-dimensional log-magnitude spectrogram into a low-dimensional latent vector. This latent vector retains the most valuable information and is used to reconstruct a synthetic magnitude spectrogram, which is then passed through a specific vocoder to generate the laughter waveform. We additionally exploit the VAE to create a new sound (speech-laugh) based on latent-space interpolation. To evaluate the performance of these models, two kinds of evaluation were conducted: objective and subjective.
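As a rough illustration of the dimensionality-reduction step described in the abstract, the sketch below implements a frame-wise variational autoencoder over log-magnitude spectrogram columns. The framework (PyTorch), layer sizes, latent dimension and loss weighting are illustrative assumptions, not the authors' configuration; the AE baseline mentioned above would simply replace the mu/logvar heads with a single linear bottleneck and train on the reconstruction term alone.

```python
# Minimal sketch of a frame-wise VAE over log-magnitude spectrogram frames.
# Architecture and hyperparameters are assumptions, not the paper's settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameVAE(nn.Module):
    def __init__(self, n_freq_bins=513, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_freq_bins, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(128, latent_dim)   # log-variance of q(z|x)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, n_freq_bins),
        )

    def encode(self, x):
        h = self.enc(x)
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps, the standard reparameterization trick
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.dec(z), mu, logvar

def vae_loss(x_hat, x, mu, logvar, beta=1.0):
    # Reconstruction term plus KL divergence to the standard normal prior
    rec = F.mse_loss(x_hat, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + beta * kl
```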
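A second sketch, under the same assumptions, shows how such a model could resynthesize a waveform from the reconstructed magnitude spectrogram and how interpolating between speech and laughter latent codes could yield the "speech-laugh" sound mentioned in the abstract. Griffin-Lim is used here only as a stand-in, since the abstract does not name the specific vocoder; FrameVAE, log_magnitude and alpha are hypothetical names introduced for this sketch.

```python
# Hypothetical resynthesis and speech/laughter latent interpolation,
# reusing the FrameVAE sketch above. Griffin-Lim stands in for the vocoder.
import librosa
import numpy as np
import torch

def log_magnitude(wav, n_fft=1024, hop=256):
    # Log-magnitude spectrogram, shape (n_freq_bins, n_frames)
    S = np.abs(librosa.stft(wav, n_fft=n_fft, hop_length=hop))
    return np.log1p(S)

def resynthesize(model, log_mag, hop=256):
    x = torch.tensor(log_mag.T, dtype=torch.float32)        # one row per frame
    with torch.no_grad():
        x_hat, _, _ = model(x)
    mag = np.expm1(x_hat.numpy().T).clip(min=0.0)            # undo log1p
    return librosa.griffinlim(mag, hop_length=hop)            # waveform estimate

def interpolate_speech_laugh(model, log_mag_speech, log_mag_laugh, alpha=0.5, hop=256):
    xs = torch.tensor(log_mag_speech.T, dtype=torch.float32)
    xl = torch.tensor(log_mag_laugh.T, dtype=torch.float32)
    n = min(len(xs), len(xl))                                 # align frame counts
    with torch.no_grad():
        mu_s, _ = model.encode(xs[:n])
        mu_l, _ = model.encode(xl[:n])
        z = (1 - alpha) * mu_s + alpha * mu_l                 # linear latent interpolation
        mag = np.expm1(model.dec(z).numpy().T).clip(min=0.0)
    return librosa.griffinlim(mag, hop_length=hop)
```

With alpha near 0 the output stays close to speech and with alpha near 1 it moves toward laughter; intermediate values are what would produce the hybrid speech-laugh effect.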