Laughter synthesis: A comparison between Variational autoencoder and Autoencoder

Nadia Mansouri, Z. Lachiri
{"title":"笑声合成:变分自编码器与自编码器的比较","authors":"Nadia Mansouri, Z. Lachiri","doi":"10.1109/ATSIP49331.2020.9231607","DOIUrl":null,"url":null,"abstract":"Laughter is one of the most famous non verbal sounds that human produce since birth, it conveys messages about our emotional state. These characteristics make it an important sound that should be studied in order to improve the human-machine interactions. In this paper we investigate the audio laughter generation process from its acoustic features. This suggested process is considered as an analysis-transformation synthesis benchmark based on unsupervised dimensionality reduction techniques: The standard autoencoder (AE) and the variational autoencoder (VAE). Therefore, the laughter synthesis methodology consists of transforming the extracted high-dimensional log magnitude spectrogram into a low-dimensional latent vector. This latent vector contains the most valuable information used to reconstruct a synthetic magnitude spectrogram that will be passed through a specific vocoder to generate the laughter waveform. We systematically, exploit the VAE to create new sound (speech-laugh) based on the interpolation process. To evaluate the performance of these models two evaluation metrics were conducted: objective and subjective evaluations.","PeriodicalId":384018,"journal":{"name":"2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Laughter synthesis: A comparison between Variational autoencoder and Autoencoder\",\"authors\":\"Nadia Mansouri, Z. Lachiri\",\"doi\":\"10.1109/ATSIP49331.2020.9231607\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Laughter is one of the most famous non verbal sounds that human produce since birth, it conveys messages about our emotional state. These characteristics make it an important sound that should be studied in order to improve the human-machine interactions. In this paper we investigate the audio laughter generation process from its acoustic features. This suggested process is considered as an analysis-transformation synthesis benchmark based on unsupervised dimensionality reduction techniques: The standard autoencoder (AE) and the variational autoencoder (VAE). Therefore, the laughter synthesis methodology consists of transforming the extracted high-dimensional log magnitude spectrogram into a low-dimensional latent vector. This latent vector contains the most valuable information used to reconstruct a synthetic magnitude spectrogram that will be passed through a specific vocoder to generate the laughter waveform. We systematically, exploit the VAE to create new sound (speech-laugh) based on the interpolation process. 
To evaluate the performance of these models two evaluation metrics were conducted: objective and subjective evaluations.\",\"PeriodicalId\":384018,\"journal\":{\"name\":\"2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP)\",\"volume\":\"51 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ATSIP49331.2020.9231607\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ATSIP49331.2020.9231607","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Laughter is one of the most recognizable non-verbal sounds that humans produce from birth, and it conveys information about our emotional state. These characteristics make it an important sound to study in order to improve human-machine interaction. In this paper we investigate the audio laughter generation process from its acoustic features. The proposed process is treated as an analysis-transformation-synthesis benchmark based on two unsupervised dimensionality reduction techniques: the standard autoencoder (AE) and the variational autoencoder (VAE). The laughter synthesis methodology therefore consists of transforming the extracted high-dimensional log-magnitude spectrogram into a low-dimensional latent vector. This latent vector retains the most valuable information and is used to reconstruct a synthetic magnitude spectrogram, which is then passed through a specific vocoder to generate the laughter waveform. We additionally exploit the VAE to create a new sound (speech-laugh) based on latent-space interpolation. To evaluate the performance of these models, two kinds of evaluation were conducted: objective and subjective.
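As a rough illustration of the dimensionality-reduction step described in the abstract, the sketch below implements a frame-wise variational autoencoder over log-magnitude spectrogram columns. The framework (PyTorch), layer sizes, latent dimension and loss weighting are illustrative assumptions, not the authors' configuration; the AE baseline mentioned above would simply replace the mu/logvar heads with a single linear bottleneck and train on the reconstruction term alone.

```python
# Minimal sketch of a frame-wise VAE over log-magnitude spectrogram frames.
# Architecture and hyperparameters are assumptions, not the paper's settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameVAE(nn.Module):
    def __init__(self, n_freq_bins=513, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_freq_bins, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(128, latent_dim)   # log-variance of q(z|x)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, n_freq_bins),
        )

    def encode(self, x):
        h = self.enc(x)
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        # z = mu + sigma * eps, the standard reparameterization trick
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.dec(z), mu, logvar

def vae_loss(x_hat, x, mu, logvar, beta=1.0):
    # Reconstruction term plus KL divergence to the standard normal prior
    rec = F.mse_loss(x_hat, x, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + beta * kl
```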
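A second sketch, under the same assumptions, shows how such a model could resynthesize a waveform from the reconstructed magnitude spectrogram and how interpolating between speech and laughter latent codes could yield the "speech-laugh" sound mentioned in the abstract. Griffin-Lim is used here only as a stand-in, since the abstract does not name the specific vocoder; FrameVAE, log_magnitude and alpha are hypothetical names introduced for this sketch.

```python
# Hypothetical resynthesis and speech/laughter latent interpolation,
# reusing the FrameVAE sketch above. Griffin-Lim stands in for the vocoder.
import librosa
import numpy as np
import torch

def log_magnitude(wav, n_fft=1024, hop=256):
    # Log-magnitude spectrogram, shape (n_freq_bins, n_frames)
    S = np.abs(librosa.stft(wav, n_fft=n_fft, hop_length=hop))
    return np.log1p(S)

def resynthesize(model, log_mag, hop=256):
    x = torch.tensor(log_mag.T, dtype=torch.float32)        # one row per frame
    with torch.no_grad():
        x_hat, _, _ = model(x)
    mag = np.expm1(x_hat.numpy().T).clip(min=0.0)            # undo log1p
    return librosa.griffinlim(mag, hop_length=hop)            # waveform estimate

def interpolate_speech_laugh(model, log_mag_speech, log_mag_laugh, alpha=0.5, hop=256):
    xs = torch.tensor(log_mag_speech.T, dtype=torch.float32)
    xl = torch.tensor(log_mag_laugh.T, dtype=torch.float32)
    n = min(len(xs), len(xl))                                 # align frame counts
    with torch.no_grad():
        mu_s, _ = model.encode(xs[:n])
        mu_l, _ = model.encode(xl[:n])
        z = (1 - alpha) * mu_s + alpha * mu_l                 # linear latent interpolation
        mag = np.expm1(model.dec(z).numpy().T).clip(min=0.0)
    return librosa.griffinlim(mag, hop_length=hop)
```

With alpha near 0 the output stays close to speech and with alpha near 1 it moves toward laughter; intermediate values are what would produce the hybrid speech-laugh effect.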