{"title":"使用音色细化自动编码器对合成器参数进行潜空间插值","authors":"Gwendal Le Vaillant;Thierry Dutoit","doi":"10.1109/TASLP.2024.3426987","DOIUrl":null,"url":null,"abstract":"Sound synthesizers are ubiquitous in modern music production but manipulating their presets, i.e. the sets of synthesis parameters, demands expert skills. This study presents a novel variational auto-encoder model tailored for black-box synthesizer preset interpolation, which enables the intuitive generation of new presets from pre-existing ones. Leveraging multi-head self-attention networks, the model efficiently learns latent representations of synthesis parameters, aligning these with perceived timbre dimensions through attribute-based regularization. It is able to gradually transition between diverse presets, surpassing traditional linear parametric interpolation methods. Furthermore, we introduce an objective and reproducible evaluation method, based on linearity and smoothness metrics computed on a broad set of audio features. The model's efficacy is demonstrated through subjective experiments, whose results also highlight significant correlations with the proposed objective metrics. The model is validated using a widespread frequency modulation synthesizer with a large set of interdependent parameters. It can be adapted to various commercial synthesizers, and can perform other tasks such as modulations and extrapolations.","PeriodicalId":13332,"journal":{"name":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","volume":"32 ","pages":"3379-3392"},"PeriodicalIF":4.1000,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Latent Space Interpolation of Synthesizer Parameters Using Timbre-Regularized Auto-Encoders\",\"authors\":\"Gwendal Le Vaillant;Thierry Dutoit\",\"doi\":\"10.1109/TASLP.2024.3426987\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Sound synthesizers are ubiquitous in modern music production but manipulating their presets, i.e. the sets of synthesis parameters, demands expert skills. This study presents a novel variational auto-encoder model tailored for black-box synthesizer preset interpolation, which enables the intuitive generation of new presets from pre-existing ones. Leveraging multi-head self-attention networks, the model efficiently learns latent representations of synthesis parameters, aligning these with perceived timbre dimensions through attribute-based regularization. It is able to gradually transition between diverse presets, surpassing traditional linear parametric interpolation methods. Furthermore, we introduce an objective and reproducible evaluation method, based on linearity and smoothness metrics computed on a broad set of audio features. The model's efficacy is demonstrated through subjective experiments, whose results also highlight significant correlations with the proposed objective metrics. The model is validated using a widespread frequency modulation synthesizer with a large set of interdependent parameters. 
It can be adapted to various commercial synthesizers, and can perform other tasks such as modulations and extrapolations.\",\"PeriodicalId\":13332,\"journal\":{\"name\":\"IEEE/ACM Transactions on Audio, Speech, and Language Processing\",\"volume\":\"32 \",\"pages\":\"3379-3392\"},\"PeriodicalIF\":4.1000,\"publicationDate\":\"2024-07-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE/ACM Transactions on Audio, Speech, and Language Processing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10596701/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ACOUSTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE/ACM Transactions on Audio, Speech, and Language Processing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10596701/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ACOUSTICS","Score":null,"Total":0}
Latent Space Interpolation of Synthesizer Parameters Using Timbre-Regularized Auto-Encoders
Abstract:
Sound synthesizers are ubiquitous in modern music production but manipulating their presets, i.e. the sets of synthesis parameters, demands expert skills. This study presents a novel variational auto-encoder model tailored for black-box synthesizer preset interpolation, which enables the intuitive generation of new presets from pre-existing ones. Leveraging multi-head self-attention networks, the model efficiently learns latent representations of synthesis parameters, aligning these with perceived timbre dimensions through attribute-based regularization. It is able to gradually transition between diverse presets, surpassing traditional linear parametric interpolation methods. Furthermore, we introduce an objective and reproducible evaluation method, based on linearity and smoothness metrics computed on a broad set of audio features. The model's efficacy is demonstrated through subjective experiments, whose results also highlight significant correlations with the proposed objective metrics. The model is validated using a widespread frequency modulation synthesizer with a large set of interdependent parameters. It can be adapted to various commercial synthesizers, and can perform other tasks such as modulations and extrapolations.
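To make the core idea concrete, the sketch below illustrates latent-space preset interpolation and a simple linearity measure; it is a minimal, hypothetical example, not the authors' implementation. It assumes a trained encoder/decoder pair is available: two existing presets are encoded into latent vectors, intermediate points are obtained by linear interpolation in the latent space (rather than directly on the raw synthesis parameters), and each point would be decoded back into a full preset. The `decode`, `render`, and `spectral_centroid` names in the comments are placeholders for a trained decoder, the synthesizer's rendering engine, and an audio-feature extractor.

```python
import numpy as np

def interpolate_presets(z_start, z_end, n_steps=7):
    """Linearly interpolate between two latent codes.

    z_start, z_end : 1-D arrays of equal length (latent vectors of two
    encoded presets). Returns an (n_steps, latent_dim) array whose first
    and last rows are the endpoints themselves.
    """
    alphas = np.linspace(0.0, 1.0, n_steps)[:, None]
    return (1.0 - alphas) * z_start[None, :] + alphas * z_end[None, :]

def linearity_score(feature_values):
    """Toy linearity metric: R^2 of a straight-line fit to one audio
    feature (e.g. spectral centroid) measured along the interpolation.
    Values close to 1 indicate an even, monotonic transition."""
    feature_values = np.asarray(feature_values, dtype=float)
    steps = np.arange(len(feature_values))
    slope, intercept = np.polyfit(steps, feature_values, deg=1)
    fitted = slope * steps + intercept
    ss_res = np.sum((feature_values - fitted) ** 2)
    ss_tot = np.sum((feature_values - feature_values.mean()) ** 2)
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0

# Example with random stand-ins for two encoded presets (latent_dim = 16).
rng = np.random.default_rng(0)
z_a, z_b = rng.normal(size=16), rng.normal(size=16)
latent_path = interpolate_presets(z_a, z_b, n_steps=7)

# Each row of latent_path would be passed to the decoder to obtain a full
# preset, which is then rendered and analyzed (hypothetical calls):
# presets   = [decode(z) for z in latent_path]
# centroids = [spectral_centroid(render(p)) for p in presets]
# print(linearity_score(centroids))
```

In this sketch the evaluation idea reduces to a single feature and a single R^2 score; the paper's objective method aggregates linearity and smoothness metrics over a broad set of audio features computed along the interpolation path.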
Journal description:
The IEEE/ACM Transactions on Audio, Speech, and Language Processing covers audio, speech and language processing and the sciences that support them. In audio processing: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. In speech processing: areas such as speech analysis, synthesis, coding, speech and speaker recognition, speech production and perception, and speech enhancement. In language processing: speech and text analysis, understanding, generation, dialog management, translation, summarization, question answering and document indexing and retrieval, as well as general language modeling.