{"title":"使用1D卷积网络的音频耳蜗图分析和合成库","authors":"Elias Nemer","doi":"10.1109/CAIDA51941.2021.9425342","DOIUrl":null,"url":null,"abstract":"Time-Frequency transformation and spectral representations of audio signals are commonly used in various machine learning applications. Training a network on features such as the Mel-Spectrogram or Cochleogram has been proven more effective than training on time samples. In practical realizations, these are generated on a separate processor or pre-computed and stored on disk, requiring additional efforts and making it difficult to experiment with different variants. In this paper, we provide a PyTorch framework for generating the Cochleogram as well as the time-domain complex filter-banks for analysis and re-synthesis using the built-in trainable conv1d() layer. This allows computing this spectral feature on the fly as part of a larger network and enables experimenting with varying parameters. The analysis / synthesis banks enable building a trainable network that operates on complex subbands, where resynthesizing the time samples is desirable. The convolutional kernels may be trained from random values, or may be initialized and frozen or initialized and continuously trained with the rest of any network they are part of.","PeriodicalId":272573,"journal":{"name":"2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA)","volume":"166 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Audio Cochleogram with Analysis and Synthesis Banks Using 1D Convolutional Networks\",\"authors\":\"Elias Nemer\",\"doi\":\"10.1109/CAIDA51941.2021.9425342\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Time-Frequency transformation and spectral representations of audio signals are commonly used in various machine learning applications. Training a network on features such as the Mel-Spectrogram or Cochleogram has been proven more effective than training on time samples. In practical realizations, these are generated on a separate processor or pre-computed and stored on disk, requiring additional efforts and making it difficult to experiment with different variants. In this paper, we provide a PyTorch framework for generating the Cochleogram as well as the time-domain complex filter-banks for analysis and re-synthesis using the built-in trainable conv1d() layer. This allows computing this spectral feature on the fly as part of a larger network and enables experimenting with varying parameters. The analysis / synthesis banks enable building a trainable network that operates on complex subbands, where resynthesizing the time samples is desirable. 
The convolutional kernels may be trained from random values, or may be initialized and frozen or initialized and continuously trained with the rest of any network they are part of.\",\"PeriodicalId\":272573,\"journal\":{\"name\":\"2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA)\",\"volume\":\"166 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-04-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CAIDA51941.2021.9425342\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CAIDA51941.2021.9425342","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Audio Cochleogram with Analysis and Synthesis Banks Using 1D Convolutional Networks
Time-frequency transformations and spectral representations of audio signals are commonly used in machine learning applications. Training a network on features such as the Mel-Spectrogram or Cochleogram has proven more effective than training directly on time-domain samples. In practical realizations, these features are generated on a separate processor or pre-computed and stored on disk, which requires additional effort and makes it difficult to experiment with different variants. In this paper, we provide a PyTorch framework for generating the Cochleogram, as well as time-domain complex filter banks for analysis and re-synthesis, using the built-in trainable conv1d() layer. This allows the spectral features to be computed on the fly as part of a larger network and enables experimentation with varying parameters. The analysis/synthesis banks make it possible to build a trainable network that operates on complex subbands in applications where resynthesizing the time samples is desirable. The convolutional kernels may be trained from random initial values, initialized and frozen, or initialized and then trained jointly with the rest of the network they belong to.
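
To make the idea concrete, below is a minimal sketch (not the paper's actual library) of a Cochleogram front end built from PyTorch's built-in Conv1d layer, with kernels initialized from gammatone impulse responses that can either be frozen or trained jointly with the rest of a network, as the abstract describes. The filter count, kernel length, ERB spacing, and pooling settings are illustrative assumptions.

```python
# Minimal sketch, assuming a gammatone parameterization of the filter bank.
import math
import torch
import torch.nn as nn

def gammatone_kernels(n_filters=32, kernel_size=512, sample_rate=16000,
                      f_min=50.0, f_max=7500.0, order=4):
    """Real gammatone impulse responses on an ERB-spaced center-frequency grid."""
    # ERB-rate scale (Glasberg & Moore): erb(f) = 21.4 * log10(1 + 0.00437 f)
    def hz_to_erb(f): return 21.4 * math.log10(1.0 + 0.00437 * f)
    def erb_to_hz(e): return (10.0 ** (e / 21.4) - 1.0) / 0.00437
    erbs = torch.linspace(hz_to_erb(f_min), hz_to_erb(f_max), n_filters)
    fc = torch.tensor([erb_to_hz(e.item()) for e in erbs])           # (F,)
    bw = 1.019 * 24.7 * (0.00437 * fc + 1.0)                         # ERB bandwidths
    t = torch.arange(kernel_size) / sample_rate                      # (K,)
    # g(t) = t^(n-1) * exp(-2*pi*bw*t) * cos(2*pi*fc*t)
    env = t.pow(order - 1) * torch.exp(-2 * math.pi * bw[:, None] * t)
    kernels = env * torch.cos(2 * math.pi * fc[:, None] * t)
    kernels = kernels / kernels.norm(dim=1, keepdim=True)            # unit energy
    return kernels                                                   # (F, K)

class CochleogramFrontEnd(nn.Module):
    """Cochleogram-like features from raw audio via a trainable Conv1d bank."""
    def __init__(self, n_filters=32, kernel_size=512, hop=160, trainable=True):
        super().__init__()
        self.bank = nn.Conv1d(1, n_filters, kernel_size, stride=1,
                              padding=kernel_size // 2, bias=False)
        with torch.no_grad():
            self.bank.weight.copy_(
                gammatone_kernels(n_filters, kernel_size).unsqueeze(1))  # (F, 1, K)
        # Freeze, or co-train with the rest of the network it is part of.
        self.bank.weight.requires_grad_(trainable)
        self.pool = nn.AvgPool1d(kernel_size=400, stride=hop)        # energy smoothing

    def forward(self, x):                             # x: (B, 1, T) raw samples
        subbands = self.bank(x)                       # (B, F, T) subband signals
        energy = self.pool(subbands ** 2)             # smoothed band energies
        return torch.log(energy + 1e-6)               # log-compressed Cochleogram

if __name__ == "__main__":
    wave = torch.randn(2, 1, 16000)                   # one second at 16 kHz
    feats = CochleogramFrontEnd()(wave)
    print(feats.shape)                                # e.g. torch.Size([2, 32, 98])
```

Because the filter bank is an ordinary Conv1d layer, the `trainable` flag reproduces the three regimes mentioned in the abstract: random initialization with training, initialization followed by freezing, or initialization followed by continued joint training.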