Multipitch tracking in music signals using Echo State Networks
P. Steiner, Simon Stone, P. Birkholz, A. Jalalvand
2020 28th European Signal Processing Conference (EUSIPCO), pp. 126-130
Published: 2021-01-24
DOI: https://doi.org/10.23919/Eusipco47968.2020.9287638
Citation count: 13
Abstract
Currently, convolutional neural networks (CNNs) define the state of the art for multipitch tracking in music signals. Echo State Networks (ESNs), a recently introduced recurrent neural network architecture, have achieved results similar to those of CNNs on various tasks, such as phoneme or digit recognition. However, they have not yet received much attention in the Music Information Retrieval community. The core of an ESN is the reservoir, a group of unordered, randomly connected neurons that non-linearly transforms the low-dimensional input space into a high-dimensional feature space. Because only the weights of the connections between the reservoir and the output are trained, using linear regression, ESNs are easier to train than deep neural networks. This paper presents a first exploration of ESNs for the challenging task of multipitch tracking in music signals. The best results presented in this paper were achieved with a bidirectional two-layer ESN with 20 000 neurons in each layer. Although the final F-score of 0.7198 still falls below the state of the art (0.7370), the proposed ESN-based approach serves as a baseline for further investigations of ESNs in audio signal processing.
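The reservoir idea described in the abstract can be illustrated with a minimal NumPy sketch: fixed random input and recurrent weights expand the input into a high-dimensional state sequence, and only a linear readout is fitted. This is not the authors' implementation. The function names, the reservoir size, the spectral radius, the leak rate, and the use of ridge (regularized linear) regression are all assumptions chosen for illustration, and the sketch is a single unidirectional layer rather than the bidirectional two-layer network reported in the paper.

```python
import numpy as np


def make_reservoir(n_inputs, n_reservoir, spectral_radius=0.9,
                   input_scaling=1.0, seed=0):
    """Draw fixed random weights (hypothetical helper; never trained)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-input_scaling, input_scaling,
                       size=(n_reservoir, n_inputs))
    w_res = rng.uniform(-1.0, 1.0, size=(n_reservoir, n_reservoir))
    # Rescale so the largest absolute eigenvalue equals spectral_radius.
    w_res *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w_res)))
    return w_in, w_res


def run_reservoir(inputs, w_in, w_res, leak=1.0):
    """Map a (time, n_inputs) sequence to (time, n_reservoir) states."""
    state = np.zeros(w_res.shape[0])
    states = np.zeros((inputs.shape[0], w_res.shape[0]))
    for t, u in enumerate(inputs):
        # Leaky-integrator update with a tanh non-linearity.
        state = (1.0 - leak) * state + leak * np.tanh(w_in @ u + w_res @ state)
        states[t] = state
    return states


def add_bias(states):
    """Append a constant column so the readout can learn an offset."""
    return np.hstack([states, np.ones((states.shape[0], 1))])


def train_readout(states, targets, ridge=1e-6):
    """Fit the only trained weights by closed-form ridge regression."""
    s = add_bias(states)
    gram = s.T @ s + ridge * np.eye(s.shape[1])
    return np.linalg.solve(gram, s.T @ targets)


def predict(states, w_out):
    """Apply the linear readout to reservoir states."""
    return add_bias(states) @ w_out


# Toy usage: recall the previous input frame from the current state.
rng = np.random.default_rng(1)
u = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.roll(u, 1, axis=0)
w_in, w_res = make_reservoir(n_inputs=1, n_reservoir=50)
states = run_reservoir(u, w_in, w_res)
w_out = train_readout(states, y)
y_hat = predict(states, w_out)
```

For multipitch tracking, the target matrix would presumably hold one column per pitch with frame-wise activity labels, and the readout outputs would be thresholded to obtain the active pitches. That detail is an assumption here, not something the abstract specifies.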