Balancing bias and performance in polyphonic piano transcription systems

Frontiers in Signal Processing · Published: 2022-10-03 · DOI: 10.3389/frsip.2022.975932 · Journal impact factor: 1.3 · JCR Q3 (Engineering, Electrical & Electronic)
L. Marták, Rainer Kelz, Gerhard Widmer
Citations: 1

Abstract

Current state-of-the-art methods for polyphonic piano transcription tend to use high-capacity neural networks. Most models are trained end-to-end and learn a mapping from audio input to pitch labels. They require large training corpora consisting of many audio recordings of different piano models, together with temporally aligned pitch labels. Previous work has shown that neural-network-based systems struggle to generalize to unseen note combinations, as they tend to learn note combinations by heart. Semi-supervised linear matrix decomposition is a frequently used alternative approach to piano transcription, and one that does not have this particular drawback. The weaknesses of linear methods start to show when they encounter recordings of pieces played on unseen pianos, a scenario in which neural networks seem relatively untroubled. A recently proposed approach called “Differentiable Dictionary Search” (DDS) combines the modeling capacity of deep density models with the linear mixing model of matrix decomposition, in order to balance the respective strengths and weaknesses of the two standalone approaches. This makes it better suited to modeling unseen sources, while generalization to unseen note combinations should be unaffected: the mixing model is not learned, and thus cannot acquire a corpus bias. In its initially proposed form, however, DDS uses computational resources too inefficiently to be applied to piano music transcription. To reduce its computational demands and memory requirements, we propose a number of modifications. These adjustments finally enable a fair comparison of our modified DDS variant with a semi-supervised matrix decomposition baseline, as well as with a state-of-the-art deep neural network system trained end-to-end.
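To make the contrast between learned and fixed mixing concrete, here is a minimal sketch of semi-supervised matrix decomposition in plain Python. It assumes a dictionary W of per-pitch spectral templates that was learned beforehand (here three made-up 8-bin templates named `p0`, `p1`, `p2`, standing in for real piano notes), keeps W fixed, and estimates only the nonnegative activations H with the standard Lee and Seung multiplicative update for the Euclidean cost. It is not the baseline used in the paper; the template values, iteration count, and the 0.1 detection threshold are illustrative assumptions.

```python
# Minimal sketch of semi-supervised NMF: the dictionary W (per-pitch templates)
# is fixed, and only the nonnegative activations H are estimated so that V ≈ W H.
# Template values, pitch names, and the threshold are illustrative assumptions.

EPS = 1e-12  # guards against division by zero


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def fit_activations(V, W, n_iter=1000):
    """Estimate H >= 0 minimising ||V - W H||^2 with W held fixed
    (Lee-Seung multiplicative update for the Euclidean cost)."""
    n_pitches, n_frames = len(W[0]), len(V[0])
    H = [[1.0] * n_frames for _ in range(n_pitches)]  # positive initialisation
    Wt = transpose(W)
    WtV = matmul(Wt, V)  # constant, because W never changes
    WtW = matmul(Wt, W)
    for _ in range(n_iter):
        WtWH = matmul(WtW, H)
        H = [[H[k][t] * WtV[k][t] / (WtWH[k][t] + EPS) for t in range(n_frames)]
             for k in range(n_pitches)]
    return H


# Hypothetical 8-bin spectral templates for three pitches (one list per pitch).
templates = {
    "p0": [1.0, 0.0, 0.5, 0.0, 0.3, 0.0, 0.0, 0.0],
    "p1": [0.0, 1.0, 0.0, 0.0, 0.5, 0.0, 0.3, 0.0],
    "p2": [0.0, 0.0, 1.0, 0.0, 0.0, 0.5, 0.0, 0.3],
}
names = list(templates)
W = transpose([templates[n] for n in names])  # bins x pitches

# Ground-truth gains for three frames: p0 alone, p0 + p2, and all three pitches.
true_H = [
    [1.0, 1.0, 0.5],  # p0
    [0.0, 0.0, 1.0],  # p1
    [0.0, 0.8, 0.7],  # p2
]
V = matmul(W, true_H)  # observed magnitude "spectrogram", bins x frames

H = fit_activations(V, W)
for t in range(len(V[0])):
    active = [names[k] for k in range(len(names)) if H[k][t] > 0.1]
    print(f"frame {t}: {active}")
```

Because the decomposition only ever adds scaled templates, the pair `p0` + `p2` in frame 1 needs no special treatment: nothing in W records which pitches tend to sound together, which is the sense in which such a mixing model cannot acquire a corpus bias. DDS, as described above, keeps this kind of linear mixing but models the sources with a deep density model instead of fixed templates; that part is not sketched here.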
In systematic experiments with both musical and “unmusical” piano recordings (real musical pieces and unusual chords), we provide quantitative and qualitative frame-level analyses that characterize the behavior of the modified approach and compare it with several related methods. The results generally show the fundamental promise of the model and, in particular, demonstrate improvements in situations where a corpus bias incurred by learning from musical material of a specific genre would be problematic.
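The analyses above are carried out at the frame level. A common way to score frame-level multi-pitch output is to binarise it into a pitch-by-frame piano roll and count true positives, false positives, and false negatives over all cells. The sketch below assumes that cell-wise definition; the exact evaluation protocol of the paper is not given in this abstract and may differ, and the toy piano rolls are made up for illustration.

```python
# Frame-level precision, recall and F-measure on binary piano rolls
# (rows = pitches, columns = frames). Assumes a common cell-wise definition;
# the paper's exact evaluation protocol may differ.


def frame_metrics(reference, estimate):
    """Return (precision, recall, f_measure) counted over all pitch-frame cells."""
    tp = fp = fn = 0
    for ref_row, est_row in zip(reference, estimate):
        for r, e in zip(ref_row, est_row):
            if r and e:
                tp += 1  # note active in both reference and estimate
            elif e:
                fp += 1  # estimated note that is not in the reference
            elif r:
                fn += 1  # reference note that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure


# Toy example: 3 pitches x 4 frames.
reference = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
estimate = [
    [1, 1, 0, 0],  # both notes found
    [0, 0, 1, 1],  # one miss, one false alarm
    [0, 0, 1, 1],  # both notes found
]
p, r, f = frame_metrics(reference, estimate)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

In this toy case five of the six reference cells are found and one estimated cell is a false alarm, so precision, recall, and F-measure all equal 5/6. Note that a frame-level score of this kind says nothing about whether note onsets and offsets were placed correctly.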