WideResNet with Joint Representation Learning and Data Augmentation for Cover Song Identification
Shichao Hu, Bin Zhang, Jinhong Lu, Yiliang Jiang, Wucheng Wang, Lingchen Kong, Weifeng Zhao, Tao Jiang
Interspeech, pages 4187-4191, published 2022-09-18
DOI: 10.21437/interspeech.2022-10600 (https://doi.org/10.21437/interspeech.2022-10600)
Citations: 4
Abstract
Cover song identification (CSI) is a challenging task and an important topic in the music information retrieval (MIR) community. In recent years, CSI has been studied extensively with deep learning methods. In this paper, we propose a novel framework for CSI based on a joint representation learning method inspired by multi-task learning. Specifically, we propose a joint learning strategy that combines classification and metric learning to optimize a WideResNet-based cover song model, called LyraC-Net. The classification objective learns embeddings that are separable across classes, while metric learning optimizes embedding similarity by decreasing the intra-class distance and increasing the inter-class separability. This joint optimization strategy is expected to learn a more robust cover song representation than methods with a single training objective. For metric learning, a prototypical network is introduced alongside triplet loss to stabilize and accelerate training. Furthermore, we introduce SpecAugment, a popular augmentation method in speech recognition, to further improve performance. Experimental results show that the proposed method outperforms other recent CSI methods in the evaluations.
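The joint objective described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the abstract does not say how the three terms are combined, so the weighted sum, the weights `w_ce`, `w_proto`, `w_triplet`, the margin value, the use of squared Euclidean distance, and every function name below are assumptions made for illustration. The prototypes are also computed from the same toy batch, which simplifies the usual support/query split of a prototypical network.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy of one example given raw class scores (classification term)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def sq_dist(a, b):
    """Squared Euclidean distance between two embeddings (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge loss: pull the same-song pair together, push the other song away.
    The margin of 0.3 is an illustrative value, not taken from the paper."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)

def prototypes(embeddings, labels):
    """Mean embedding per class, used as that class's prototype."""
    groups = {}
    for emb, y in zip(embeddings, labels):
        groups.setdefault(y, []).append(emb)
    return {y: [sum(col) / len(col) for col in zip(*vs)] for y, vs in groups.items()}

def prototypical_loss(query, label, protos):
    """Cross-entropy over negative distances from the query to each prototype."""
    classes = sorted(protos)
    logits = [-sq_dist(query, protos[c]) for c in classes]
    return softmax_cross_entropy(logits, classes.index(label))

def joint_loss(ce, proto, triplet, w_ce=1.0, w_proto=1.0, w_triplet=1.0):
    """Weighted sum of the three terms; equal weights are an assumption."""
    return w_ce * ce + w_proto * proto + w_triplet * triplet

# Toy batch: two "songs" (classes 0 and 1), two versions each, 2-D embeddings.
embeddings = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 1.2]]
labels = [0, 0, 1, 1]
protos = prototypes(embeddings, labels)

ce = softmax_cross_entropy([2.0, 0.5], 0)            # hypothetical classifier scores
pl = prototypical_loss(embeddings[0], 0, protos)     # version 0 against both prototypes
tl = triplet_loss(embeddings[0], embeddings[1], embeddings[2])
total = joint_loss(ce, pl, tl)
```

In this toy batch the triplet term is zero, because the negative is already farther from the anchor than the positive by more than the margin, so only the classification and prototypical terms contribute to `total`.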