{"title":"Comparative Analysis of Pretrained Audio Representations in Music Recommender Systems","authors":"Yan-Martin Tamm, Anna Aljanaki","doi":"arxiv-2409.08987","DOIUrl":null,"url":null,"abstract":"Over the years, Music Information Retrieval (MIR) has proposed various models\npretrained on large amounts of music data. Transfer learning showcases the\nproven effectiveness of pretrained backend models with a broad spectrum of\ndownstream tasks, including auto-tagging and genre classification. However, MIR\npapers generally do not explore the efficiency of pretrained models for Music\nRecommender Systems (MRS). In addition, the Recommender Systems community tends\nto favour traditional end-to-end neural network learning over these models. Our\nresearch addresses this gap and evaluates the applicability of six pretrained\nbackend models (MusicFM, Music2Vec, MERT, EncodecMAE, Jukebox, and MusiCNN) in\nthe context of MRS. We assess their performance using three recommendation\nmodels: K-nearest neighbours (KNN), shallow neural network, and BERT4Rec. Our\nfindings suggest that pretrained audio representations exhibit significant\nperformance variability between traditional MIR tasks and MRS, indicating that\nvaluable aspects of musical information captured by backend models may differ\ndepending on the task. This study establishes a foundation for further\nexploration of pretrained audio representations to enhance music recommendation\nsystems.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"19 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.08987","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Over the years, Music Information Retrieval (MIR) has produced various models pretrained on large amounts of music data. Transfer learning has demonstrated the effectiveness of these pretrained backend models across a broad spectrum of downstream tasks, including auto-tagging and genre classification. However, MIR papers generally do not explore the effectiveness of pretrained models for Music Recommender Systems (MRS). In addition, the Recommender Systems community tends to favour traditional end-to-end neural network learning over these models. Our research addresses this gap and evaluates the applicability of six pretrained backend models (MusicFM, Music2Vec, MERT, EncodecMAE, Jukebox, and MusiCNN) in the context of MRS. We assess their performance using three recommendation models: K-nearest neighbours (KNN), a shallow neural network, and BERT4Rec. Our findings suggest that pretrained audio representations exhibit significant performance variability between traditional MIR tasks and MRS, indicating that the aspects of musical information captured by backend models, and their value, may differ depending on the task. This study establishes a foundation for further exploration of pretrained audio representations to enhance music recommender systems.
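
To make the setup concrete, below is a minimal sketch of the simplest of the three recommendation models the abstract names: KNN retrieval over embeddings produced by a pretrained audio backend. Everything here beyond that idea (the function name recommend_knn, the cosine-similarity scoring, the mean-of-history user profile, and the embedding dimensionality) is an illustrative assumption, not the paper's actual pipeline.

```python
# Sketch of KNN content-based recommendation over pretrained audio embeddings:
# each track is represented by a feature vector from a pretrained backend
# (e.g. MERT or MusiCNN), and recommendations are the nearest neighbours of a
# user's listening history in embedding space. Illustrative assumption only.
import numpy as np

def recommend_knn(track_embeddings: np.ndarray,
                  user_history: list[int],
                  k: int = 10) -> list[int]:
    """Return the k track indices most similar (by cosine similarity) to the
    centroid of the user's listening history, excluding already-heard tracks."""
    # L2-normalise so a dot product equals cosine similarity.
    norms = np.linalg.norm(track_embeddings, axis=1, keepdims=True)
    emb = track_embeddings / np.clip(norms, 1e-12, None)

    # A simple user profile: the mean embedding of the listened tracks.
    profile = emb[user_history].mean(axis=0)
    profile /= max(np.linalg.norm(profile), 1e-12)

    scores = emb @ profile           # cosine similarity per track
    scores[user_history] = -np.inf   # never re-recommend heard tracks
    return np.argsort(-scores)[:k].tolist()

# Example with random stand-in embeddings; a real run would use features
# extracted by one of the six pretrained backends under comparison.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 768))   # 1000 tracks, 768-dim features
print(recommend_knn(embeddings, user_history=[3, 42, 17], k=5))
```

Under this framing, the choice of backend only changes the embedding matrix, which is what lets a fixed recommendation model isolate the contribution of each pretrained representation.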