Comparison of self-supervised in-domain and supervised out-domain transfer learning for bird species recognition
Houtan Ghaffari, Paul Devos
arXiv - CS - Sound, 2024-04-26
arXiv:2404.17252
Abstract
Transferring the weights of a pre-trained model to assist another task has
become a crucial part of modern deep learning, particularly in data-scarce
scenarios. Pre-training refers to the initial step of training models outside
the current task of interest, typically on another dataset. It can be done with
supervised training on human-annotated datasets or with self-supervised training
on unlabeled datasets. In both cases, many pre-trained models are
available to fine-tune for the task of interest. Interestingly, research has
shown that pre-trained models from ImageNet can be helpful for audio tasks
despite being trained on image datasets. Hence, it is unclear whether in-domain
models offer an advantage over competent out-domain models, such as
convolutional neural networks pre-trained on ImageNet. Our experiments demonstrate
the usefulness of in-domain models and datasets for bird species recognition by
leveraging VICReg, a recent and powerful self-supervised method.
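
For readers unfamiliar with VICReg, the following is a minimal NumPy sketch of its three-term objective (invariance, variance, covariance) as it is commonly described. It is an illustration only, not the authors' training code: the function name `vicreg_loss`, the argument names, the coefficient defaults (25, 25, 1), and the choice to average the invariance term over both batch and embedding dimensions are assumptions made for this sketch.

```python
import numpy as np


def vicreg_loss(z_a, z_b, sim_coeff=25.0, std_coeff=25.0, cov_coeff=1.0, eps=1e-4):
    """Illustrative VICReg-style loss for two (batch, dim) embedding matrices.

    z_a and z_b are assumed to be embeddings of two augmented views of the
    same batch of inputs. All names and defaults here are assumptions.
    """
    n, d = z_a.shape

    # Invariance: pull the two views of each sample together.
    invariance = np.mean((z_a - z_b) ** 2)

    # Center each embedding dimension over the batch.
    z_a = z_a - z_a.mean(axis=0)
    z_b = z_b - z_b.mean(axis=0)

    # Variance: hinge that keeps the per-dimension std above 1,
    # which discourages collapse to a constant embedding.
    std_a = np.sqrt(z_a.var(axis=0, ddof=1) + eps)
    std_b = np.sqrt(z_b.var(axis=0, ddof=1) + eps)
    variance = 0.5 * (np.mean(np.maximum(0.0, 1.0 - std_a))
                      + np.mean(np.maximum(0.0, 1.0 - std_b)))

    # Covariance: penalize off-diagonal covariance entries so that
    # embedding dimensions carry decorrelated information.
    cov_a = z_a.T @ z_a / (n - 1)
    cov_b = z_b.T @ z_b / (n - 1)
    off_a = cov_a - np.diag(np.diag(cov_a))
    off_b = cov_b - np.diag(np.diag(cov_b))
    covariance = (off_a ** 2).sum() / d + (off_b ** 2).sum() / d

    return float(sim_coeff * invariance
                 + std_coeff * variance
                 + cov_coeff * covariance)
```

In a bird-audio setting, the two inputs would plausibly be encoder outputs for two augmented views of the same recording. The encoder and augmentations are not specified in this abstract, so they are left out of the sketch.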