EMVD dataset: a dataset of extreme vocal distortion techniques used in heavy metal
Authors: Modan Tailleur (IRIT-SAMoVA), Julien Pinquier (IRIT-SAMoVA), Laurent Millot (ACTE), Corsin Vogel (LS2N), Mathieu Lagrange (LS2N)
arXiv: 2406.17732 (https://doi.org/arxiv-2406.17732)
Listed under: arXiv - PHYS - Classical Physics
Published: 2024-06-24
Citation count: 0
Abstract
We introduce the Extreme Metal Vocals Dataset (EMVD), a collection of
recordings of extreme vocal techniques used in heavy metal music. The dataset
consists of 760 audio excerpts, each between 1 and 30 seconds long, totaling
about 100 minutes of audio: roughly 60 minutes of distorted voice and 40
minutes of clear voice. The recordings come from 27 different singers and are
provided without accompanying musical instruments or post-processing effects.
The distortion taxonomy within this dataset encompasses four distinct
distortion techniques and three vocal effects, all performed in different pitch
ranges. We evaluate a state-of-the-art deep learning model on two
classification tasks related to vocal techniques, demonstrating the potential
of this resource for the audio processing community.
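
The headline figures in the abstract imply a few derived quantities that help when sizing experiments. The following is a minimal sketch of that arithmetic only. The constants are the rounded values quoted in the abstract, and the function name `summarize` is hypothetical, not part of any released tooling for the dataset.

```python
# Rounded figures as quoted in the abstract (approximate, not exact counts of audio time).
N_EXCERPTS = 760
TOTAL_MINUTES = 100
DISTORTED_MINUTES = 60
CLEAR_MINUTES = 40
N_SINGERS = 27


def summarize():
    """Return quantities derived from the abstract's rounded figures."""
    total_seconds = TOTAL_MINUTES * 60
    return {
        # Average excerpt length implied by total duration and excerpt count.
        "mean_excerpt_seconds": total_seconds / N_EXCERPTS,
        # Fraction of the audio material that is distorted voice.
        "distorted_share": DISTORTED_MINUTES / TOTAL_MINUTES,
        # Average number of excerpts contributed per singer.
        "mean_excerpts_per_singer": N_EXCERPTS / N_SINGERS,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

Because the inputs are rounded, the results are estimates: an average excerpt of roughly 8 seconds, about 60% distorted material, and about 28 excerpts per singer on average. The actual per-singer and per-technique distributions may be far from uniform.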