Michael Ong, Sean Robertson, Leo Peckham, Alba Jorquera Jimenez de Aberasturi, Paula Arkhangorodsky, Robin Huo, Aman Sakhardande, Mark Hallap, Naomi Nagy, Ewan Dunbar
arXiv - EE - Audio and Speech Processing, published 2024-09-12, doi: arxiv-2409.08103, https://doi.org/arxiv-2409.08103
The Faetar Benchmark: Speech Recognition in a Very Under-Resourced Language
We introduce the Faetar Automatic Speech Recognition Benchmark, a benchmark
corpus designed to push the limits of current approaches to low-resource speech
recognition. Faetar, a Franco-Provençal variety spoken primarily in Italy,
has no standard orthography, has virtually no existing textual or speech
resources other than what is included in the benchmark, and is quite different
from other forms of Franco-Provençal. The corpus comes from field
recordings, most of which are noisy; only 5 hours have matching
transcriptions, and the forced alignment is of variable quality. The
corpus contains an additional 20 hours of unlabelled speech. We report baseline
results from state-of-the-art multilingual speech foundation models with a best
phone error rate of 30.4%, using a pipeline that continues pre-training the
foundation model on the unlabelled set.
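
The abstract reports a phone error rate (PER) but does not say how it is scored. As a rough illustration only, the sketch below assumes the conventional definition: the Levenshtein (edit) distance between reference and hypothesis phone sequences, summed over utterances and divided by the total number of reference phones. The function names `edit_distance` and `phone_error_rate` are hypothetical, and the benchmark's own scoring may differ in normalisation or tokenisation.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences.

    Counts the minimum number of substitutions, insertions and
    deletions needed to turn `hyp` into `ref`.
    """
    # prev[j] holds the distance between the first i-1 reference
    # phones and the first j hypothesis phones.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference phone
                curr[j - 1] + 1,     # insertion of a hypothesis phone
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phone_error_rate(
    refs: Sequence[Sequence[str]],
    hyps: Sequence[Sequence[str]],
) -> float:
    """Corpus-level PER (assumed definition): total edits / total reference phones."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of utterances")
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("reference set contains no phones")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_ref
```

Under this assumed definition, a four-phone reference whose hypothesis drops one phone scores a PER of 0.25. Because insertions are counted as errors, PER can exceed 100% on very poor output.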