Improvements of SpeeD’s Romanian ASR system during ReTeRom project
Authors: Alexandru-Lucian Georgescu, H. Cucu, C. Burileanu
Published in: 2021 International Conference on Speech Technology and Human-Computer Dialogue (SpeD), 2021-10-13
DOI: https://doi.org/10.1109/sped53181.2021.9587383
Citation count: 1
Abstract
Automatic speech recognition (ASR) for the Romanian language is attracting growing interest from the scientific community. In the last two years, several research groups have reported valuable results on speech recognition and dialogue tasks for Romanian. In this paper we present the improvements we recently obtained by collecting and using more text and audio data to train the language and acoustic models. We emphasize the automatic methodologies employed to facilitate data collection and annotation. Compared with our previous work, we report state-of-the-art results for read speech (WER of 1.6%) and significantly better results on spontaneous speech (relative improvement of around 40%). To facilitate direct comparison with other ASR systems, we release all evaluation datasets, totaling 10 hours of manually annotated speech.
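
For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how word error rate (WER) and relative improvement are conventionally computed. It is not the authors' scoring pipeline: the function names `word_error_rate` and `relative_improvement` are hypothetical, text normalization is reduced to whitespace splitting, and the numbers in the usage lines are illustrative only, not results from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumption: words are obtained by whitespace splitting; a real evaluation
    would also normalize case, punctuation, and diacritics.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words (word-level Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                       # deletion
                curr[j - 1] + 1,                   # insertion
                prev[j - 1] + substitution_cost,   # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative values only (not taken from the paper).
print(word_error_rate("a b c d", "a x c"))   # one substitution + one deletion over 4 words
print(relative_improvement(10.0, 6.0))       # a drop from 10% to 6% WER
```

Under this definition, a relative improvement of around 40% means the new WER is roughly 0.6 times the earlier one, which differs from a reduction of 40 percentage points.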