Andrei-Marius Avram, M. Nichita, Razvan Bartusica, Madalin Mihai
2022 14th International Conference on Communications (COMM), published 2022-06-16. DOI: https://doi.org/10.1109/comm54429.2022.9817214
RoSAC: A Speech Corpus for Transcribing Romanian Emergency Calls
Publicly available speech datasets for Romanian are still scarce and fall far short of what is needed to obtain state-of-the-art performance with modern deep neural networks. In response, while developing an internal automatic speech recognition system for the national emergency call center, we created the Romanian Speech Alert Corpus (RoSAC), a new Romanian speech corpus obtained by crowd-sourcing the reading of sentences within our institution. This paper describes the data acquisition process, several statistics about the resulting corpus, and the algorithm we employed for removing inadequate recordings.