Web based and voice enabled IVRS for large scale Malayalam speech data collection
P. Shobana Devi, Divya Das, J. Stephen, V. K. Bhadran
2014 International Conference on Contemporary Computing and Informatics (IC3I), published 2014-11-01
DOI: https://doi.org/10.1109/IC3I.2014.7019717
Abstract
Speech corpora are a vital resource in the development and evaluation of automatic speech recognition systems, as well as for acoustic-phonetic studies. Collecting a large corpus is not an easy task. The lack of such resources is one of the reasons for the absence of good-quality speech recognition systems for Indian languages. Here we automate this process by developing a web-based tool for collecting broadband speech data and an IVR system with speech recognition for collecting narrowband speech data. The main features include full support for the typical recording, annotation, and project administration workflow and easy editing of the speech content, with the added advantage of a fully localizable user interface. This paper describes in detail the development of the web-based speech collection tool and the IVR system, which together enable end-to-end building of a speech corpus with minimal manual effort.