
Workshop on Spoken Language Technologies for Under-resourced Languages: Latest Publications

A Unified Phonological Representation of South Asian Languages for Multilingual Text-to-Speech
Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-17
Isin Demirsahin, Martin Jansche, Alexander Gutkin
We present a multilingual phoneme inventory and inclusion mappings from the native inventories of several major South Asian languages for multilingual parametric text-to-speech synthesis (TTS). Our goal is to reduce the need for training data when building new TTS voices by leveraging available data for similar languages within a common feature design. For West Bengali, Gujarati, Kannada, Malayalam, Marathi, Tamil, Telugu, and Urdu we compare TTS voices trained only on monolingual data with voices trained on multilingual data from 12 languages. In subjective evaluations multilingually trained voices outperform (or in a few cases are statistically tied with) the corresponding monolingual voices. The multilingual setup can further be used to synthesize speech for languages not seen in the training data; preliminary evaluations lean towards good. Our results indicate that pooling data from different languages in a single acoustic model can be beneficial, opening up new uses and research questions.
Citations: 21
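The core mechanism of an inclusion mapping can be sketched in a few lines: each language's native phoneme inventory is mapped into a shared multilingual inventory so that acoustic model training can pool data across languages. The inventories and symbol names below are illustrative assumptions, not the paper's actual feature design:

```python
# Sketch of an inclusion mapping: each language's native phoneme
# inventory maps into a shared multilingual inventory, so acoustic
# models can pool training data across languages.
# All inventories and mappings below are illustrative, not the paper's.

UNIFIED_INVENTORY = {"a", "aa", "i", "k", "kh", "t", "t.", "n", "n."}

# Per-language inclusion mappings: native symbol -> unified symbol.
INCLUSION_MAPS = {
    "hi": {"a": "a", "A": "aa", "k": "k", "K": "kh", "T": "t."},
    "ta": {"a": "a", "aa": "aa", "k": "k", "t": "t", "n": "n"},
}

def to_unified(lang, native_phonemes):
    """Map a native-inventory transcription into the unified inventory."""
    mapping = INCLUSION_MAPS[lang]
    unified = [mapping[p] for p in native_phonemes]
    # Every target symbol must exist in the shared inventory.
    assert all(p in UNIFIED_INVENTORY for p in unified)
    return unified

print(to_unified("hi", ["k", "A", "T", "a"]))  # ['k', 'aa', 't.', 'a']
```

Because every language's symbols land in the same inventory, utterances from any of the 12 languages can contribute training frames to the same acoustic model.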
Neural Networks-based Automatic Speech Recognition for Agricultural Commodity in Gujarati Language
Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-34
Hardik B. Sailor, H. Patil
In this paper, we present the development of an Automatic Speech Recognition (ASR) system as part of speech-based access to agricultural commodity information in Gujarati, a low-resource language. We propose to use neural networks for language modeling, acoustic modeling, and feature learning from the raw speech signals. The speech database of agricultural commodities was collected from farmers belonging to various villages of Gujarat state (India). The database has various dialectal variations and real noisy acoustic environments. Acoustic modeling is performed using Time Delay Neural Networks (TDNN). The auditory feature representation is learned using a Convolutional Restricted Boltzmann Machine (ConvRBM) and the Teager Energy Operator (TEO). Language model (LM) rescoring is performed using Recurrent Neural Networks (RNN). RNNLM rescoring provides an absolute reduction of 0.69-1.18 % in WER for all the feature sets compared to the bi-gram LM. The system combination of ConvRBM and Mel filterbank features further improved ASR performance compared to the baseline TDNN with Mel filterbank features (5.4 % relative reduction in WER). The statistical significance of the proposed approach is confirmed using a bootstrap-based % Probability of Improvement (POI) measure.
Citations: 0
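The abstract reports improvements in two different units: an absolute reduction in % WER (percentage points) for RNNLM rescoring, and a relative reduction for the system combination. A small helper keeps the two straight; the 5.4 % figure is from the abstract, while the 13.0 % baseline WER is a made-up number for illustration only:

```python
def absolute_reduction(baseline_wer, new_wer):
    """Absolute reduction: difference in percentage points of WER."""
    return baseline_wer - new_wer

def relative_reduction(baseline_wer, new_wer):
    """Relative reduction: improvement as a fraction of the baseline WER."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer

# Hypothetical baseline of 13.0 % WER; a 5.4 % relative reduction
# (the figure from the abstract) corresponds to a much smaller
# absolute reduction in percentage points.
baseline = 13.0
improved = baseline - baseline * 5.4 / 100.0
print(round(absolute_reduction(baseline, improved), 3))  # 0.702
print(round(relative_reduction(baseline, improved), 1))  # 5.4
```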
SVM Based Language Diarization for Code-Switched Bilingual Indian Speech Using Bottleneck Features
Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-28
V. Spoorthy, Veena Thenkanidiyoor, Dileep Aroor Dinesh
This paper proposes an SVM-based language diarizer for code-switched bilingual Indian speech. Code-switching is the use of more than one language within a single utterance. Language diarization involves identifying code-switch points in an utterance and segmenting it into homogeneous language segments. This is particularly important in the Indian context, because every Indian is at least bilingual and code-switching is inevitable. For building an effective language diarizer, it is helpful to consider phonotactic features. In this work, we propose to use bottleneck features for language diarization. Bottleneck features are the output of a narrow hidden layer of a multilayer neural network trained to perform phone state classification. Studies conducted on standard datasets show the effectiveness of the proposed approach.
Citations: 8
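The paper classifies bottleneck features with an SVM; the downstream segmentation step, turning per-frame language decisions into homogeneous segments whose boundaries are the code-switch points, can be sketched as follows (a minimal sketch that assumes frame-level language labels are already available):

```python
def diarize(frame_labels):
    """Collapse per-frame language decisions into homogeneous segments.

    frame_labels: sequence of language IDs, one per frame (e.g. the
    output of a frame-level classifier over bottleneck features).
    Returns a list of (language, start_frame, end_frame) segments;
    the segment boundaries are the detected code-switch points.
    """
    segments = []
    start = 0
    for i in range(1, len(frame_labels) + 1):
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start]:
            segments.append((frame_labels[start], start, i))
            start = i
    return segments

# A toy Hindi/English code-switched utterance, one label per frame:
labels = ["hi", "hi", "hi", "en", "en", "hi", "hi"]
print(diarize(labels))  # [('hi', 0, 3), ('en', 3, 5), ('hi', 5, 7)]
```

In practice the raw frame decisions would first be smoothed (e.g. by a majority filter) before segmentation, since classifier errors otherwise create spurious one-frame segments.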
Crowd-Sourced Speech Corpora for Javanese, Sundanese, Sinhala, Nepali, and Bangladeshi Bengali
Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-11
Oddur Kjartansson, Supheakmungkol Sarin, Knot Pipatsrisawat, Martin Jansche, Linne Ha
We present speech corpora for Javanese, Sundanese, Sinhala, Nepali, and Bangladeshi Bengali. Each corpus consists of an average of approximately 200k recorded utterances that were provided by native-speaker volunteers in the respective region. Recordings were made using portable consumer electronics in reasonably quiet environments. For each recorded utterance the textual prompt and an anonymized hexadecimal identifier of the speaker are available. Biographical information of the speakers is unavailable. In particular, the speakers come from an unspecified mix of genders. The recordings are suitable for research on acoustic modeling for speech recognition, for example. To validate the integrity of the corpora and their suitability for speech recognition research, we provide simple recipes that illustrate how they can be used with the open-source Kaldi speech recognition toolkit. The corpora are being made available under a Creative Commons license in the hope that they will stimulate further research on these languages.
Citations: 51
Interspeech 2018 Low Resource Automatic Speech Recognition Challenge for Indian Languages
Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-3
B. M. L. Srivastava, Sunayana Sitaram, R. Mehta, K. Mohan, Pallavi Matani, Sandeepkumar Satpal, Kalika Bali, Radhakrishnan Srikanth, N. Nayak
India has more than 1500 languages, with 30 of them spoken by more than one million native speakers. Most of them are low-resource and could greatly benefit from speech and language technologies. Building speech recognition support for these low-resource languages requires innovation in handling constraints on data size, while also exploiting the unique properties of and similarities among Indian languages. With this goal, we organized a low-resource Automatic Speech Recognition challenge for Indian languages as part of Interspeech 2018. We released 50 hours of speech data with transcriptions each for Tamil, Telugu, and Gujarati, amounting to a total of 150 hours. Participants were required to use only the data we released for the challenge, to preserve the low-resource setting; however, they were not restricted to working on any particular aspect of the speech recognizer. We received 109 submissions from 18 research groups and evaluated the systems in terms of Word Error Rate on a blind test set. In this paper we summarize the data, approaches, and results of the challenge.
Citations: 46
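Systems were ranked by Word Error Rate. A minimal reference implementation of WER via word-level Levenshtein alignment (the standard definition, not necessarily the organizers' exact scoring script) looks like this:

```python
def wer(reference, hypothesis):
    """Word Error Rate: (substitutions + deletions + insertions)
    divided by the reference length, via word-level Levenshtein
    alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                       # delete all of ref[:i]
    for j in range(len(hyp) + 1):
        d[0][j] = j                       # insert all of hyp[:j]
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat", "the cat sat"))       # 0.0
print(wer("the cat sat", "the bat sat down"))  # 1 sub + 1 ins over 3 words = 2/3
```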
Building an ASR System for Mboshi Using A Cross-Language Definition of Acoustic Units Approach
Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-35
O. Scharenborg, Patrick Ebel, M. Hasegawa-Johnson, N. Dehak
For many languages in the world, not enough (annotated) speech data is available to train an ASR system. Recently, we proposed a cross-language method for training an ASR system using linguistic knowledge and semi-supervised training. Here, we apply this approach to the low-resource language Mboshi. Starting from an ASR system trained on Dutch, Mboshi acoustic units were first created using cross-language initialization of the phoneme vectors in the output layer. Subsequently, this adapted system was retrained using Mboshi self-labels. Two training methods were investigated: retraining only the output layer and retraining the full deep neural network (DNN). The resulting Mboshi system was analyzed by investigating per-phoneme accuracies and phoneme confusions, and by visualizing the hidden layers of the DNNs before and after retraining with the self-labels. Results showed fairly similar performance for the two training methods but a better phoneme representation for the fully retrained DNN.
Citations: 2
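The cross-language initialization of output-layer phoneme vectors can be sketched with NumPy: each target-language (Mboshi) phoneme row starts from the weight row of a phonetically similar source-language (Dutch) phoneme, after which the layer would be retrained on self-labels. The phoneme sets and the mapping here are illustrative assumptions, not the paper's actual inventories:

```python
import numpy as np

# Sketch of cross-language initialization of output-layer phoneme
# vectors: each target-language (Mboshi) phoneme starts from the
# weight row of a phonetically similar source-language (Dutch)
# phoneme. The phoneme sets and mapping are illustrative only.

rng = np.random.default_rng(0)
dutch_phones = ["a", "e", "k", "m", "s"]
dutch_weights = rng.normal(size=(len(dutch_phones), 8))  # "trained" rows

mboshi_phones = ["a", "e", "k", "mb"]
mapping = {"a": "a", "e": "e", "k": "k", "mb": "m"}  # closest Dutch phone

dutch_index = {p: i for i, p in enumerate(dutch_phones)}
mboshi_weights = np.stack(
    [dutch_weights[dutch_index[mapping[p]]] for p in mboshi_phones]
)
# After this initialization, the adapted layer (or the full DNN)
# is retrained on Mboshi self-labels.
print(mboshi_weights.shape)                               # (4, 8)
print(np.allclose(mboshi_weights[3], dutch_weights[3]))   # "mb" copied from "m"
```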
Designing an IVR Based Framework for Telephony Speech Data Collection and Transcription in Under-Resourced Languages
Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-10
Joyanta Basu, Soma Khan, M. S. Bepari, Rajib Roy, Madhab Pal, Sushmita Nandi
The scarcity of digitally available language resources restricts the development of large-scale speech applications in the Indian scenario. This paper describes a unique design framework for telephony speech data collection in under-resourced languages using interactive voice response (IVR) technology. IVR systems provide a fast, reliable, automated, and relatively low-cost medium for simultaneous multilingual audio resource collection from remote users and help in the structured storage of resources for further usage. The framework needs IVR hardware and APIs, related software tools, and text resources as its necessary components. The detailed functional design and development process of such a running IVR system are elaborated step by step. Sample IVR call-flow design templates and an offline audio transcription procedure are also presented for ease of understanding. The entire methodology is language-independent, is adaptable to similar tasks in other languages, and is especially beneficial for accelerating the resource creation process in under-resourced languages, minimizing the manual effort of data collection and transcription.
Citations: 4
Improving ASR for Code-Switched Speech in Under-Resourced Languages Using Out-of-Domain Data
Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-26
A. Biswas, E. V. D. Westhuizen, T. Niesler, F. D. Wet
We explore the use of out-of-domain monolingual data for the improvement of automatic speech recognition (ASR) of code-switched speech. This is relevant because annotated code-switched speech data is both scarce and very hard to produce, especially when the languages concerned are under-resourced, while monolingual corpora are generally better-resourced. We perform experiments using a recently-introduced small five-language corpus of code-switched South African soap opera speech. We consider specifically whether ASR of English–isiZulu code-switched speech can be improved by incorporating monolingual data from unrelated but larger corpora. TDNN-BLSTM acoustic models are trained using various configurations of training data. The utility of artificially-generated bilingual English–isiZulu text to augment language model training data is also explored. We find that English–isiZulu speech recognition accuracy can be improved by incorporating monolingual out-of-domain data despite the differences between the soap-opera and monolingual speech.
Citations: 12
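One simple way to artificially generate bilingual text for LM training, sketched below, is to splice a prefix of a monolingual sentence in one language onto a suffix of a sentence in the other. This splicing scheme and the example sentences are illustrative assumptions, not the authors' actual generation method:

```python
import random

def splice_bilingual(sent_a, sent_b, rng):
    """Generate a synthetic code-switched sentence by splicing a
    prefix of one monolingual sentence onto a suffix of another.
    A toy augmentation scheme, not the paper's actual method."""
    a, b = sent_a.split(), sent_b.split()
    i = rng.randrange(1, len(a))   # switch point in sentence A (>= 1 word kept)
    j = rng.randrange(0, len(b))   # resume point in sentence B
    return " ".join(a[:i] + b[j:])

rng = random.Random(7)
english = "the meeting starts tomorrow morning"
isizulu = "umhlangano uqala kusasa ekuseni"   # illustrative isiZulu sentence
print(splice_bilingual(english, isizulu, rng))
```

Text generated this way does not model where real speakers actually switch, but it cheaply exposes the language model to mixed-language word sequences it would never see in monolingual corpora.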
A Comparative Study of SMT and NMT: Case Study of English-Nepali Language Pair
Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-19
P. Acharya, B. Bal
Citations: 3
Hindi Speech Vowel Recognition Using Hidden Markov Model
Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-42
Shobha Bhatt, A. Dev, Anurag Jain
Citations: 10