Workshop on Spoken Language Technologies for Under-resourced Languages: Latest Papers

Advances in Low Resource ASR: A Deep Learning Perspective

Workshop on Spoken Language Technologies for Under-resourced Languages

Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-4

Hardik B. Sailor, Ankur T. Patil, H. Patil

Developing Automatic Speech Recognition (ASR) systems for Low Resource (LR) languages has recently become an active research area. ASR research has advanced significantly through deep learning approaches, which produce state-of-the-art results compared with conventional approaches. However, applying such approaches to LR languages remains challenging because they require a huge amount of training data. Recently, techniques such as data augmentation, multilingual and cross-lingual approaches, and transfer learning have made it possible to train deep learning architectures for these languages. This paper presents an overview of deep learning-based approaches for building ASR for LR languages. Recent projects and events organized to support the development of ASR and related applications in this direction are also discussed. This paper may motivate researchers interested in working on low resource ASR using deep learning techniques. The approaches described here could also be useful in related applications, such as audio search.

Citations: 5

Mining Training Data for Language Modeling Across the World's Languages

Workshop on Spoken Language Technologies for Under-resourced Languages

Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-13

Manasa Prasad, Theresa Breiner, D. Esch

Citations: 12

Optimizing DPGMM Clustering in Zero Resource Setting Based on Functional Load

Workshop on Spoken Language Technologies for Under-resourced Languages

Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-1

Bin Wu, S. Sakti, Jinsong Zhang, Satoshi Nakamura

Inspired by infant language acquisition, unsupervised subword discovery for zero-resource languages has gained attention recently. The Dirichlet Process Gaussian Mixture Model (DPGMM) achieves top results as evaluated by the ABX discrimination test. However, the DPGMM is too sensitive to acoustic variation and often produces too many types of subword units and a relatively high-dimensional posteriorgram, which implies a high computational cost for learning and inference, as well as a greater tendency to overfit. This paper proposes applying functional load to reduce the number of subword units from DPGMM. We greedily merge the pairs of units with the lowest functional load, which causes the least information loss for the language. Results on the Xitsonga corpus with the official setting of Zerospeech 2015 show that we can reduce the number of subword units by more than two thirds without hurting the ABX error rate. The resulting number of units is close to the number of phonemes in human languages.
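The greedy merging step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes functional load is measured as the relative drop in bigram entropy of the unit sequences when two units are treated as one, and the names `bigram_entropy`, `merge_units`, `functional_load`, and `greedy_merge` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def bigram_entropy(seqs):
    """Shannon entropy (bits) of the bigram distribution over unit sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(zip(s, s[1:]))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def merge_units(seqs, a, b):
    """Relabel every occurrence of unit b as unit a."""
    return [[a if u == b else u for u in s] for s in seqs]


def functional_load(seqs, a, b):
    """Assumed definition: relative entropy lost when a and b are merged."""
    h = bigram_entropy(seqs)
    if h == 0.0:
        return 0.0
    return (h - bigram_entropy(merge_units(seqs, a, b))) / h


def greedy_merge(seqs, target_units):
    """Repeatedly merge the pair with the lowest functional load."""
    units = sorted({u for s in seqs for u in s})
    while len(units) > target_units:
        a, b = min(combinations(units, 2),
                   key=lambda pair: functional_load(seqs, *pair))
        seqs = merge_units(seqs, a, b)
        units.remove(b)
    return seqs, units
```

Each merge step in this sketch re-scores every pair of remaining units, which is quadratic in the number of units; that is acceptable for illustration but would likely need caching on a real DPGMM label inventory.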

Citations: 10

Development of Assamese Continuous Speech Recognition System

Workshop on Spoken Language Technologies for Under-resourced Languages

Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-46

Tanmay Bhowmik, S. Mandal

Citations: 0

Analysis and Comparison of Features for Text-Independent Bengali Speaker Recognition

Workshop on Spoken Language Technologies for Under-resourced Languages

Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-57

S. Das, P. Das

Citations: 0

Improved Language Identification Using Stacked SDC Features and Residual Neural Network

Workshop on Spoken Language Technologies for Under-resourced Languages

Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-44

R. Vuddagiri, Hari Krishna Vydana, A. Vuppala

Citations: 9

Signal Processing Cues to Improve Automatic Speech Recognition for Low Resource Indian Languages

Workshop on Spoken Language Technologies for Under-resourced Languages

Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-6

Arun Baby, S. KarthikPandiaD., H. Murthy

Building accurate acoustic models for low resource languages is the focus of this paper. Acoustic models are likely to be accurate provided the phone boundaries are determined accurately. Conventional flat-start Viterbi phone alignment (where only utterance-level transcriptions are available) results in poor phone boundaries, as the boundaries are not explicitly modeled in any statistical machine learning system. The focus of the effort in this paper is to explicitly model phrase boundaries using acoustic cues obtained through signal processing. A phrase is made up of a sequence of words, where each word is made up of a sequence of syllables. Syllable boundaries are detected using signal processing. The waveform corresponding to an utterance is spliced at a phrase boundary when that boundary matches a syllable boundary. Gaussian mixture model - hidden Markov model (GMM-HMM) training is performed phrase by phrase, rather than utterance by utterance. Training on these short phrases yields better acoustic models. The resulting alignment is then fed to a DNN to enable better discrimination between phones. During the training process, the syllable boundaries (obtained using signal processing) are restored in every iteration. A relative improvement in WER over the baseline is observed for three Indian languages, namely Gujarati, Tamil, and Telugu.
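The splicing rule in the abstract (cut at a phrase boundary only when it coincides with a syllable boundary) can be sketched as follows. This is an illustration under assumptions, not the paper's code: boundary times are taken to be in seconds, the matching tolerance `tol` is a made-up parameter, and `matched_boundaries` and `splice` are hypothetical helper names.

```python
from bisect import bisect_left


def matched_boundaries(phrase_bounds, syllable_bounds, tol=0.02):
    """Keep phrase boundaries lying within tol seconds of a syllable boundary.

    The returned cut points are the matching syllable-boundary times.
    """
    syl = sorted(syllable_bounds)
    kept = []
    for t in phrase_bounds:
        i = bisect_left(syl, t)
        # Only the neighbours on either side of t can be the nearest boundary.
        neighbours = [syl[j] for j in (i - 1, i) if 0 <= j < len(syl)]
        if not neighbours:
            continue
        nearest = min(neighbours, key=lambda s: abs(s - t))
        if abs(nearest - t) <= tol:
            kept.append(nearest)
    return kept


def splice(samples, sample_rate, cut_times):
    """Split a waveform (sequence of samples) at the given times in seconds."""
    cuts = [0] + [round(t * sample_rate) for t in sorted(cut_times)] + [len(samples)]
    return [samples[a:b] for a, b in zip(cuts, cuts[1:]) if b > a]
```

The resulting segments would then be paired with their phrase-level transcriptions for GMM-HMM training; that pairing is outside the scope of this sketch.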

Citations: 1

JAMLIT: A Corpus of Jamaican Standard English for Automatic Speech Recognition of Children's Speech

Workshop on Spoken Language Technologies for Under-resourced Languages

Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-51

Stefan Watson, André Coy

Citations: 0

Empirical Study of Speech Synthesis Markup Language and Its Implementation for Punjabi Language

Workshop on Spoken Language Technologies for Under-resourced Languages

Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-22

Atul Kumar, S. Agrawal

This paper builds a prioritized list of requirements for speech synthesis markup that any proposed markup language should address, and presents the requirements and essential tags for developing a specification for the Punjabi language. A speech synthesizer converts written text into the correct sounds to be spoken. To do this, it uses an SSML document and one or more lexicons and dictionaries. We show how the different modules of a TTS system help convert the text input of an SSML document into spoken form in Punjabi. Punjabi is a morphologically rich language written in the "Gurumukhi" script, and it is one of the official languages recognized by the Government of India. In Punjabi orthography, the written symbols correspond exactly to specific words, so we find no words in Punjabi that may be called homographs, and the homograph problem does not arise. Tones, however, pose a major problem: words written in similar ways can carry different tones, which changes their meanings, and separate tags have been designed for them.
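As a rough illustration of the kind of input such a synthesizer consumes, the sketch below builds a minimal SSML 1.1 document with a lexicon reference using Python's standard library. The helper name `build_ssml`, the lexicon URL, and the sample greeting are assumptions for illustration only; the Punjabi-specific tone tags proposed in the paper are not reproduced here because the abstract does not specify them.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml(sentences, lang="pa-IN", lexicon_uri=None):
    """Return an SSML document string with one <s> element per sentence."""
    ET.register_namespace("", SSML_NS)  # serialize SSML as the default namespace
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.1", f"{{{XML_NS}}}lang": lang},
    )
    if lexicon_uri:
        # SSML expects lexicon elements before any other content of <speak>.
        ET.SubElement(speak, f"{{{SSML_NS}}}lexicon", {"uri": lexicon_uri})
    for text in sentences:
        sentence = ET.SubElement(speak, f"{{{SSML_NS}}}s")
        sentence.text = text
    return ET.tostring(speak, encoding="unicode")


# Hypothetical usage: the lexicon URL is a placeholder, not a real resource.
document = build_ssml(["ਸਤਿ ਸ੍ਰੀ ਅਕਾਲ"], lexicon_uri="https://example.org/pa.pls")
```

A pronunciation lexicon referenced this way is where word-level pronunciations, including tonal distinctions, would normally be resolved before synthesis.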

Citations: 0

Implementation of Concatenation Technique for Low Resource Text-To-Speech System Based on Marathi Talking Calculator

Workshop on Spoken Language Technologies for Under-resourced Languages

Pub Date : 2018-08-29 DOI: 10.21437/SLTU.2018-16

Monica R. Mundada, Sangramsing Kayte, P. Das

A sound grasp of basic mathematical concepts paves the way for numerous opportunities in life for every individual, including visually impaired people. Assistive technology makes people with disabilities more independent and helps them avoid barriers in education and employment. This research focuses on designing an Android-based application, a talking calculator, for Marathi, a low resource language. The novelty of this work lies in developing both the application and the Marathi number corpus. Marathi is an Indo-Aryan language spoken by approximately 6.99 million speakers in India, which makes it the third most widely spoken language after Bengali and Telugu; because it lacks linguistic resources such as grammars, POS taggers, and corpora, it falls into the category of low resource languages. The front end of the application shows the screen of a basic calculator with numerals displayed in Marathi. At runtime, each number is spoken as its key is pressed, and the operation to be performed is also spoken. Concatenative synthesis is used to speak the digits after the decimal point in the output number, and the result is spoken with the proper place value of each digit in Marathi. The system achieves an accuracy of 95.5%. The average run time of the application was measured at 2.64 s. Feedback and reviews of the application were also collected from real end users, i.e., blind people.
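The concatenation idea can be sketched as a lookup of pre-recorded units followed by joining their samples. This is a simplified illustration, not the application's code: the unit keys (`int_12`, `point`, `digit_5`), the `bank` of recordings, and both function names are hypothetical, and the whole-number part is treated as a single recorded unit rather than being built from place values.

```python
def units_for(number_text):
    """Map a decimal string to the sequence of recorded units to play."""
    whole, _, fraction = number_text.partition(".")
    units = [f"int_{whole}"]
    if fraction:
        units.append("point")
        # Digits after the decimal point are spoken one at a time.
        units.extend(f"digit_{d}" for d in fraction)
    return units


def concatenate(units, bank, gap=0):
    """Join the recorded samples of each unit, with optional silence between."""
    out = []
    for index, unit in enumerate(units):
        if index:
            out.extend([0] * gap)  # gap samples of silence between units
        out.extend(bank[unit])
    return out
```

In a real system the bank would hold recordings from the Marathi number corpus, and the join points would usually be smoothed rather than simply abutted.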

Citations: 0
