Unsupervised HMM posteriograms for language independent acoustic modeling in zero resource conditions

T. Ansari, Rajath Kumar, Sonali Singh, Sriram Ganapathy, V. Devi
DOI: 10.1109/ASRU.2017.8269014
Published in: 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December 2017
Citations: 14

Abstract

The task of language-independent acoustic unit modeling in unlabeled raw speech (the zero-resource setting) has gained significant interest over recent years. The main challenge is to extract acoustic representations that elicit good similarity between the same words or linguistic tokens spoken by different speakers, and to derive these representations in a language-independent manner. In this paper, we explore the use of Hidden Markov Model (HMM) based posteriograms for unsupervised acoustic unit modeling. The states of the HMM (which represent the language-independent acoustic units) are initialized using a Gaussian mixture model-Universal Background Model (GMM-UBM). The trained HMM is subsequently used to generate a temporally contiguous state alignment, which is then modeled in a hybrid deep neural network (DNN) model. For testing, we use the frame-level HMM state posteriors obtained from the DNN as features for the ZeroSpeech challenge task. The minimal-pair ABX error rate is measured for both within-speaker and across-speaker pairs. In experiments on multiple languages in the ZeroSpeech corpus, we show that the proposed HMM-based posterior features provide significant improvements over the baseline system using MFCC features (average relative improvements of 25% for within-speaker pairs and 40% for across-speaker pairs). Furthermore, experiments in which the target language is not seen during training illustrate that the proposed modeling approach is capable of learning globally language-independent representations.
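The abstract describes a pipeline of GMM-UBM-initialized HMM states, DNN-refined frame-level posteriograms, and minimal-pair ABX evaluation. Below is a minimal NumPy sketch of two of those ingredients: computing a posteriogram (frame-by-state posterior matrix) and scoring ABX triples with a DTW distance over feature sequences. This is an illustrative sketch, not the authors' implementation; the function names, the spherical-Gaussian simplification, and the Euclidean frame cost are all assumptions made here for brevity.

```python
import numpy as np

def gmm_posteriogram(frames, means, log_priors, var=1.0):
    """Frame-level state posteriors under spherical Gaussians
    (a toy stand-in for the GMM-UBM-initialized HMM states).
    frames: (n_frames, n_dims), means: (n_states, n_dims)."""
    # Squared distance from every frame to every state mean.
    sq = ((frames[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    log_lik = -0.5 * sq / var + log_priors          # log-likelihood up to a shared constant
    log_lik -= log_lik.max(axis=1, keepdims=True)   # shift for numerical stability
    post = np.exp(log_lik)
    return post / post.sum(axis=1, keepdims=True)   # each row sums to 1

def dtw_distance(a, b):
    """Length-normalized dynamic-time-warping distance between two
    frame sequences (n_frames x n_dims) with per-frame Euclidean cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[n, m] / (n + m)

def abx_error_rate(triples):
    """Fraction of (A, B, X) triples, with X from the same category
    as A, for which X ends up closer to B than to A."""
    errors = sum(dtw_distance(x, a) >= dtw_distance(x, b)
                 for a, b, x in triples)
    return errors / len(triples)
```

In the paper's setup the sequences fed to the distance would be the DNN posteriogram rows for each token; the averaged within-speaker and across-speaker error rates are what the 25% and 40% relative improvements are measured on.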