Monolingual and crosslingual comparison of tandem features derived from articulatory and phone MLPs

Ö. Çetin, M. Magimai.-Doss, Karen Livescu, Arthur Kantor, Simon King, C. Bartels, Joe Frankel
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU), December 2007
DOI: 10.1109/ASRU.2007.4430080
Citations: 32

Abstract

Features derived from the posteriors of a multilayer perceptron (MLP), known as tandem features, have proven very effective for automatic speech recognition. Most tandem features to date have relied on MLPs trained for phone classification. We recently showed on a relatively small data set that MLPs trained for articulatory feature (AF) classification can be equally effective. In this paper, we provide a similar comparison using MLPs trained on a much larger data set: 2,000 hours of English conversational telephone speech. We also explore how portable phone- and AF-based tandem features are to an entirely different language, Mandarin, without any retraining. We find that while the phone-based features perform only slightly better than the AF-based features in the matched-language condition, they perform significantly better in the cross-language condition. However, in the cross-language condition, neither approach is as effective as tandem features extracted from an MLP trained on a relatively small amount of in-domain data. Beyond feature concatenation, we also explore novel factored observation modeling schemes that allow greater flexibility in combining the tandem and standard features.
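The abstract does not spell out how posteriors become tandem features. A commonly used recipe is to log-compress the per-frame MLP posteriors, decorrelate and reduce them with a Karhunen-Loève transform (PCA), and append the result to the standard acoustic features frame by frame. The sketch below illustrates that generic recipe under stated assumptions; it is not the authors' exact configuration. The function names, the `eps` floor, and the example dimensions (46 classes, 39 standard dimensions, 25 tandem dimensions) are hypothetical.

```python
import numpy as np


def tandem_features(posteriors: np.ndarray, n_components: int, eps: float = 1e-10) -> np.ndarray:
    """Map per-frame MLP posteriors (frames x classes) to tandem features.

    Assumed recipe (hypothetical helper): log-compress, mean-normalize, and
    project onto the top principal components (a KLT) so the output
    dimensions are decorrelated and fewer in number.
    """
    # Log compression makes the highly skewed posteriors more Gaussian-like.
    logp = np.log(posteriors + eps)
    # Center each dimension before estimating the covariance.
    centered = logp - logp.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the eigenvectors with the largest eigenvalues, largest first.
    order = np.argsort(eigvals)[::-1][:n_components]
    return centered @ eigvecs[:, order]


def concatenate(standard: np.ndarray, tandem: np.ndarray) -> np.ndarray:
    """Frame-synchronous concatenation of standard and tandem features."""
    if standard.shape[0] != tandem.shape[0]:
        raise ValueError("feature streams must be frame-aligned")
    return np.hstack([standard, tandem])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins: 100 frames, 46 MLP output classes, 39-dim standard features.
    logits = rng.normal(size=(100, 46))
    posteriors = np.exp(logits - logits.max(axis=1, keepdims=True))
    posteriors /= posteriors.sum(axis=1, keepdims=True)
    standard = rng.normal(size=(100, 39))
    combined = concatenate(standard, tandem_features(posteriors, n_components=25))
    print(combined.shape)  # (100, 64)
```

In a real system the projection would be estimated once on training data and reused on test data; here it is fit on its own input only to keep the sketch short. The factored observation models mentioned in the abstract would replace the plain `np.hstack` with separate modeling of the two feature streams.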