Monolingual and crosslingual comparison of tandem features derived from articulatory and phone MLPs

2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) Pub Date : 2007-12-01 DOI:10.1109/ASRU.2007.4430080

Ö. Çetin, M. Magimai.-Doss, Karen Livescu, Arthur Kantor, Simon King, C. Bartels, Joe Frankel


Citations: 32

Abstract

Features derived from the posteriors of a multilayer perceptron (MLP), known as tandem features, have proven to be very effective for automatic speech recognition. Most tandem features to date have relied on MLPs trained for phone classification. We recently showed on a relatively small data set that MLPs trained for articulatory feature (AF) classification can be equally effective. In this paper, we provide a similar comparison using MLPs trained on a much larger data set: 2000 hours of English conversational telephone speech. We also explore how portable phone-based and AF-based tandem features are to an entirely different language, Mandarin, without any retraining. We find that phone-based features perform slightly better than AF-based features in the matched-language condition and significantly better in the cross-language condition. However, in the cross-language condition, neither approach is as effective as tandem features extracted from an MLP trained on a relatively small amount of in-domain data. Beyond feature concatenation, we also explore novel factored observation modeling schemes that allow for greater flexibility in combining the tandem and standard features.
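
As a rough illustration of the feature concatenation mentioned in the abstract, the sketch below builds tandem-style features from MLP posteriors and appends them to standard acoustic features. It is a minimal sketch under stated assumptions, not the authors' pipeline: the log transform and PCA reduction follow a commonly used tandem recipe that the abstract does not spell out, the function name `tandem_features` and all dimensions (46 classes, 39 standard features, 25 retained components) are hypothetical, and random numbers stand in for real MLP outputs and acoustic features.

```python
import numpy as np


def tandem_features(posteriors, standard, n_components):
    """Hypothetical helper: log posteriors -> PCA -> append to standard features.

    posteriors: (frames, classes) MLP output probabilities per frame.
    standard:   (frames, dims) conventional acoustic features.
    """
    # Assumption: take logs so the skewed posteriors look more Gaussian.
    log_post = np.log(posteriors + 1e-10)
    # Center, then use an SVD to get principal directions (PCA).
    centered = log_post - log_post.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:n_components].T
    # Feature concatenation: standard features followed by the tandem part.
    return np.hstack([standard, reduced])


# Stand-in data: random "posteriors" (softmax of noise) and random features.
rng = np.random.default_rng(0)
logits = rng.normal(size=(200, 46))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
standard = rng.normal(size=(200, 39))

features = tandem_features(posteriors, standard, n_components=25)
print(features.shape)  # (200, 64)
```

In the cross-language setting described above, the same trained MLP would be applied to the new language's audio without retraining; only the downstream recognizer would see the new data.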

