Nozomi - a fast, memory-efficient stack decoder for LVCSR

5th International Conference on Spoken Language Processing (ICSLP 1998). Pub Date: 1998-11-30. DOI: 10.21437/ICSLP.1998-627

M. Schuster

Citations: 5

Abstract

This paper describes some implementation details of the "Nozomi" stack decoder for large-vocabulary continuous speech recognition (LVCSR). The decoder was tested on a Japanese newspaper dictation task with a 5000-word vocabulary. Using continuous-density acoustic models with 2000 and 3000 states trained on the JNAS/ASJ corpora and a 3-gram LM trained on the RWC text corpus, both provided by the IPA group [7], the decoder reached more than 95% word accuracy on the standard test set. With computationally cheap acoustic models, it achieved around 89% accuracy in nearly real time on a 300 MHz Pentium II. With a disk-based LM, total memory usage could be reduced to 4 MB.
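For readers unfamiliar with the term, the general idea of stack decoding can be sketched as a best-first search over partial word sequences kept in a priority queue. The sketch below is a generic illustration only, not the Nozomi implementation, whose details the abstract does not give. The function `stack_decode`, the toy lattice `TOY_LATTICE`, the bigram table `TOY_BIGRAMS`, and the `max_stack` pruning limit are all hypothetical names and values chosen for this example.

```python
import heapq
import math

def stack_decode(n_frames, lattice, lm_logprob, max_stack=1000):
    """Generic best-first (stack) search; an illustrative sketch only.

    lattice: dict mapping start_frame -> list of (end_frame, word, acoustic_logprob)
    lm_logprob: function (previous_word, word) -> log probability
    Returns (word_list, total_logprob), or (None, -inf) if nothing spans all frames.
    """
    # Each entry is (cost, end_frame, words); cost is the negated log probability,
    # so the smallest cost is the most probable partial hypothesis.
    stack = [(0.0, 0, ())]
    while stack:
        cost, t, words = heapq.heappop(stack)
        if t == n_frames:
            # Without pruning, the first complete hypothesis popped is the best one,
            # because every extension adds a non-negative cost.
            return list(words), -cost
        prev = words[-1] if words else "<s>"
        for end, word, acoustic in lattice.get(t, []):
            new_cost = cost - acoustic - lm_logprob(prev, word)
            heapq.heappush(stack, (new_cost, end, words + (word,)))
        if len(stack) > max_stack:
            # Assumed pruning rule: keep only the best entries (may lose optimality).
            stack = heapq.nsmallest(max_stack, stack)  # a sorted list is a valid heap
    return None, -math.inf

# Toy data, invented purely for illustration (two frames, five candidate words).
TOY_LATTICE = {
    0: [(1, "a", math.log(0.6)), (1, "b", math.log(0.4)), (2, "e", math.log(0.1))],
    1: [(2, "c", math.log(0.5)), (2, "d", math.log(0.5))],
}
TOY_BIGRAMS = {("b", "d"): math.log(0.9)}

def toy_lm(prev, word):
    # Unseen bigrams fall back to a flat probability of 0.1 (an assumption).
    return TOY_BIGRAMS.get((prev, word), math.log(0.1))

best_words, best_score = stack_decode(2, TOY_LATTICE, toy_lm)
print(best_words, best_score)
```

In this toy case the language model score for the pair ("b", "d") outweighs the better acoustic score of "a", which is the kind of trade-off a decoder of this type resolves by ranking hypotheses on their combined score.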

