Concatenated phoneme models for text-variable speaker recognition

1993 IEEE International Conference on Acoustics, Speech, and Signal Processing Pub Date : 1993-04-27 DOI:10.1109/ICASSP.1993.319321

Tomoko Matsui, S. Furui

Citations: 134

Abstract

Methods that create models to specify both speaker and phonetic information accurately by using only a small amount of training data for each speaker are investigated. For a text-dependent speaker recognition method, in which arbitrary key texts are prompted from the recognizer, speaker-specific phoneme models are necessary to identify the key text and recognize the speaker. Two methods of making speaker-specific phoneme models are discussed: phoneme-adaptation of a phoneme-independent speaker model and speaker-adaptation of universal phoneme models. The authors also investigate supplementing these methods by adding a phoneme-independent speaker model to make up for the lack of speaker information. This combination achieves a rejection rate as high as 98.5% for speech that differs from the key text and a speaker verification rate of 100.0%.
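
The idea of scoring an utterance against concatenated speaker-specific phoneme models, then supplementing that score with a phoneme-independent speaker model, can be illustrated with a minimal sketch. The abstract does not specify the model family or how the two scores are combined, so everything concrete here is an assumption made for illustration: each phoneme model is a single diagonal Gaussian, the alignment is a simple left-to-right dynamic program, and the names `concat_score`, `combined_score`, and the `weight` parameter are hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def concat_score(frames, phonemes, models):
    """Best left-to-right alignment log-likelihood of `frames` (non-empty)
    against the concatenation of the phoneme models named in `phonemes`.

    `models` maps a phoneme label to a (mean, var) pair (assumed form).
    Returns -inf if there are fewer frames than phonemes.
    """
    n_states = len(phonemes)
    neg_inf = float("-inf")
    # best[j]: best score so far with the current frame assigned to state j.
    best = [neg_inf] * n_states
    best[0] = log_gauss(frames[0], *models[phonemes[0]])
    for x in frames[1:]:
        new = [neg_inf] * n_states
        for j in range(n_states):
            stay = best[j]
            advance = best[j - 1] if j > 0 else neg_inf
            prev = max(stay, advance)
            if prev > neg_inf:
                new[j] = prev + log_gauss(x, *models[phonemes[j]])
        best = new
    return best[-1]  # the path must end in the last phoneme of the key text


def combined_score(frames, phonemes, models, speaker_model, weight=0.5):
    """Per-frame score: concatenated phoneme models plus a weighted
    phoneme-independent speaker model (the weighting is an assumption)."""
    text_part = concat_score(frames, phonemes, models)
    speaker_part = sum(log_gauss(x, *speaker_model) for x in frames)
    return (text_part + weight * speaker_part) / len(frames)


# Toy one-dimensional example with made-up parameters.
models = {"a": ([0.0], [1.0]), "b": ([5.0], [1.0])}
speaker_model = ([2.5], [9.0])
frames = [[0.1], [0.0], [5.2], [4.9]]

prompted = combined_score(frames, ["a", "b"], models, speaker_model)
different = combined_score(frames, ["b", "a"], models, speaker_model)
print(prompted > different)
```

In this toy setting the utterance matches the prompted phoneme order better than the reversed order, so comparing the score against a threshold would reject speech that differs from the key text. In a real system the threshold and the weight would have to be tuned on held-out data.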


