Four-in-One: a Joint Approach to Inverse Text Normalization, Punctuation, Capitalization, and Disfluency for Automatic Speech Recognition

2022 IEEE Spoken Language Technology Workshop (SLT). Pub Date: 2022-10-26. DOI: 10.1109/SLT54892.2023.10023257

S.S. Tan, Piyush Behre, Nick Kibre, Issac Alphonso, Shuangyu Chang

Citations: 3

Abstract

Features such as punctuation, capitalization, and formatting of entities are important for readability, understanding, and natural language processing tasks. However, Automatic Speech Recognition (ASR) systems produce spoken-form text devoid of formatting, and tagging approaches to formatting address just one or two features at a time. In this paper, we unify spoken-to-written text conversion via a two-stage process: First, we use a single transformer tagging model to jointly produce token-level tags for inverse text normalization (ITN), punctuation, capitalization, and disfluencies. Then, we apply the tags to generate written-form text and use weighted finite state transducer (WFST) grammars to format tagged ITN entity spans. Despite joining four models into one, our unified tagging approach matches or outperforms task-specific models across all four tasks on benchmark test sets across several domains.
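The second stage described in the abstract (applying token-level tags to produce written-form text) can be illustrated with a minimal sketch. This is not the authors' implementation: the tag schema (`punct`, `cap`, `disfluent`, `itn`), the `TaggedToken` class, and the `apply_tags` function are hypothetical names chosen for illustration, the tags are hand-written rather than predicted by a transformer, and a small dictionary-based formatter stands in for the WFST grammars.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TaggedToken:
    """One spoken-form token with hypothetical joint tags (illustrative schema)."""
    text: str                # spoken-form token as emitted by ASR
    punct: str = ""          # punctuation to append after the token, e.g. "," or "."
    cap: bool = False        # capitalize the first character of the output
    disfluent: bool = False  # drop the token entirely when True
    itn: str = ""            # ITN entity label for the span, "" when not an entity


# Toy stand-in for a WFST grammar: handles only simple "tens + units" cardinals.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
}


def format_cardinal(words: List[str]) -> str:
    """Convert e.g. ["twenty", "five"] to "25" by summing word values."""
    return str(sum(NUMBER_WORDS[w] for w in words))


# Maps an ITN label to the formatter applied to the whole tagged span.
ITN_FORMATTERS: Dict[str, Callable[[List[str]], str]] = {
    "cardinal": format_cardinal,
}


def apply_tags(tokens: List[TaggedToken]) -> str:
    """Apply disfluency, ITN, capitalization, and punctuation tags in one pass."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.disfluent:
            i += 1  # disfluent tokens are removed from the written form
            continue
        if tok.itn:
            # Collect the contiguous span that shares this ITN label.
            j = i
            while j < len(tokens) and tokens[j].itn == tok.itn and not tokens[j].disfluent:
                j += 1
            span = tokens[i:j]
            text = ITN_FORMATTERS[tok.itn]([t.text for t in span])
            last = span[-1]  # punctuation comes from the final token of the span
            i = j
        else:
            text, last = tok.text, tok
            i += 1
        if tok.cap:
            text = text[:1].upper() + text[1:]
        out.append(text + last.punct)
    return " ".join(out)


# Hand-tagged example for "um we met with twenty five people in seattle".
EXAMPLE = [
    TaggedToken("um", disfluent=True),
    TaggedToken("we", cap=True),
    TaggedToken("met"),
    TaggedToken("with"),
    TaggedToken("twenty", itn="cardinal"),
    TaggedToken("five", itn="cardinal"),
    TaggedToken("people"),
    TaggedToken("in"),
    TaggedToken("seattle", cap=True, punct="."),
]

print(apply_tags(EXAMPLE))  # We met with 25 people in Seattle.
```

In this sketch a single pass consumes all four tag types, which mirrors the idea of one tagger feeding one formatting step. A real system would replace `format_cardinal` with compiled grammars covering dates, currency, times, and other entity types.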


