Fast and Accurate Capitalization and Punctuation for Automatic Speech Recognition Using Transformer and Chunk Merging

2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) Pub Date : 2019-08-07 DOI:10.1109/O-COCOSDA46868.2019.9041202

B. Nguyen, V. H. Nguyen, Hien Nguyen, Pham Ngoc Phuong, The-Loc Nguyen, Quoc Truong Do, Luong Chi Mai

{"title":"Fast and Accurate Capitalization and Punctuation for Automatic Speech Recognition Using Transformer and Chunk Merging","authors":"B. Nguyen, V. H. Nguyen, Hien Nguyen, Pham Ngoc Phuong, The-Loc Nguyen, Quoc Truong Do, Luong Chi Mai","doi":"10.1109/O-COCOSDA46868.2019.9041202","DOIUrl":null,"url":null,"abstract":"In recent years, studies on automatic speech recognition (ASR) have shown outstanding results that reach human parity on short speech segments. However, there are still difficulties in standardizing the output of ASR such as capitalization and punctuation restoration for long-speech transcription. The problems obstruct readers to understand the ASR output semantically and also cause difficulties for natural language processing models such as NER, POS and semantic parsing. In this paper, we propose a method to restore the punctuation and capitalization for long-speech ASR transcription. The method is based on Transformer models and chunk merging that allows us to (1), build a single model that performs punctuation and capitalization in one go, and (2), perform decoding in parallel while improving the prediction accuracy. Experiments on British National Corpus showed that the proposed approach outperforms existing methods in both accuracy and decoding speed.","PeriodicalId":263209,"journal":{"name":"2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","volume":"216 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"38","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/O-COCOSDA46868.2019.9041202","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 38

Abstract

In recent years, studies on automatic speech recognition (ASR) have shown outstanding results that reach human parity on short speech segments. However, there are still difficulties in standardizing the output of ASR such as capitalization and punctuation restoration for long-speech transcription. The problems obstruct readers to understand the ASR output semantically and also cause difficulties for natural language processing models such as NER, POS and semantic parsing. In this paper, we propose a method to restore the punctuation and capitalization for long-speech ASR transcription. The method is based on Transformer models and chunk merging that allows us to (1), build a single model that performs punctuation and capitalization in one go, and (2), perform decoding in parallel while improving the prediction accuracy. Experiments on British National Corpus showed that the proposed approach outperforms existing methods in both accuracy and decoding speed.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

基于变换和块合并的快速准确的语音自动识别的大写和标点

近年来，自动语音识别(ASR)的研究取得了显著的成果，在短语音段上达到了人类的水平。然而，对于长语音转录而言，在规范ASR的输出方面仍存在一些困难，如大写和标点恢复。这些问题阻碍了读者从语义上理解ASR输出，也给NER、POS等自然语言处理模型和语义解析带来了困难。在本文中，我们提出了一种恢复长语音ASR转录中标点和大写的方法。该方法基于Transformer模型和块合并，它允许我们(1)构建一个一次性执行标点和大写的单一模型，以及(2)在提高预测精度的同时并行执行解码。在英国国家语料库上的实验表明，该方法在准确率和解码速度上都优于现有方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)

自引率

0.00%

发文量

期刊最新文献

A Great Reduction of WER by Syllable Toneme Prediction for Thai Grapheme to Phoneme Conversion index The Architecture of Speech-to-Speech Translator for Mobile Conversation Characteristics of everyday conversation derived from the analysis of dialog act annotation Annotation and preliminary analysis of utterance decontextualization in a multiactivity