Title: Proper Error Estimation and Calibration for Attention-Based Encoder-Decoder Models
Authors: Mun-Hak Lee; Joon-Hyuk Chang
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 4919-4930
DOI: 10.1109/TASLP.2024.3492799
Published: 2024-11-06
Citations: 0
Abstract
An attention-based automatic speech recognition (ASR) model generates a probability distribution over the token set at each time step. Recent studies have shown that calibration errors exist in the output probability distributions of attention-based ASR models trained to minimize the negative log-likelihood. This study analyzes the causes of calibration errors in ASR model outputs and their impact on model performance. Based on this analysis, we argue that conventional methods for estimating calibration errors at the token level are unsuitable for ASR tasks. Accordingly, we propose a new calibration measure that estimates the calibration error at the sequence level. Moreover, we present a new post-hoc calibration function and training objective to mitigate the calibration error of the ASR model at the sequence level. Through experiments on ASR benchmarks, we show that the proposed methods effectively alleviate the calibration error of the ASR model and improve its generalization performance.
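The paper's exact sequence-level measure is not reproduced in this abstract, but the underlying idea can be illustrated with a minimal sketch: score each hypothesis with a sequence-level confidence (here, simply the product of per-token probabilities), then bin hypotheses by confidence and compare mean confidence against the fraction of exactly-correct transcriptions, in the style of a binned expected calibration error. The function names and binning scheme below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def sequence_confidence(token_probs):
    # Illustrative sequence-level confidence: the product of the
    # per-token probabilities assigned to the decoded hypothesis.
    return float(np.prod(token_probs))


def sequence_ece(confidences, correct, n_bins=10):
    # Binned expected calibration error computed over whole sequences:
    # within each confidence bin, compare the mean confidence with the
    # empirical fraction of exactly-correct transcriptions, and weight
    # each bin by the fraction of hypotheses it contains.
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        accuracy = correct[mask].mean()      # fraction correct in bin
        avg_conf = confidences[mask].mean()  # mean confidence in bin
        ece += mask.mean() * abs(accuracy - avg_conf)
    return ece
```

A perfectly calibrated model would place, e.g., hypotheses with confidence 0.5 in a bin where exactly half are correct, yielding an ECE of zero; overconfident ASR outputs inflate the gap between `avg_conf` and `accuracy`.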
Journal Introduction:
The IEEE/ACM Transactions on Audio, Speech, and Language Processing covers audio, speech and language processing and the sciences that support them. In audio processing: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. In speech processing: areas such as speech analysis, synthesis, coding, speech and speaker recognition, speech production and perception, and speech enhancement. In language processing: speech and text analysis, understanding, generation, dialog management, translation, summarization, question answering and document indexing and retrieval, as well as general language modeling.