Streaming model for Acoustic to Articulatory Inversion with transformer networks

Sathvik Udupa, Aravind Illa, P. Ghosh
Interspeech, pp. 625-629. Published 2022-09-18. DOI: 10.21437/interspeech.2022-10159 (https://doi.org/10.21437/interspeech.2022-10159)
Citations: 2

Abstract

Estimating articulatory movements from speech acoustics is known as Acoustic to Articulatory Inversion (AAI). Recently, transformer-based AAI models have been shown to achieve state-of-the-art performance. However, in transformer networks attention is applied over the whole utterance, so the full utterance must be available before inference; this leads to high latency and is impractical for streaming AAI. To enable streaming during inference, evaluation could be performed on non-overlapping chunks instead of the full utterance. However, a mismatch in the attention receptive field between training and evaluation can cause a drop in AAI performance. To overcome this, in this work we perform experiments with different attention masks and use context from previous predictions during training. Experimental results reveal that random start mask attention, combined with context from previous predictions of the transformer decoder, performs better than the baseline.
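
To make the chunking idea concrete, the following is a minimal sketch of a chunk-restricted attention mask, in which each frame attends only to frames in its own non-overlapping chunk. The function name `chunk_attention_mask`, its parameters, and the reading of "random start" as a randomly shifted first chunk boundary are assumptions made for illustration; the abstract does not define the masks, so this should not be read as the authors' implementation.

```python
import numpy as np


def chunk_attention_mask(num_frames, chunk_size, start_offset=0):
    """Return a boolean (num_frames, num_frames) mask.

    mask[i, j] is True when frame i may attend to frame j, i.e. when both
    frames fall in the same non-overlapping chunk. start_offset (hypothetical
    parameter) moves the first chunk boundary to that frame index, so the
    first chunk is shorter than chunk_size when start_offset > 0.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= start_offset < chunk_size:
        raise ValueError("start_offset must lie in [0, chunk_size)")
    frames = np.arange(num_frames)
    # Shift frame indices so that integer division places a boundary
    # exactly at start_offset.
    shift = (chunk_size - start_offset) % chunk_size
    chunk_id = (frames + shift) // chunk_size
    # Frames may attend to each other only if they share a chunk id.
    return chunk_id[:, None] == chunk_id[None, :]


# Fixed chunks, as they might be used for streaming evaluation.
eval_mask = chunk_attention_mask(num_frames=6, chunk_size=3)

# Assumed interpretation of a "random start" training mask: draw a new
# boundary offset for each training utterance.
rng = np.random.default_rng(0)
train_mask = chunk_attention_mask(6, 3, start_offset=int(rng.integers(0, 3)))
```

Under this assumption, training with varying offsets exposes the model to the same limited receptive field it sees at streaming time, without tying it to one fixed set of chunk boundaries.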