Exploring the Benefits of Tokenization of Discrete Acoustic Units

Avihu Dekel, Raul Fernandez
arXiv - CS - Sound, 2024-06-08. DOI: arxiv-2406.05547 (https://doi.org/arxiv-2406.05547)

Abstract

Tokenization algorithms that merge the units of a base vocabulary into larger, variable-rate units have become standard in natural language processing tasks. This idea, however, has been mostly overlooked when the vocabulary consists of phonemes or Discrete Acoustic Units (DAUs), an audio-based representation that is playing an increasingly important role due to the success of discrete language-modeling techniques. In this paper, we showcase the advantages of tokenization of phonetic units and of DAUs on three prediction tasks: grapheme-to-phoneme, grapheme-to-DAUs, and unsupervised speech generation using DAU language modeling. We demonstrate that tokenization yields significant improvements in terms of performance, as well as training and inference speed, across all three tasks. We also offer theoretical insights to provide some explanation for the superior performance observed.
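
To make the idea of merging base units into larger, variable-rate units concrete, here is a minimal sketch using byte-pair-encoding (BPE) style merges over sequences of integer unit IDs. This is an illustration only: the abstract does not say which tokenization algorithm the authors use, and the function names (`learn_merges`, `apply_merge`, `tokenize`) and the toy unit IDs are hypothetical.

```python
from collections import Counter


def apply_merge(seq, left, right):
    """Replace each adjacent (left, right) pair in seq with one merged token."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == left and seq[i + 1] == right:
            out.append(left + right)  # tuple concatenation keeps the base units visible
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_merges(sequences, num_merges):
    """Learn up to num_merges BPE-style merges from sequences of unit IDs."""
    # Each token is a tuple of base units, so a merged token is just a longer tuple.
    corpus = [[(u,) for u in seq] for seq in sequences]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in corpus:
            pairs.update(zip(seq, seq[1:]))  # count adjacent token pairs
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        if count < 2:  # nothing repeats any more, so further merges gain nothing
            break
        merges.append((left, right))
        corpus = [apply_merge(seq, left, right) for seq in corpus]
    return merges, corpus


def tokenize(seq, merges):
    """Tokenize a new unit sequence by replaying the learned merges in order."""
    tokens = [(u,) for u in seq]
    for left, right in merges:
        tokens = apply_merge(tokens, left, right)
    return tokens


# Toy "DAU" sequences (made-up IDs, not real acoustic units).
units = [[5, 5, 9, 2, 5, 5, 9], [5, 5, 9, 7]]
merges, compressed = learn_merges(units, num_merges=10)
print([len(s) for s in units], "->", [len(s) for s in compressed])
```

On this toy input the sequences shrink from 7 and 4 base units to 3 and 2 tokens. Shorter sequences of this kind are what would plausibly drive the training and inference speedups the abstract reports, though the sketch makes no claim about the authors' actual setup.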