Exploring the Benefits of Tokenization of Discrete Acoustic Units
Avihu Dekel, Raul Fernandez
arXiv - CS - Sound · Journal Article · Published 2024-06-08 · DOI: https://doi.org/arxiv-2406.05547
Citations: 0
Abstract
Tokenization algorithms that merge the units of a base vocabulary into
larger, variable-rate units have become standard in natural language processing
tasks. This idea, however, has been mostly overlooked when the vocabulary
consists of phonemes or Discrete Acoustic Units (DAUs), an audio-based
representation that is playing an increasingly important role due to the
success of discrete language-modeling techniques. In this paper, we showcase
the advantages of tokenization of phonetic units and of DAUs on three
prediction tasks: grapheme-to-phoneme, grapheme-to-DAUs, and unsupervised
speech generation using DAU language modeling. We demonstrate that tokenization
yields significant improvements in performance and in training and inference
speed across all three tasks. We also offer theoretical insights that help
explain the superior performance observed.
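The abstract does not name the specific merging algorithm, but the standard instance of "merging units of a base vocabulary into larger, variable-rate units" is byte-pair encoding (BPE). As a hedged illustration of how such merges could apply to sequences of integer DAU ids, here is a minimal BPE sketch (all function names and the toy id sequences are hypothetical, not from the paper):

```python
from collections import Counter

def most_frequent_pair(seqs):
    """Return the most frequent adjacent pair across all sequences, or None."""
    counts = Counter()
    for seq in seqs:
        counts.update(zip(seq, seq[1:]))
    return counts.most_common(1)[0][0] if counts else None

def merge_pair(seq, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def bpe_tokenize(seqs, base_vocab_size, num_merges):
    """Learn `num_merges` merge rules over DAU-id sequences (BPE-style)."""
    merges, next_id = [], base_vocab_size
    for _ in range(num_merges):
        pair = most_frequent_pair(seqs)
        if pair is None:
            break
        merges.append((pair, next_id))
        seqs = [merge_pair(s, pair, next_id) for s in seqs]
        next_id += 1
    return seqs, merges

# Toy example: base vocabulary of 10 unit ids; one merge step fuses the
# most frequent pair (7, 7) into a new variable-rate token with id 10,
# shortening the sequences that a downstream language model must process.
seqs, merges = bpe_tokenize([[7, 7, 3, 7, 7], [7, 7, 9]], base_vocab_size=10, num_merges=1)
# → seqs == [[10, 3, 10], [10, 9]], merges == [((7, 7), 10)]
```

The shortened sequences illustrate why such merging can speed up training and inference: the language model sees fewer, higher-level tokens per utterance.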