Recognition and translation of code-switching speech utterances

Sahoko Nakayama, Takatomo Kano, Andros Tjandra, S. Sakti, Satoshi Nakamura
Published in: 2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)
Publication date: 2019-10-01
DOI: 10.1109/O-COCOSDA46868.2019.9060847 (https://doi.org/10.1109/O-COCOSDA46868.2019.9060847)
Citations: 8

Abstract

Code-switching (CS), a hallmark of bilingual communities worldwide, is a strategy in which bilingual (or multilingual) speakers mix two or more languages within a discourse, often with little change of interlocutor or topic. The units and locations of the switches vary widely, from single words to whole phrases (longer than loanword units). Such phenomena pose challenges for spoken language technologies such as automatic speech recognition (ASR), since the systems must handle multilingual input. Several works have built CS ASR systems for many different language pairs, but their common aim is merely to transcribe a single speaker's CS speech utterances into CS text sentences. In contrast, this study addresses dialogs between CS and non-CS (monolingual) speakers and aims to support monolingual speakers who want to understand CS speakers. We construct a system that recognizes code-switching speech and translates it into monolingual text. We investigate four approaches: a cascade of ASR and neural machine translation (NMT), a cascade of ASR and a deep bidirectional language model (BERT), an ASR that directly outputs monolingual transcriptions from CS speech, and multi-task learning. Finally, we evaluate and discuss these four approaches on a Japanese-English CS to English monolingual task.
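The difference between the cascaded and direct approaches named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the toy word-level lexicon, the use of pre-segmented tokens in place of audio, and the weighted-sum form of the multi-task objective are all hypothetical and are not taken from the paper, which does not describe its models at this level of detail.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins (not from the paper): "speech" is modelled as a list
# of already-segmented spoken tokens so that the sketch runs without audio.
Speech = List[str]
Text = str


def toy_cs_asr(speech: Speech) -> Text:
    """Toy CS ASR: emits a CS transcript that keeps both languages."""
    return " ".join(speech)


def make_toy_translator(lexicon: Dict[str, str]) -> Callable[[Text], Text]:
    """Toy CS-text to monolingual-text step.

    Stands in for the second stage of a cascade (NMT or a BERT-based model);
    here it is only a word-level dictionary lookup.
    """
    def translate(cs_text: Text) -> Text:
        return " ".join(lexicon.get(tok, tok) for tok in cs_text.split())
    return translate


def cascade(asr: Callable[[Speech], Text],
            translator: Callable[[Text], Text]) -> Callable[[Speech], Text]:
    """Cascaded approach: CS speech -> CS text -> monolingual text."""
    def pipeline(speech: Speech) -> Text:
        return translator(asr(speech))
    return pipeline


def make_toy_direct_asr(lexicon: Dict[str, str]) -> Callable[[Speech], Text]:
    """Direct approach: one model maps CS speech straight to monolingual text."""
    def direct(speech: Speech) -> Text:
        return " ".join(lexicon.get(tok, tok) for tok in speech)
    return direct


def multitask_loss(loss_cs: float, loss_mono: float, weight: float = 0.5) -> float:
    """Generic multi-task objective: weighted sum of the two task losses.

    The weighting scheme is an assumption, not the paper's formulation.
    """
    return weight * loss_cs + (1.0 - weight) * loss_mono


# Toy Japanese-English CS utterance with a one-entry hypothetical lexicon.
lexicon = {"天気": "weather"}
utterance = ["the", "天気", "is", "nice"]

cascaded_system = cascade(toy_cs_asr, make_toy_translator(lexicon))
direct_system = make_toy_direct_asr(lexicon)

print(cascaded_system(utterance))
print(direct_system(utterance))
```

In the cascade, the intermediate CS transcript is available for inspection but any recognition error is passed on to the second stage; the direct system has no intermediate text. The toy models here agree by construction, which would not generally hold for trained systems.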