Spoken-to-written text conversion for enhancement of Korean–English readability and machine translation

IF 1.6; Zone 4 (Computer Science); Q3 (Engineering, Electrical & Electronic); ETRI Journal; Pub Date: 2024-02-28; DOI: 10.4218/etrij.2023-0354

HyunJung Choi, Muyeol Choi, Seonhui Kim, Yohan Lim, Minkyu Lee, Seung Yun, Donghyun Kim, Sang Hun Kim


Abstract

The Korean language has written (formal) and spoken (phonetic) forms that differ in their application, which can lead to confusion, especially when dealing with numbers and embedded Western words and phrases. This fact makes it difficult to automate Korean speech recognition models due to the need for a complete transcription training dataset. Because such datasets are frequently constructed using broadcast audio and their accompanying transcriptions, they do not follow a discrete rule-based matching pattern. Furthermore, these mismatches are exacerbated over time due to changing tacit policies. To mitigate this problem, we introduce a data-driven Korean spoken-to-written transcription conversion technique that enhances the automatic conversion of numbers and Western phrases to improve automatic translation model performance.




