
2019 International Conference on Asian Language Processing (IALP): Latest Publications

A New Method of Tonal Determination for Chinese Dialects
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037711
Yan Li, Zhiyi Wu
The values of the basic tones are key to research on Chinese dialects. The traditional method of determining tones by ear and the more popular method used in experimental phonetics are either somewhat inaccurate or difficult to learn. The method presented and discussed in this paper is simple and reliable, requiring only Praat and fundamental frequency values. Several examples are given to demonstrate the method's effectiveness.
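The abstract does not spell out how fundamental frequency values are converted into tone values, but a widely used convention in Chinese dialectology is the logarithmic five-degree (T-value) normalization. The sketch below assumes that formula; the F0 track and speaker range are invented numbers standing in for measurements exported from Praat.

```python
import math

def t_value(f0, f0_min, f0_max):
    """Map a fundamental frequency (Hz) onto the 0-5 five-degree scale using
    the logarithmic T-value formula:
    T = 5 * (lg f - lg f_min) / (lg f_max - lg f_min)."""
    return 5 * (math.log10(f0) - math.log10(f0_min)) / (math.log10(f0_max) - math.log10(f0_min))

# Hypothetical F0 track (Hz) for one syllable, e.g. exported from Praat.
f0_track = [112.0, 118.5, 131.0, 149.2, 163.8]
# Speaker-specific F0 range, ideally measured over the whole recording session.
f0_min, f0_max = 95.0, 180.0

t_values = [round(t_value(f, f0_min, f0_max), 1) for f in f0_track]
print(t_values)  # rising contour from low to high on the five-degree scale
```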
Citations: 0
Exploring Context’s Diversity to Improve Neural Language Model
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037662
Yanchun Zhang, Xingyuan Chen, Peng Jin, Yajun Du
Neural language models (NLMs), such as long short-term memory (LSTM) networks, have achieved great success over the years. However, NLMs usually only minimize a loss between the prediction results and the target words. In fact, the context has natural diversity, i.e., within a word sequence of a given length, few words occur more than once. We refer to this natural diversity as the context's diversity in this paper. In our model, the context's diversity means that, given a fixed input sequence, the target words predicted by any two contexts are very likely to be different; that is, the softmax outputs of any two contexts should differ. Based on this observation, we propose a new cross-entropy loss function that computes the cross-entropy between the softmax outputs of any two different given contexts. By adding this loss, our approach explicitly accounts for the context's diversity, improving the model's sensitivity of prediction for every context. Experiments based on two typical LSTM models, one regularized by dropout and one not, show the effectiveness of our approach on the benchmark dataset.
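As one possible reading of the proposed loss, the sketch below computes the ordinary language-model cross-entropy plus a pairwise term built from the softmax outputs of two different contexts in the same batch. The pairing scheme, the sign of the diversity term, and the weight `alpha` are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def pairwise_cross_entropy(logits_a, logits_b):
    """Cross-entropy H(p_a, q_b) between the softmax outputs of two contexts."""
    p_a = F.softmax(logits_a, dim=-1)
    log_q_b = F.log_softmax(logits_b, dim=-1)
    return -(p_a * log_q_b).sum(dim=-1).mean()

def diversity_regularised_loss(logits, targets, alpha=0.1):
    """Standard LM loss minus a pairwise term that rewards diverse softmax
    outputs for different contexts in the same batch (sign and weight are
    assumptions for this sketch)."""
    lm_loss = F.cross_entropy(logits, targets)
    # Pair each context with a shifted copy of the batch so that the two
    # members of every pair come from different contexts.
    shifted = torch.roll(logits, shifts=1, dims=0)
    diversity = pairwise_cross_entropy(logits, shifted)
    return lm_loss - alpha * diversity

# Toy example: a batch of 4 contexts over a 10-word vocabulary.
logits = torch.randn(4, 10, requires_grad=True)
targets = torch.tensor([1, 3, 5, 7])
loss = diversity_regularised_loss(logits, targets)
loss.backward()
print(loss.item())
```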
Citations: 0
A Comparative Analysis of Acoustic Characteristics between Kazak & Uyghur Mandarin Learners and Standard Mandarin Speakers
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037703
Gulnur Arkin, Gvljan Alijan, A. Hamdulla, Mijit Ablimit
In this paper, based on vowel and phonological pronunciation corpora from 20 Kazakh undergraduate Mandarin learners, 10 Uyghur learners, and 10 standard speakers, and within the framework of the phonetic learning model and comparative analysis, the methods of experimental phonetics are applied to Kazak and Uyghur learners. The acoustic characteristics of the learners' Mandarin vowels, such as formant frequency values, vowel duration, and other prosodic parameters, are compared with those of the standard speakers. The results help provide learners with effective teaching-related reference information, supply reliable and correct parameters and pronunciation assessments for computer-assisted language teaching systems (CALLs), and improve the accuracy of Putonghua speech recognition and ethnic identification across nationalities.
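One simple way to quantify the learner-versus-standard comparison described above is a distance in the F1-F2 formant plane. The sketch below uses invented mean formant values in place of measurements taken from the corpora, and is only an illustration of the kind of acoustic comparison involved.

```python
import numpy as np

# Hypothetical mean formant values (F1, F2) in Hz per vowel; real values
# would be measured in Praat from the recorded corpora.
standard = {"a": (850, 1250), "i": (300, 2300), "u": (330, 800)}
learner  = {"a": (800, 1400), "i": (350, 2100), "u": (380, 950)}

def formant_distance(v1, v2):
    """Euclidean distance between two vowels in the F1-F2 plane (Hz)."""
    return float(np.linalg.norm(np.array(v1) - np.array(v2)))

for vowel in standard:
    d = formant_distance(standard[vowel], learner[vowel])
    print(f"{vowel}: learner deviates from standard by {d:.0f} Hz in F1-F2 space")
```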
Citations: 0
Exploring Letter’s Differences between Partial Indonesian Branch Language and English
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037715
Nankai Lin, Sihui Fu, Jiawen Huang, Sheng-yi Jiang
Differences in letter usage are the most basic differences between languages and reflect their most essential diversity. Many linguists have studied the letter differences between common languages, but few have examined those between non-common languages. This paper selects three representative languages from the Indonesian branch of the Austronesian language family, namely Malay, Indonesian, and Filipino. To study the letter differences between these three languages and English, we focus on word length distribution, letter frequency distribution, commonly used letter pairs, commonly used letter trigrams, and ranked letter frequency distribution. The results show that substantial differences exist between the three Indonesian-branch languages and English, and that the differences between Malay and Indonesian are the smallest.
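The letter statistics the paper relies on (letter frequencies, letter pairs, and letter trigrams) can be computed with a few lines of counting code. The sketch below uses tiny invented text samples rather than the news corpora used in the study.

```python
from collections import Counter

def letter_ngrams(text, n):
    """Frequency of letter n-grams, ignoring case and non-alphabetic characters."""
    letters = [c for c in text.lower() if c.isalpha()]
    return Counter("".join(letters[i:i + n]) for i in range(len(letters) - n + 1))

# Tiny illustrative samples; the actual study uses large corpora per language.
samples = {
    "Indonesian": "berita terbaru dari indonesia hari ini",
    "English":    "the latest news from indonesia today",
}

for lang, text in samples.items():
    unigrams = letter_ngrams(text, 1)
    bigrams  = letter_ngrams(text, 2)
    trigrams = letter_ngrams(text, 3)
    print(lang, unigrams.most_common(3), bigrams.most_common(3), trigrams.most_common(3))
```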
Citations: 1
On the Etymology of he ‘river’ in Chinese
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037654
Huibin Zhuang, Zhanting Bu
In Chinese, he 河 ‘river’ can be used both as a proper name (for the Yellow River) and as a common word for rivers in North China. Based on linguistic data, ethnological evidence, and historical documents, this paper argues against the leading hypotheses and proposes that he originated in the Old Yi language, entered Chinese through language contact, and replaced shui, which came from Old Qiang, later becoming the only common noun for rivers in North China.
Citations: 0
Diachronic Synonymy and Polysemy: Exploring Dynamic Relation Between Forms and Meanings of Words Based on Word Embeddings
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037663
Shichen Liang, Jianyu Zheng, Xuemei Tang, Renfen Hu, Zhiying Liu
In recent years, a large number of publications have used distributional methods to track temporal changes in lexical semantics. However, most current research only states the simple fact that word meanings have changed, without more detailed and in-depth analysis. We combine linguistic theory with word embedding models to study Chinese diachronic semantics. Specifically, word analogy and word similarity are associated with diachronic synonymy and diachronic polysemy respectively, and aligned diachronic word embeddings are used to detect changes in the relationship between word forms and meanings. Experiments and case studies show that our method achieves the intended results. We also find that the evolution of Chinese vocabulary is closely related to social development, and that there is a certain correlation between the polysemy and synonymy of word meanings.
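The abstract mentions aligned diachronic word embeddings. A common way to obtain such an alignment is an orthogonal Procrustes mapping between two time periods, followed by a cross-period cosine similarity per word; the sketch below assumes that setup and uses random matrices as placeholders for real embeddings.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

# Hypothetical embedding matrices for a shared anchor vocabulary in two
# time periods (rows aligned by word); sizes and values are placeholders.
rng = np.random.default_rng(0)
emb_1950 = rng.normal(size=(1000, 100))
emb_2000 = rng.normal(size=(1000, 100))

# Align the earlier space onto the later one with an orthogonal mapping,
# as is common in diachronic embedding work.
R, _ = orthogonal_procrustes(emb_1950, emb_2000)
emb_1950_aligned = emb_1950 @ R

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Cross-period similarity for one word (index 42): a low value suggests the
# word's meaning, or its dominant sense, has shifted between the periods.
idx = 42
print(cosine(emb_1950_aligned[idx], emb_2000[idx]))
```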
Citations: 0
Improving Japanese-English Bilingual Mapping of Word Embeddings based on Language Specificity
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037649
Yuting Song, Biligsaikhan Batjargal, Akira Maeda
Recently, cross-lingual word embeddings have attracted much attention because they capture the semantic meaning of words across languages and can be applied to cross-lingual tasks. Most methods learn a single mapping (e.g., a linear mapping) to transform the word embedding space of one language into that of another. In this paper, we propose a method for improving bilingual word embeddings by adding a language-specific mapping. We focus on learning a Japanese-English bilingual word embedding mapping that takes the specificity of the Japanese language into account. On a benchmark dataset for Japanese-English bilingual lexicon induction, the proposed method achieved competitive performance compared to the single-mapping method, with better results on original Japanese words.
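As a reference for the single-mapping baseline the paper builds on, the sketch below learns one linear map from Japanese to English embeddings on a seed dictionary and retrieves nearest neighbours for lexicon induction. All data and the seed pairing are placeholders, and the paper's additional language-specific mapping is not reproduced here.

```python
import numpy as np

# Toy Japanese and English embedding spaces plus a small seed dictionary
# (indices of translation pairs); every number here is a placeholder.
rng = np.random.default_rng(1)
ja_emb = rng.normal(size=(5000, 300))
en_emb = rng.normal(size=(5000, 300))
seed_pairs = [(i, i) for i in range(500)]            # hypothetical seed lexicon

X = np.stack([ja_emb[j] for j, _ in seed_pairs])     # source-side vectors
Y = np.stack([en_emb[e] for _, e in seed_pairs])     # target-side vectors

# Single linear mapping W minimising ||XW - Y||: the common baseline before
# any language-specific refinement is added.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def translate(ja_index, k=3):
    """Nearest English neighbours of a mapped Japanese word vector."""
    query = ja_emb[ja_index] @ W
    sims = en_emb @ query / (np.linalg.norm(en_emb, axis=1) * np.linalg.norm(query))
    return np.argsort(-sims)[:k]

print(translate(10))
```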
Citations: 1
Extremely Low Resource Text Simplification with Pre-trained Transformer Language Model
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037650
T. Maruyama, Kazuhide Yamamoto
Recent text simplification approaches treat the task as monolingual text-to-text generation inspired by machine translation. In particular, transformer-based translation models outperform previous methods. Although machine translation approaches require large-scale parallel corpora, parallel corpora for text simplification are very small compared with those for machine translation. We therefore attempt a simple approach that fine-tunes a pre-trained language model for text simplification on a small parallel corpus. Specifically, we experiment with two models: a transformer-based encoder-decoder model and a language model that receives a joint input of original and simplified sentences, called TransformerLM. We show that TransformerLM, a simple text generation model, substantially outperforms a strong baseline. In addition, we show that a fine-tuned TransformerLM with only 3,000 supervised examples can achieve performance comparable to a strong baseline trained on all the supervised data.
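The joint input described for TransformerLM can be illustrated as a simple data-formatting step: the language model sees the original and simplified sentences as one sequence during fine-tuning, and only the original plus a separator at inference time. The separator and end-of-sequence tokens below are assumptions, not the tokens used in the paper.

```python
SEP, EOS = "<sep>", "<eos>"

def build_joint_example(original, simplified):
    """Concatenate original and simplified sentences into one training string."""
    return f"{original} {SEP} {simplified} {EOS}"

def build_prompt(original):
    """At inference time only the original sentence and separator are given;
    the fine-tuned language model continues with the simplified sentence."""
    return f"{original} {SEP}"

pairs = [
    ("The committee postponed the deliberation of the proposal.",
     "The committee delayed discussing the plan."),
]
for src, tgt in pairs:
    print(build_joint_example(src, tgt))
print(build_prompt("The statute stipulates stringent penalties."))
```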
Citations: 10
Neural Machine Translation Strategies for Generating Honorific-style Korean
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037681
Lijie Wang, Mei Tu, Mengxia Zhai, Huadong Wang, Song Liu, Sang Ha Kim
Expression with honorifics is an important way of dressing up the language and showing politeness in Korean. For machine translation, generating honorifics is indispensable on formal occasions when the target language is Korean. However, current neural machine translation (NMT) models ignore the generation of honorifics, which limits the application of MT in business settings. To address this problem, this paper presents two strategies for improving the Korean honorific generation ratio: 1) we introduce an honorific fusion training (HFT) loss under the minimum risk training framework to guide the model to generate honorifics; and 2) we introduce a data labeling (DL) method that tags the training corpus with distinctive labels without any modification to the model structure. Experimental results show that the two strategies significantly improve the honorific generation ratio, by 34.35% and 45.59% respectively.
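The data labeling (DL) strategy can be illustrated as tagging the source side of the parallel corpus with a style label, so that an otherwise unchanged NMT model learns to associate the tag with honorific or plain target output. The tag strings below are hypothetical, chosen only for illustration.

```python
HONORIFIC, PLAIN = "<hon>", "<plain>"

def label_source(src_sentence, target_is_honorific):
    """Prepend a style tag to the source side of a parallel sentence pair."""
    tag = HONORIFIC if target_is_honorific else PLAIN
    return f"{tag} {src_sentence}"

# Hypothetical source sentences paired with whether the Korean reference
# uses honorific style.
corpus = [
    ("Please wait a moment.", True),
    ("Wait a moment.", False),
]
for src, is_hon in corpus:
    print(label_source(src, is_hon))

# At translation time, prepending "<hon>" to any input requests honorific-style output.
```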
Citations: 1
A Study on Syntactic Complexity and Text Readability of ASEAN English News
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037695
Yusha Zhang, Nankai Lin, Sheng-yi Jiang
English is the most widely used language in the world. As the language has spread and evolved, English texts differ across regions in expression and reading difficulty. Because of differences in content and wording, English news in some countries is easier to understand than in others. An accurate and effective method for measuring text difficulty not only helps news writers produce easy-to-understand articles, but also helps readers choose articles they can understand. In this paper, we study the differences in text readability between most ASEAN countries, England, and America. We compare the readability and syntactic complexity of English news texts from England, America, and eight ASEAN countries (Indonesia, Malaysia, the Philippines, Singapore, Brunei, Thailand, Vietnam, and Cambodia), taking the authoritative news media of each country as the research object. We measure readability with the Flesch-Kincaid Grade Level (FKG), Flesch Reading Ease (FRE), Gunning Fog (GF), Automated Readability (AR), Coleman-Liau (CL), and Linsear Write (LW) indices, and analyze the syntactic complexity of the news texts with L2SCA. Based on the analysis results, we use hierarchical clustering to group the English texts of the different countries into six levels, and we discuss the reasons for the observed readability differences.
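The two Flesch formulas named in the abstract are straightforward to reproduce. The sketch below computes the Flesch-Kincaid Grade Level and Flesch Reading Ease with a deliberately naive vowel-run syllable counter, which is only a rough stand-in for the tooling a full study would use.

```python
import re

def count_syllables(word):
    """Very rough English syllable count: runs of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    """Flesch-Kincaid Grade Level and Flesch Reading Ease for a text."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences          # words per sentence
    spw = syllables / len(words)          # syllables per word
    fkg = 0.39 * wps + 11.8 * spw - 15.59
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    return round(fkg, 2), round(fre, 2)

sample = ("The government announced new infrastructure spending. "
          "Officials said construction will begin next year.")
print(readability(sample))
```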
Citations: 2