
2019 International Conference on Asian Language Processing (IALP) — Latest Publications

Examination-Style Reading Comprehension with Neural augmented Retrieval
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037657
Yiqing Zhang, Hai Zhao, Zhuosheng Zhang
In this paper, we focus on an examination-style reading comprehension task that requires solving multiple-choice questions without a pre-given document containing direct evidence for answering them. Unlike common machine reading comprehension tasks, this task requires a deep understanding of detail-rich and semantically complex questions. Such a task can be considered a variant of early deep question answering. We propose a hybrid solution: first, an attentive neural network extracts the keywords of the question; then a retrieval-based model retrieves relevant evidence from knowledge sources, weighted by the importance score of each word. The final choice is made by considering both the question and the evidence. Our experimental results show that our system achieves state-of-the-art performance on Chinese benchmarks and is effective on an English dataset using only an unstructured knowledge source.
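The retrieval step can be sketched as importance-weighted keyword matching. A minimal sketch in Python, using IDF weights as a stand-in for the attention-derived importance scores (the toy corpus, tokenization, and scoring function are illustrative, not the paper's implementation):

```python
import math

def keyword_scores(question_tokens, corpus):
    """Importance score per question word. IDF is used here as a
    stand-in for the paper's attention-derived weights."""
    n = len(corpus)
    scores = {}
    for w in set(question_tokens):
        df = sum(1 for doc in corpus if w in doc)
        scores[w] = math.log((1 + n) / (1 + df)) + 1.0
    return scores

def retrieve_evidence(question_tokens, corpus, k=1):
    """Rank corpus sentences by importance-weighted keyword overlap."""
    weights = keyword_scores(question_tokens, corpus)
    ranked = sorted(
        corpus,
        key=lambda doc: sum(weights[w] for w in set(question_tokens) if w in doc),
        reverse=True,
    )
    return ranked[:k]
```

A rare question word such as a topic noun gets a high weight, so the evidence sentence containing it outranks generic sentences.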
Cited by: 1
Towards Robust Neural Machine Reading Comprehension via Question Paraphrases
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037673
Ying Li, Hongyu Li, Jing Liu
In this paper, we focus on addressing the over-sensitivity issue of neural machine reading comprehension (MRC) models. By over-sensitivity, we mean that neural MRC models give different answers to question paraphrases that are semantically equivalent. To address this issue, we first create a large-scale Chinese MRC dataset with high-quality question paraphrases generated by a toolkit used in Baidu Search. Then, we quantitatively analyze the over-sensitivity issue of neural MRC models on this dataset. Intuitively, if two questions are paraphrases of each other, a robust model should give them the same predictions. Based on this intuition, we propose a regularized BERT-based model that encourages the model to give the same predictions to similar inputs by leveraging high-quality question paraphrases. The experimental results show that our approach can significantly improve the robustness of a strong BERT-based MRC model and achieves improvements over the BERT-based model in terms of held-out accuracy. Specifically, the different prediction ratio (DPR) for question paraphrases of the proposed model decreases by more than 10%.
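The regularization idea — a robust model should assign the same answer distribution to a question and its paraphrase — can be written as an extra loss term, and DPR as a simple argmax-disagreement rate. A minimal sketch, using symmetric KL divergence as one plausible instantiation (the paper does not specify this exact form):

```python
import math

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def consistency_loss(pred_q, pred_para, lam=1.0):
    """Regularizer penalizing divergent predictions on a question and
    its paraphrase; symmetric KL is an assumed instantiation."""
    return lam * 0.5 * (kl(pred_q, pred_para) + kl(pred_para, pred_q))

def different_prediction_ratio(pred_pairs):
    """DPR: fraction of paraphrase pairs whose argmax answers differ."""
    diff = sum(
        1 for p, q in pred_pairs
        if max(range(len(p)), key=p.__getitem__)
        != max(range(len(q)), key=q.__getitem__)
    )
    return diff / len(pred_pairs)
```

The regularizer is zero when the two prediction distributions agree and grows as they diverge, which is exactly the behavior the robustness objective rewards.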
Cited by: 1
Are Scoring Feedback of CAPT Systems Helpful for Pronunciation Correction? – An Exception of Mandarin Nasal Finals
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037720
Rui Cai, Wei Wei, Jinsong Zhang
The scoring feedback of Computer Assisted Pronunciation Training (CAPT) systems facilitates learners' instant awareness of their problems and easily leads to more practice. But whether it is enough to teach learners how to correct their errors is still unknown. To examine in depth the impact of CAPT technology on language learning, and to investigate learners' correction strategies after receiving error warnings, this paper studies long-term learning data of Chinese utterances by a number of CSL (Chinese as a Second Language) learners, with special attention paid to utterances of nasal Finals. The data resulted from a 3-week use of a CAPT app for Chinese learning, called “SAIT汉语”, by 10 learners with different mother tongues. Major findings include: 1) improvements were seen for almost all kinds of phonemes, except nasal Finals; 2) data analyses showed that the learners tried to lengthen the nasal codas after they received error warnings, while native Chinese data shows a significant nasalization period before a short coda. These results suggest that scoring feedback can be beneficial to pronunciation training in most cases, but not in some special ones. For sounds such as Chinese nasal Finals, a more appropriate feedback method is desired.
Cited by: 1
Statistical Analysis of Syllable Duration of Uyghur Language
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037656
A. Hamdulla, Guzalnur Dilmurat, Gulnur Arkin, Mijit Ablimit
Phonetics is both an ancient and a young subject, and syllables are important units of speech. Motivated by the data requirements of speech synthesis and speech recognition, this paper studies Uyghur syllable duration from the perspective of experimental phonetics. Firstly, words of different syllable counts are counted from the large-scale “Speech Acoustic Parameters Database of Uyghur Language”, including monosyllabic, two-syllable, three-syllable and four-syllable words. Secondly, prosodic parameters are extracted and statistically analyzed. Accordingly, the duration distributions of words of different lengths for male and female speakers are studied, and for fixed CV-type syllables the consonant duration, vowel duration, whole-syllable duration and syllable pitch are extracted and analyzed. The effect of different vowels on the duration of CV syllables is further studied, providing a reliable parameter basis for Uyghur speech synthesis and speech recognition.
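The per-category duration statistics described above amount to a grouping pass over annotated records. A sketch, assuming records of (syllable type, duration in ms) pairs; the field layout and units are placeholders, not the database's actual schema:

```python
from collections import defaultdict
from statistics import mean, stdev

def duration_stats(records):
    """Group annotated syllables by type and report (mean, sd) duration.
    `records` is a list of (syllable_type, duration_ms) pairs — a
    simplified stand-in for entries in an acoustic parameter database."""
    groups = defaultdict(list)
    for syl_type, dur in records:
        groups[syl_type].append(dur)
    return {t: (mean(ds), stdev(ds) if len(ds) > 1 else 0.0)
            for t, ds in groups.items()}
```

The same pattern extends directly to grouping by speaker sex or word length, as in the study's male/female and word-length breakdowns.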
Cited by: 1
Converting an Indonesian Constituency Treebank to the Penn Treebank Format
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037723
Jessica Naraiswari Arwidarasti, Ika Alfina, A. Krisnadhi
A constituency treebank is a key component for deep syntactic parsing of natural language sentences. For Indonesian, this task is unfortunately hindered by the fact that the only publicly available constituency treebank is rather small, with just over 1000 sentences; moreover, it employs a format incompatible with readily available constituency treebank processing tools. In this work, we present a conversion of the existing Indonesian constituency treebank to the widely accepted Penn Treebank format. Specifically, the conversion adjusts the bracketing format for compound words as well as the POS tagset to the Penn Treebank format. In addition, we revised the word segmentation and POS tagging of a number of tokens. Finally, we evaluated the treebank quality by training a parser model with the Shift-Reduce parser from Stanford CoreNLP. A 10-fold cross-validated experiment on the parser model yields an F1-score of 70.90%.
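The core of such a conversion — re-parsing the bracketed trees and relabeling POS tags into the Penn tagset — can be sketched as follows. The tag mapping shown is hypothetical; the paper's actual mapping and its compound-word re-bracketing are more involved:

```python
def tokenize(s):
    """Split a bracketed tree string into parens and symbols."""
    return s.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Build a nested-list tree: [label, child, child, ...]."""
    tok = tokens.pop(0)
    if tok == "(":
        node = [tokens.pop(0)]
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # drop closing paren
        return node
    return tok

def convert(tree, tag_map):
    """Relabel node tags via the (illustrative) source-to-Penn mapping."""
    if isinstance(tree, str):
        return tree
    label = tag_map.get(tree[0], tree[0])
    return [label] + [convert(c, tag_map) for c in tree[1:]]

def to_string(tree):
    """Emit the tree back in Penn-style bracketing."""
    if isinstance(tree, str):
        return tree
    return "(" + tree[0] + " " + " ".join(to_string(c) for c in tree[1:]) + ")"
```

Round-tripping parse → convert → to_string leaves the bracketing well-formed, which is what downstream tools such as the Stanford parser require.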
Cited by: 5
Exploring Characteristics of Word Co-occurrence Network in Translated Chinese
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037722
Jianyu Zheng, Kun Ma, Xuemei Tang, Shichen Liang
The translation activity involves both a source language and a target language, and compared to standard texts in the target language, translated texts show unique language characteristics. In order to explore them from the perspective of integrality and complexity, we introduce the complex network method into the study of translated Chinese. Firstly, we selected experimental texts from The ZJU Corpus of Translational Chinese (ZCTC) and its six sub-corpora, such as Press reportage and Popular lore, then removed punctuation and performed word segmentation. Secondly, we constructed a word co-occurrence network of translated Chinese. After analyzing and counting parameters such as the shortest path lengths, degree distributions and clustering coefficients of these networks, we verify that the word co-occurrence network of translated Chinese has the small-world effect and the scale-free property. Finally, by constructing co-occurrence networks of standard Chinese and calculating their network parameters, we compare and verify the differences between translated Chinese and standard Chinese: “simplification” and heavier use of common words. Our work expands the application of complex networks in translation studies, and provides a feasible approach for studying translated Chinese based on complex networks.
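Building the co-occurrence network and measuring one of the quoted parameters (the local clustering coefficient) can be sketched like this; the window size and toy sentences are illustrative, not the study's settings:

```python
from collections import defaultdict
from itertools import combinations

def build_cooccurrence(sentences, window=2):
    """Adjacency sets: link words co-occurring within `window` positions."""
    adj = defaultdict(set)
    for sent in sentences:
        for i, w in enumerate(sent):
            for j in range(i + 1, min(i + window, len(sent))):
                if w != sent[j]:
                    adj[w].add(sent[j])
                    adj[sent[j]].add(w)
    return adj

def clustering_coefficient(adj, node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))
```

A high average clustering coefficient together with a short average path length is the signature of the small-world effect the study reports.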
Cited by: 0
Syntax-aware Transformer Encoder for Neural Machine Translation
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037672
Sufeng Duan, Hai Zhao, Junru Zhou, Rui Wang
Syntax has been shown to be a helpful clue in various natural language processing tasks, including earlier statistical machine translation and recurrent neural network based machine translation. However, since state-of-the-art neural machine translation (NMT) is built on the Transformer encoder, few attempts at such syntax enhancement exist. In this paper, we therefore explore effective ways to introduce syntax into the Transformer for better machine translation. We empirically compare two ways, positional encoding and input embedding, to exploit syntactic clues from the dependency tree of the source sentence. Our proposed methods have the merit of keeping the Transformer architecture unchanged, so the efficiency of the Transformer is preserved. Experimental results on IWSLT'14 German-to-English and WMT14 English-to-German show that our method yields gains over strong Transformer baselines.
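One of the two strategies — injecting syntax through the positional encoding — can be sketched by evaluating the standard sinusoidal encoding at a tree-derived position, such as each token's depth in the dependency tree, instead of its linear index. This is an illustrative reduction; the paper's exact formulation may differ:

```python
import math

def tree_depths(heads):
    """Depth of each token in the dependency tree; heads[i] is the
    index of token i's head, with -1 marking the root."""
    def depth(i):
        return 0 if heads[i] == -1 else 1 + depth(heads[i])
    return [depth(i) for i in range(len(heads))]

def positional_encoding(pos, d_model=8):
    """Standard Transformer sinusoidal encoding, here evaluated at a
    tree-derived position rather than the linear token index."""
    return [math.sin(pos / 10000 ** (i / d_model)) if i % 2 == 0
            else math.cos(pos / 10000 ** ((i - 1) / d_model))
            for i in range(d_model)]
```

Because only the position values change, the encoder layers themselves are untouched — which is the architecture-preserving property the abstract emphasizes.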
Cited by: 14
Using WHY-type Question-Answer Pairs to Improve Implicit Causal Relation Recognition
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037693
Huibin Ruan, Yu Hong, Yu Sun, Yang Xu, Min Zhang
Implicit causal relation recognition aims to identify the causal relation between a pair of arguments. It is a challenging task due to the lack of conjunctions and the shortage of labeled data. In order to improve identification performance, we propose an approach to expand the training dataset. On the basis of the hypothesis that causal relations inherently exist in WHY-type Question-Answer (QA) pairs, we utilize WHY-type QA pairs for training set expansion. In practice, we first collect WHY-type QA pairs from the Knowledge Bases (KBs) of reading comprehension tasks, and then convert them into narrative argument pairs by Question-Statement Conversion (QSC). To alleviate redundancy, we use active learning (AL) to select informative samples from the synthetic argument pairs. The sampled synthetic argument pairs are added to the Penn Discourse Treebank (PDTB), and the expanded PDTB is used to retrain the neural network-based classifiers. Experiments show that our method yields a performance gain of 2.42% F1-score when AL is used, and 1.61% without it.
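The Question-Statement Conversion (QSC) step can be approximated with surface rewriting rules. A minimal sketch that turns a WHY-question and its answer into a (result, cause) argument pair; the patterns are illustrative and far simpler than the paper's actual procedure:

```python
import re

def qa_to_argument_pair(question, answer):
    """Convert a WHY-type QA pair into a (result, cause) argument pair.
    Only a single auxiliary-fronted question shape is handled here."""
    m = re.match(
        r"(?i)why\s+(?:did|do|does|is|are|was|were)\s+(.*?)\??$",
        question.strip(),
    )
    if not m:
        return None  # not a WHY-type question we can rewrite
    result = m.group(1).rstrip("?")
    cause = re.sub(r"(?i)^because\s+", "", answer.strip())
    return (result, cause)
```

Each returned pair is a synthetic implicit-causal training instance: the result argument on one side, the cause argument on the other, with the causal connective removed.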
Cited by: 3
Development of a Filipino Speaker Diarization in Meeting Room Conversations
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037733
Angelica H. De La Cruz, Rodolfo C. Raga
Speaker diarization is the process of determining speaker identity at a given time in an audio stream. It was first used for speech recognition and over time became useful in other applications such as video captioning and speech transcription. Recently, deep learning techniques have been applied to speaker diarization with considerable success; however, deep learning is conventionally data intensive, and large training samples can be difficult and expensive to collect, especially for resource-scarce languages. This study investigates a speaker diarization approach for meeting room conversations in the Filipino language. To compensate for the lack of resources, a one-shot learning strategy was explored using a Siamese neural network. Among the experiments conducted, the lowest diarization error rate obtained was 46%. There are, however, more parameters that can be tuned to improve the diarization results. To the best of our knowledge, no work on speaker diarization dedicated to the Filipino language has yet been done.
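A Siamese setup for diarization runs two audio segments through a shared encoder and thresholds the similarity of the resulting embeddings. A sketch of the comparison half only, with cosine similarity standing in for the learned distance and a placeholder threshold (the encoder, embeddings, and threshold value are all assumptions):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def same_speaker(emb_a, emb_b, threshold=0.8):
    """One-shot decision: merge two segments into one speaker cluster
    when their embedding similarity clears the threshold."""
    return cosine(emb_a, emb_b) >= threshold
```

One-shot learning enters through the shared encoder: a single enrollment segment per speaker suffices, since decisions are pairwise comparisons rather than per-speaker classifiers.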
Citations: 0
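The one-shot strategy described in the abstract rests on a Siamese idea: a single shared encoder maps utterances to embeddings, and speakers are compared by embedding distance, so one enrolled example per speaker suffices. The sketch below illustrates that idea only; the trained neural encoder is replaced by a fixed random projection, and the feature dimensions and "speakers" are invented placeholders, not the paper's actual model or data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared "encoder": both inputs of a pair pass through the SAME weights,
# which is the defining property of a Siamese network. A real system would
# use a trained neural encoder over acoustic features; a fixed random
# projection stands in for it here.
W = rng.normal(size=(40, 8))  # 40-dim utterance features -> 8-dim embedding

def embed(x):
    h = x @ W
    return h / np.linalg.norm(h)  # L2-normalise the embedding

def distance(a, b):
    """Dissimilarity score between two utterances under the shared encoder."""
    return np.linalg.norm(embed(a) - embed(b))

# One-shot enrolment: a single reference utterance per meeting participant
# (synthetic feature vectors, not real speech).
reference = {
    "A": rng.normal(loc=0.0, size=40),
    "B": rng.normal(loc=3.0, size=40),
}

def diarize(segment):
    """Label an audio segment with the closest enrolled speaker."""
    return min(reference, key=lambda s: distance(reference[s], segment))

# Perturbed copies of each reference play the role of later segments.
print(diarize(reference["A"] + 0.1 * rng.normal(size=40)))
print(diarize(reference["B"] + 0.1 * rng.normal(size=40)))
```

Because enrolment needs only one utterance per speaker, this setup sidesteps the large labelled corpora that conventional deep diarization models require, which is the motivation the abstract gives for resource-scarce languages such as Filipino.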
Research on Tibetan Text Classification Method Based on Neural Network
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037706
Zhensong Li, Jie Zhu, Zhixiang Luo, Saihu Liu
Text categorization is an important task in natural language processing, with a wide range of real-world applications. In this paper, two N-Gram feature models (MLP, FastText) and two sequential models (sepCNN, Bi-LSTM) are used to study automatic classification of Tibetan text based on syllables and vocabulary. Experiments on Tibetan-language data collected by China Tibet News Network show a classification accuracy of about 85%.
{"title":"Research on Tibetan Text Classification Method Based on Neural Network","authors":"Zhensong Li, Jie Zhu, Zhixiang Luo, Saihu Liu","doi":"10.1109/IALP48816.2019.9037706","DOIUrl":"https://doi.org/10.1109/IALP48816.2019.9037706","url":null,"abstract":"Text categorization is an important task in natural language processing, and it has a wide range of applications in real life. In this paper, two N-Gram feature models (MLP, FastText) and two sequential models (sepCNN, Bi-LSTM) are used to study the automatic classification for Tibetan text based on syllables and vocabulary. The experiment on Tibetan language data collected by China Tibet News Network shows that the classification accuracy is about 85%.","PeriodicalId":208066,"journal":{"name":"2019 International Conference on Asian Language Processing (IALP)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132634783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
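The FastText-style model mentioned in the abstract works on bag-of-n-gram features: Tibetan text is split into syllables at the tsek mark ("་"), and syllable unigrams plus n-grams serve as input features. A minimal sketch of that feature extraction follows, with invented placeholder "sentences" and labels rather than real news data, and a nearest-centroid decision standing in for the trained linear classifier:

```python
from collections import Counter

def syllable_ngrams(text, n=2):
    """Split Tibetan text on the tsek mark ('་') and count syllable n-grams.

    FastText-style models consume exactly this kind of bag-of-n-grams input.
    """
    syllables = [s for s in text.split("་") if s]
    bigrams = ["་".join(syllables[i:i + n]) for i in range(len(syllables) - n + 1)]
    return Counter(syllables + bigrams)

def cosine(c1, c2):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c1[k] * c2[k] for k in c1 if k in c2)
    n1 = sum(v * v for v in c1.values()) ** 0.5
    n2 = sum(v * v for v in c2.values()) ** 0.5
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Toy "training" corpus: one centroid per class, built by summing the
# n-gram counts of its labelled examples (all strings here are invented).
train = {
    "politics": ["ཀ་ཁ་ག་ང", "ཀ་ཁ་ཅ"],
    "sports":   ["ཉ་ཏ་ཐ་ད", "ཏ་ཐ་ན"],
}
centroids = {label: sum((syllable_ngrams(t) for t in texts), Counter())
             for label, texts in train.items()}

def classify(text):
    """Assign the class whose centroid is most similar to the text's features."""
    return max(centroids, key=lambda lbl: cosine(centroids[lbl], syllable_ngrams(text)))

print(classify("ཀ་ཁ་ག"))  # shares syllables and bigrams with the "politics" examples
```

Syllable-level n-grams are a natural unit here because Tibetan marks syllable boundaries explicitly with the tsek, whereas word boundaries require segmentation; the abstract's syllable-based variant exploits exactly this property.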