
2019 International Conference on Asian Language Processing (IALP): latest publications

A Systematic Investigation of Neural Models for Chinese Implicit Discourse Relationship Recognition
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037686
Dejian Li, Man Lan, Yuanbin Wu
Chinese implicit discourse relation recognition is more challenging than its English counterpart because discourse connectives are largely absent while implicit relations occur frequently in the text. So far, there has been no systematic investigation of neural components for Chinese implicit discourse relations. To fill this gap, we present a component-based neural framework to study Chinese implicit discourse relations systematically. Experimental results show that the proposed neural Chinese implicit discourse parser achieves state-of-the-art (SOTA) performance on the CoNLL-2016 corpus.
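The abstract does not describe the individual components, so the following is only a minimal sketch of one typical component-based setup for implicit discourse relation recognition: a shared BiLSTM encoder for the two discourse arguments feeding a softmax classifier over the argument-pair representation. The layer sizes, the mean pooling, and the number of relation labels are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class ImplicitRelationClassifier(nn.Module):
    """Illustrative component-based model: encode the two discourse
    arguments separately, combine them, and classify the implicit relation."""

    def __init__(self, vocab_size, emb_dim=100, hidden=128, num_relations=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Encoder component (weights shared between the two arguments for brevity).
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True,
                               bidirectional=True)
        self.classifier = nn.Linear(4 * hidden, num_relations)

    def forward(self, arg1_ids, arg2_ids):
        # Encode each argument and mean-pool its hidden states over time.
        h1, _ = self.encoder(self.embed(arg1_ids))
        h2, _ = self.encoder(self.embed(arg2_ids))
        pair = torch.cat([h1.mean(dim=1), h2.mean(dim=1)], dim=-1)
        return self.classifier(pair)  # logits over relation senses

# Toy usage: a batch of 2 argument pairs, each 5 tokens long.
model = ImplicitRelationClassifier(vocab_size=1000)
logits = model(torch.randint(0, 1000, (2, 5)), torch.randint(0, 1000, (2, 5)))
```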
Citations: 0
Automatic answer ranking based on sememe vector in KBQA
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037712
Yadi Li, Lingling Mu, Hao Li, Hongying Zan
This paper proposes an answer ranking method for Knowledge Base Question Answering (KBQA) systems. The method first extracts features including sememe-vector-based predicate sequence similarity, predicate edit distance, predicate word co-occurrence, and classification features. These features are then fed to the learning-to-rank algorithm Ranking SVM to rank the candidate answers. Experimental results on the data set of the KBQA evaluation task at the 2016 Natural Language Processing & Chinese Computing conference (NLPCC 2016) show that word similarity calculated from sememe vectors outperforms the word2vec-based method, with accuracy, recall, and average F1 of 73.88%, 82.29%, and 75.88%, respectively. These results indicate that knowledge-enriched word representations have an important effect on natural language processing.
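As a rough illustration of the ranking step, the sketch below reduces Ranking SVM to a pairwise linear SVM over feature-difference vectors, with one row of hand-made features per candidate answer (sememe-vector similarity, edit distance, co-occurrence, classification score). The feature values, the sememe_similarity helper, and the use of scikit-learn's LinearSVC are assumptions for illustration; the paper's exact feature extraction and SVM implementation are not given in the abstract.

```python
import numpy as np
from sklearn.svm import LinearSVC

def sememe_similarity(q_vec, pred_vec):
    """Cosine similarity between averaged sememe vectors (placeholder inputs)."""
    return float(np.dot(q_vec, pred_vec) /
                 (np.linalg.norm(q_vec) * np.linalg.norm(pred_vec) + 1e-8))

def to_pairwise(features, labels):
    """Ranking SVM reduction: build pairwise difference vectors so that a
    linear classifier learns to score correct answers above incorrect ones."""
    X, y = [], []
    for i in range(len(labels)):
        for j in range(len(labels)):
            if labels[i] > labels[j]:
                X.append(features[i] - features[j]); y.append(1)
                X.append(features[j] - features[i]); y.append(-1)
    return np.array(X), np.array(y)

# One row per candidate answer; the first column would come from something
# like sememe_similarity(question_sememe_vec, predicate_sememe_vec).
features = np.array([[0.82, 2, 5, 0.9],
                     [0.41, 6, 1, 0.2],
                     [0.60, 3, 2, 0.5]])
labels = np.array([1, 0, 0])               # 1 = correct answer

X, y = to_pairwise(features, labels)
ranker = LinearSVC().fit(X, y)
scores = features @ ranker.coef_.ravel()   # higher score = better candidate
print(scores.argsort()[::-1])              # candidate indices, best first
```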
Citations: 0
Japanese grammatical simplification with simplified corpus
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037675
Yumeto Inaoka, Kazuhide Yamamoto
We construct a Japanese grammatical simplification corpus and establish automatic simplification methods. We compare the conventional machine translation approach, our proposed method, and a hybrid method through automatic and manual evaluation. The automatic evaluation shows that the proposed method scores lower than the machine translation approach, while the hybrid method achieves the highest score. These results suggest that the machine translation approach and the proposed method simplify different sentences, and that the hybrid method is effective for grammatical simplification.
Citations: 0
A Chinese word segment model for energy literature based on Neural Networks with Electricity User Dictionary
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037728
Bochuan Song, Bo Chai, Qiang Zhang, Quanye Jia
Traditional Chinese word segmentation (CWS) methods are based on supervised machine learning such as Conditional Random Fields (CRFs) and Maximum Entropy (ME), whose features are mostly hand-crafted and derived from local contexts. Currently, most state-of-the-art CWS methods are based on neural networks, but these networks rarely incorporate a user dictionary. We propose an LSTM-based Chinese word segmentation model that can take advantage of a user dictionary. Experiments show that our model performs better than a popular segmentation tool in the electricity domain, and that it achieves better performance when transferred to a new domain by using the user dictionary.
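The abstract does not say how the user dictionary enters the network; a common option, sketched below under that assumption, is to concatenate a per-character dictionary-match flag to the character embedding before a bidirectional LSTM tagger over B/M/E/S labels. All layer sizes, the tag set, and the flag scheme are illustrative, not the paper's design.

```python
import torch
import torch.nn as nn

class DictAwareSegmenter(nn.Module):
    """Character-level B/M/E/S tagger whose input concatenates character
    embeddings with a binary user-dictionary match feature."""

    def __init__(self, num_chars, emb_dim=64, hidden=128, num_tags=4):
        super().__init__()
        self.embed = nn.Embedding(num_chars, emb_dim)
        self.lstm = nn.LSTM(emb_dim + 1, hidden, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_tags)  # B, M, E, S

    def forward(self, char_ids, dict_flags):
        # dict_flags: (batch, seq_len), 1.0 if the character falls inside a
        # user-dictionary entry, else 0.0 (the lookup is done outside the model).
        x = torch.cat([self.embed(char_ids), dict_flags.unsqueeze(-1)], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)  # per-character tag logits

# Toy usage: one 8-character sentence with no dictionary matches.
seg = DictAwareSegmenter(num_chars=5000)
tag_logits = seg(torch.randint(0, 5000, (1, 8)), torch.zeros(1, 8))
```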
Citations: 1
Classified Description and Application of Chinese Constitutive Role
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037730
Mengxiang Wang, Cuiyan Ma
The constitutive role is one of the four qualia roles and expresses a constitutive relationship between nouns. Based on its original definition and descriptive characteristics, this paper divides constitutive roles into two categories: materials and components. Building on previous methods for automatic role extraction, the paper also optimizes the automatic extraction procedure: relying on auxiliary grammatical constructions, we extract noun-noun pairs from a large-scale corpus to obtain descriptive features of constitutive roles, and then classify this descriptive knowledge through manual double-blind proofreading. Finally, the paper discusses the application of Chinese constitutive roles in word-formation analysis, syntactic analysis, and synonym discrimination.
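The auxiliary constructions themselves are not listed in the abstract. The sketch below only shows the general shape of such pattern-based extraction with two hypothetical constructions; in practice the patterns would run over segmented and POS-tagged text so that the captured groups align with noun boundaries, and the real pattern set and category mapping would come from the paper.

```python
import re

# Hypothetical auxiliary constructions that can signal a constitutive
# (material/component) relation between two nouns; the actual pattern set
# used by the paper is not given in the abstract.
PATTERNS = [
    (re.compile(r"(\w+)制成的(\w+)"), "material"),    # "Y made from X"
    (re.compile(r"(\w+)由(\w+)组成"), "component"),   # "X is composed of Y"
]

def extract_constitutive_pairs(sentence):
    """Return (noun, noun, category) triples matched by the patterns."""
    pairs = []
    for pattern, category in PATTERNS:
        for match in pattern.finditer(sentence):
            pairs.append((match.group(1), match.group(2), category))
    return pairs

print(extract_constitutive_pairs("这张桌子由木头组成。"))
```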
Citations: 0
Construction of Quantitative Index System of Vocabulary Difficulty in Chinese Grade Reading
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037664
Huiping Wang, Lijiao Yang, Huimin Xiao
Graded Chinese reading for children has broad application prospects. In this paper, Chinese textbooks for grades 1 to 6 of primary school published by People’s Education Press are taken as the data set, and the texts are divided into 12 successive difficulty levels. Effective lexical indexes for measuring text readability are discussed, and a regression model for measuring the lexical difficulty of Chinese texts is established. The study first collected 30 indexes at the lexical level along three dimensions: lexical richness, semantic transparency, and contextual dependence. It then selected the 7 indexes most strongly related to text difficulty using the Pearson correlation coefficient, and finally built regression models to predict text difficulty with Lasso regression, ElasticNet, Ridge regression, and other algorithms. The regression results show that the model fits well: the predicted values explain 89.3% of the total variation in text difficulty, which demonstrates that the quantitative index system of vocabulary difficulty constructed in this paper is effective and can be applied to graded Chinese reading and automatic grading of Chinese text difficulty.
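A minimal sketch of the described pipeline, Pearson-based selection of 7 indexes followed by regularized linear regression, is given below on synthetic data; the index values, sample size, and regularization strengths are placeholders rather than the paper's settings.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Lasso, Ridge, ElasticNet
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))    # 30 lexical indexes per text (synthetic)
y = X[:, :5] @ rng.normal(size=5) + rng.normal(scale=0.3, size=200)  # difficulty

# Keep the 7 indexes most correlated with difficulty (absolute Pearson r).
r = np.array([abs(pearsonr(X[:, j], y)[0]) for j in range(X.shape[1])])
X_sel = X[:, np.argsort(r)[-7:]]

# Compare the regularized regressions named in the abstract; R^2 corresponds
# to the share of difficulty variance explained by the predictions.
for model in (Lasso(alpha=0.01), Ridge(alpha=1.0), ElasticNet(alpha=0.01)):
    score = cross_val_score(model, X_sel, y, cv=5, scoring="r2").mean()
    print(type(model).__name__, round(score, 3))
```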
Citations: 2
Prosodic Realization of Focus in Changchun Mandarin and Nanjing Mandarin
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037655
Ying Chen, Jiajing Zhang, Bingying Ye, Chenfang Zhou
This study explores the prosodic patterns of focus in two Mandarin dialects: Changchun Mandarin and Nanjing Mandarin. The paper compares the acoustics of their prosodic realization of focus in a production experiment. Similar to standard Mandarin, which codes focus with in-focus expansion and concomitant post-focus compression (PFC), the results indicate that both Changchun and Nanjing speakers produced significant in-focus expansion of pitch, intensity, and duration, as well as PFC of pitch and intensity, in their dialects. Meanwhile, the results show no significant difference in prosodic changes between Changchun and Nanjing Mandarin productions. These results reveal that PFC exists not only in standard Mandarin but also in Mandarin dialects.
Citations: 1
A New Method of Tonal Determination for Chinese Dialects
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037711
Yan Li, Zhiyi Wu
The values of the basic tones are the key to research on Chinese dialects. The traditional method of determining tones by ear and the more popular method used in experimental phonetics are either somewhat inaccurate or difficult to learn. The method provided and discussed in this paper is simple and reliable, requiring only Praat and fundamental frequency values. Examples are given to demonstrate the method’s effectiveness.
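The abstract only states that the method needs Praat and fundamental frequency values. The sketch below applies the five-degree T-value normalization commonly used in Chinese tonal studies, T = 5 * (lg f - lg f_min) / (lg f_max - lg f_min), to an F0 track exported from Praat; whether this is the paper's exact formula is an assumption, and the F0 numbers are made up.

```python
import math

def t_values(f0_points, f0_min, f0_max):
    """Map fundamental-frequency samples (Hz) onto the five-degree tone
    scale used in Chinese dialectology:
        T = 5 * (lg f - lg f_min) / (lg f_max - lg f_min)
    where f0_min and f0_max are the speaker's pitch-range extremes."""
    span = math.log10(f0_max) - math.log10(f0_min)
    return [round(5 * (math.log10(f) - math.log10(f0_min)) / span, 2)
            for f in f0_points]

# F0 track of one syllable measured in Praat (illustrative values only).
print(t_values([220, 200, 180, 150, 120], f0_min=100, f0_max=250))
```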
Citations: 0
Exploring Context’s Diversity to Improve Neural Language Model
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037662
Yanchun Zhang, Xingyuan Chen, Peng Jin, Yajun Du
Neural language models (NLMs), such as long short-term memory networks (LSTMs), have achieved great success over the years. However, NLMs usually only minimize a loss between the prediction results and the target words. In fact, context has natural diversity: few words occur more than once within a word sequence of a certain length. We refer to this natural diversity as context diversity. In our model, context diversity means that, for a fixed input sequence, the target words predicted by any two contexts are different with high probability; that is, the softmax outputs of any two contexts should be diverse. Based on this observation, we propose a new cross-entropy loss that measures the cross-entropy between the softmax outputs of any two different given contexts. By adding this loss, our approach explicitly considers context diversity and improves the model’s prediction sensitivity for every context. Experiments on two typical LSTM models, one regularized by dropout and one without, show its effectiveness on the benchmark dataset.
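As a sketch of the extra loss term, the code below computes the cross-entropy between the softmax outputs of two different contexts; because the stated goal is to make such outputs diverse, the term is given a negative weight in the total objective here. The sign convention, the 0.1 weight, and how context pairs are chosen within a batch are assumptions not specified in the abstract.

```python
import torch
import torch.nn.functional as F

def context_diversity_loss(logits_a, logits_b):
    """Cross-entropy between the softmax outputs of two different contexts.
    Larger values mean the two predicted distributions disagree more, so the
    term is negatively weighted in the training objective to push distinct
    contexts toward distinct predictions."""
    p_a = F.softmax(logits_a, dim=-1)
    log_p_b = F.log_softmax(logits_b, dim=-1)
    return -(p_a * log_p_b).sum(dim=-1).mean()

# Toy usage: logits of two contexts over a 10-word vocabulary.
logits_a, logits_b = torch.randn(4, 10), torch.randn(4, 10)
targets = torch.randint(0, 10, (4,))
lm_loss = F.cross_entropy(logits_a, targets)
total_loss = lm_loss - 0.1 * context_diversity_loss(logits_a, logits_b)
```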
Citations: 0
Exploring Letter’s Differences between Partial Indonesian Branch Language and English
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037715
Nankai Lin, Sihui Fu, Jiawen Huang, Sheng-yi Jiang
Differences in letter usage are the most basic differences between languages and can reflect their most essential diversity. Many linguists study letter differences between widely studied languages, but seldom between less common ones. This paper selects three representative languages from the Indonesian branch of the Austronesian language family, namely Malay, Indonesian, and Filipino. To study the letter differences between these three languages and English, we examine word length distribution, letter frequency distribution, commonly used letter pairs, commonly used letter trigrams, and ranked letter frequency distribution. The results show that considerable differences exist between the three Indonesian-branch languages and English, and that the differences between Malay and Indonesian are the smallest.
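The statistics named in the abstract are straightforward to compute; the sketch below derives word-length, letter-frequency, bigram, and trigram distributions from a raw text sample. The tiny Indonesian sample and the alphabetic-word filtering are illustrative choices, not the paper's corpus or preprocessing.

```python
from collections import Counter

def letter_stats(text):
    """Word-length distribution, letter frequencies, and within-word letter
    bigrams / trigrams for one lowercased language sample."""
    words = [w for w in text.lower().split() if w.isalpha()]
    letters = "".join(words)
    return {
        "word_length": Counter(len(w) for w in words),
        "letter_freq": Counter(letters),
        "bigrams": Counter(w[i:i + 2] for w in words for i in range(len(w) - 1)),
        "trigrams": Counter(w[i:i + 3] for w in words for i in range(len(w) - 2)),
    }

sample = "saya suka membaca buku di perpustakaan"   # tiny Indonesian sample
stats = letter_stats(sample)
print(stats["letter_freq"].most_common(5))
print(stats["bigrams"].most_common(3))
```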
Citations: 1