Pub Date: 2019-11-01, DOI: 10.1109/IALP48816.2019.9037686
Dejian Li, Man Lan, Yuanbin Wu
Chinese implicit discourse relation recognition is more challenging than its English counterpart due to the lack of discourse connectives and the high frequency of implicit relations in text. So far, there has been no systematic investigation into neural components for Chinese implicit discourse relation recognition. To fill this gap, in this work we present a component-based neural framework to systematically study Chinese implicit discourse relations. Experimental results show that our proposed neural Chinese implicit discourse parser achieves state-of-the-art (SOTA) performance on the CoNLL-2016 corpus.
{"title":"A Systematic Investigation of Neural Models for Chinese Implicit Discourse Relationship Recognition","authors":"Dejian Li, Man Lan, Yuanbin Wu","doi":"10.1109/IALP48816.2019.9037686","DOIUrl":"https://doi.org/10.1109/IALP48816.2019.9037686","url":null,"abstract":"The Chinese implicit discourse relationship recognition is more challenging than English due to the lack of discourse connectives and high frequency in the text. So far, there is no systematical investigation into the neural components for Chinese implicit discourse relationship. To fill this gap, in this work we present a component-based neural framework to systematically study the Chinese implicit discourse relationship. Experimental results showed that our proposed neural Chinese implicit discourse parser achieves the SOTA performance in CoNLL-2016 corpus.","PeriodicalId":208066,"journal":{"name":"2019 International Conference on Asian Language Processing (IALP)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127896435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-11-01, DOI: 10.1109/IALP48816.2019.9037712
Yadi Li, Lingling Mu, Hao Li, Hongying Zan
This paper proposes an answer ranking method for Knowledge Base Question Answering (KBQA) systems. The method first extracts features based on sememe-vector predicate-sequence similarity, predicates’ edit distances, predicates’ word co-occurrences, and classification. These features are then used as inputs to the learning-to-rank algorithm Ranking SVM to rank the candidate answers. Experimental results on the dataset of the KBQA evaluation task at the 2016 conference on Natural Language Processing & Chinese Computing (NLPCC 2016) show that word similarity calculation based on sememe vectors yields better results than the word2vec-based method: its accuracy, recall, and average F1 are 73.88%, 82.29%, and 75.88%, respectively. These results show that knowledge-enriched word representations have an important effect on natural language processing.
{"title":"Automatic answer ranking based on sememe vector in KBQA","authors":"Yadi Li, Lingling Mu, Hao Li, Hongying Zan","doi":"10.1109/IALP48816.2019.9037712","DOIUrl":"https://doi.org/10.1109/IALP48816.2019.9037712","url":null,"abstract":"This paper proposes an answer ranking method used in Knowledge Base Question Answering (KBQA) system. This method first extracts the features of predicate sequence similarity based on sememe vector, predicates’ edit distances, predicates’ word co-occurrences and classification. Then the above features are used as inputs of the ranking learning algorithm Ranking SVM to rank the candidate answers. In this paper, the experimental results on the data set of KBQA system evaluation task in the 2016 Natural Language Processing & Chinese Computing (NLPCC 2016) show that, the method of word similarity calculation based on sememe vector has better results than the method based on word2vec. Its accuracy, recall rate and average F1 value respectively are 73.88%, 82.29% and 75.88%. The above results show that the word representation with knowledge has import effect on natural language processing.","PeriodicalId":208066,"journal":{"name":"2019 International Conference on Asian Language Processing (IALP)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123498382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
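The Ranking SVM step described above can be sketched as pairwise learning-to-rank: candidate answers become pairwise feature differences, and a linear ranker is fit so that better answers score higher. This is a minimal illustration with invented toy features (the paper's real features come from sememe vectors, edit distances, and co-occurrences), not the authors' implementation:

```python
import numpy as np

def pairwise_differences(features, relevance):
    """Turn per-candidate feature rows into pairwise difference
    vectors: one (x_better - x_worse) row per ordered pair."""
    diffs = []
    for i in range(len(features)):
        for j in range(len(features)):
            if relevance[i] > relevance[j]:
                diffs.append(features[i] - features[j])
    return np.array(diffs)

def train_rank_svm(diffs, epochs=200, lr=0.1, c=0.1):
    """Linear Ranking SVM via sub-gradient descent on hinge loss:
    we want w . (x_better - x_worse) >= 1 for every pair."""
    w = np.zeros(diffs.shape[1])
    for _ in range(epochs):
        margins = diffs @ w
        violated = diffs[margins < 1.0]
        grad = c * w - (violated.sum(axis=0) if len(violated) else 0)
        w -= lr * grad / len(diffs)
    return w

# Toy data: 4 candidate answers, 3 hypothetical similarity features
# each, and gold relevance labels (2 = correct, 0 = wrong).
X = np.array([[0.9, 0.8, 0.7],
              [0.2, 0.1, 0.3],
              [0.6, 0.5, 0.4],
              [0.1, 0.2, 0.1]])
rel = np.array([2, 0, 1, 0])

w = train_rank_svm(pairwise_differences(X, rel))
ranking = np.argsort(-(X @ w))  # best candidate first
print(list(ranking))
```

The learned ranking recovers the gold order on this toy set; a production system would train on the NLPCC pairs and add regularization tuning.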
Pub Date: 2019-11-01, DOI: 10.1109/IALP48816.2019.9037675
Yumeto Inaoka, Kazuhide Yamamoto
We construct a Japanese grammatical simplification corpus and establish automatic simplification methods. We compare the conventional machine translation approach, our proposed method, and a hybrid method through automatic and manual evaluation. The automatic evaluation shows that the proposed method scores lower than the machine translation approach; however, the hybrid method garners the highest score. These results indicate that the machine translation approach and the proposed method simplify different sentences, and that the hybrid method is effective for grammatical simplification.
{"title":"Japanese grammatical simplification with simplified corpus","authors":"Yumeto Inaoka, Kazuhide Yamamoto","doi":"10.1109/IALP48816.2019.9037675","DOIUrl":"https://doi.org/10.1109/IALP48816.2019.9037675","url":null,"abstract":"We construct a Japanese grammatical simplification corpus and established automatic simplification methods. We compare the conventional machine translation approach, our proposed method, and a hybrid method by automatic and manual evaluation. The results of the automatic evaluation show that the proposed method exhibits a lower score than the machine translation approach; however, the hybrid method garners the highest score. According to those results, the machine translation approach and proposed method present different sentences that can be simplified, while the hybrid version is effective in grammatical simplification.","PeriodicalId":208066,"journal":{"name":"2019 International Conference on Asian Language Processing (IALP)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127064645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-11-01, DOI: 10.1109/IALP48816.2019.9037728
Bochuan Song, Bo Chai, Qiang Zhang, Quanye Jia
Traditional Chinese word segmentation (CWS) methods are based on supervised machine learning, such as Conditional Random Fields (CRFs) and Maximum Entropy (ME), whose features are mostly manual and often derived from local contexts. Currently, most state-of-the-art methods for Chinese word segmentation are based on neural networks. However, these neural models rarely incorporate a user dictionary. We propose an LSTM-based Chinese word segmentation model that can take advantage of a user dictionary. Experiments show that our model outperforms a popular segmentation tool in the electricity domain. Notably, it achieves better performance when transferred to a new domain using the user dictionary.
{"title":"A Chinese word segment model for energy literature based on Neural Networks with Electricity User Dictionary","authors":"Bochuan Song, Bo Chai, Qiang Zhang, Quanye Jia","doi":"10.1109/IALP48816.2019.9037728","DOIUrl":"https://doi.org/10.1109/IALP48816.2019.9037728","url":null,"abstract":"Traditional Chinese word segmentation (CWS) methods are based on supervised machine learning such as Condtional Random Fields(CRFs), Maximum Entropy(ME), whose features are mostly manual features. These manual features are often derived from local contexts. Currently, most state-of-art methods for Chinese word segmentation are based on neural networks. However these neural networks rarely introduct the user dictionary. We propose a LSTMbased Chinese word segmentation which can take advantage of the user dictionary. The experiments show that our model performs better than a popular segment tool in electricity domain. It is noticed that it achieves a better performance when transfered to a new domain using the user dictionary.","PeriodicalId":208066,"journal":{"name":"2019 International Conference on Asian Language Processing (IALP)","volume":"105 23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127456072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
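One common way to feed a user dictionary into a neural segmenter is to attach per-character matching features. The sketch below uses greedy forward maximum matching to produce B/M/E/S tags against a hypothetical electricity-domain dictionary; it illustrates the general idea, not necessarily the authors' exact feature design:

```python
def dict_features(sentence, user_dict):
    """For each character, emit a B/M/E/S/O tag from greedy forward
    maximum matching against the user dictionary. Such tags are
    commonly concatenated to character embeddings as extra input."""
    max_len = max(map(len, user_dict))
    tags = ["O"] * len(sentence)
    i = 0
    while i < len(sentence):
        # Try the longest dictionary match starting at position i.
        for l in range(min(max_len, len(sentence) - i), 0, -1):
            if sentence[i:i + l] in user_dict:
                if l == 1:
                    tags[i] = "S"
                else:
                    tags[i] = "B"
                    for k in range(i + 1, i + l - 1):
                        tags[k] = "M"
                    tags[i + l - 1] = "E"
                i += l
                break
        else:
            i += 1  # no match covers this character
    return tags

# Hypothetical electricity-domain dictionary and sentence.
vocab = {"变压器", "输电线路", "故障"}
sent = "变压器发生故障"
print(dict_features(sent, vocab))
```

Characters inside dictionary words get informative tags while out-of-dictionary characters stay "O", which is exactly the kind of signal that helps when transferring to a new domain.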
Pub Date: 2019-11-01, DOI: 10.1109/IALP48816.2019.9037730
Mengxiang Wang, Cuiyan Ma
The constitutive role is one of the four qualia roles and expresses a kind of constitutive relationship between nouns. According to its original definition and descriptive characteristics, this paper divides constitutive roles into two categories: materials and components. Building on previous automatic extraction methods, this paper also optimizes the automatic extraction of constitutive roles: relying on auxiliary grammatical constructions, we extract noun-noun pairs from a large-scale corpus to obtain descriptive features of constitutive roles, and then classify this descriptive knowledge through manual double-blind proofreading. Finally, we discuss the application of Chinese constitutive roles in word-formation analysis, syntactic analysis, and synonym discrimination.
{"title":"Classified Description and Application of Chinese Constitutive Role","authors":"Mengxiang Wang, Cuiyan Ma","doi":"10.1109/IALP48816.2019.9037730","DOIUrl":"https://doi.org/10.1109/IALP48816.2019.9037730","url":null,"abstract":"Constitutive role is one of the 4 qualia roles, which expresses a kind of constitutive relationship between nouns. According to the original definition and description characteristics, this paper divides the constitutive roles into two categories: materials and components. At the same time, combined with the previous methods of extracting the role automatically, this paper optimizes the method of extracting the role automatically. Relying on auxiliary grammatical constructions, we extract noun-noun pairs from large-scale corpus to extract descriptive features of constitutive roles, and then classifies these descriptive knowledge by manual double-blind proofreading. Finally, the author discusses the application of Chinese constitutive roles in word-formational analysis, syntactic analysis and synonym discrimination.","PeriodicalId":208066,"journal":{"name":"2019 International Conference on Asian Language Processing (IALP)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126439107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-11-01, DOI: 10.1109/IALP48816.2019.9037664
Huiping Wang, Lijiao Yang, Huimin Xiao
Chinese grade reading for children has broad application prospects. In this paper, Chinese textbooks for grades 1 to 6 of primary school published by the People’s Education Press are taken as the dataset, and the texts are divided into 12 successive difficulty levels. Effective lexical indexes for measuring text readability are discussed, and a regression model that effectively measures the lexical difficulty of Chinese texts is established. The study first collected 30 text-level lexical indexes along the three dimensions of lexical richness, semantic transparency, and contextual dependence, selected the 7 indexes most correlated with text difficulty using the Pearson correlation coefficient, and finally constructed regression models to predict text difficulty based on Lasso Regression, ElasticNet, Ridge Regression, and other algorithms. The regression results show that the model fits well: the predicted value explains 89.3% of the total variation in text difficulty, which demonstrates that the quantitative indexes of vocabulary difficulty constructed in this paper are effective and can be applied to Chinese grade reading and automatic grading of Chinese text difficulty.
{"title":"Construction of Quantitative Index System of Vocabulary Difficulty in Chinese Grade Reading","authors":"Huiping Wang, Lijiao Yang, Huimin Xiao","doi":"10.1109/IALP48816.2019.9037664","DOIUrl":"https://doi.org/10.1109/IALP48816.2019.9037664","url":null,"abstract":"Chinese grade reading for children has a broad application prospect. In this paper, Chinese textbooks for grade 1 to 6 of primary schools published by People’s Education Press are taken as data sets, and the texts are divided into 12 difficulty levels successively. The effective lexical indexes to measure the readability of texts are discussed, and a regression model to effectively measure the lexical difficulty of Chinese texts is established. The study firstly collected 30 indexes at the text lexical level from the three dimensions of lexical richness, semantic transparency and contextual dependence, selected the 7 indexes with the highest relevance to the text difficulty through Person correlation coefficient, and finally constructed a Regression to predict the text difficulty based on Lasso Regression, ElasticNet, Ridge Regression and other algorithms. The regression results show that the model fits well, and the predicted value could explain 89.3% of the total variation of text difficulty, which proves that the quantitative index of vocabulary difficulty of Chinese text constructed in this paper is effective, and can be applied to Chinese grade reading and computer automatic grading of Chinese text difficulty.","PeriodicalId":208066,"journal":{"name":"2019 International Conference on Asian Language Processing (IALP)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127626679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
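Of the algorithms named above, Ridge Regression has a convenient closed form, which makes the index-to-difficulty mapping easy to sketch. The feature values below are synthetic stand-ins for the paper's 7 lexical indexes, so this shows the mechanics only, not the reported model:

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form Ridge Regression: w = (X^T X + alpha*I)^-1 X^T y.
    A bias column is appended so the intercept is learned too."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    d = Xb.shape[1]
    return np.linalg.solve(Xb.T @ Xb + alpha * np.eye(d), Xb.T @ y)

def ridge_predict(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

# Toy data: 40 texts, 7 lexical indexes each (values invented);
# the target plays the role of the 1-12 difficulty level.
rng = np.random.default_rng(0)
X = rng.random((40, 7))
true_w = np.arange(1, 8, dtype=float)
y = X @ true_w + 0.5

w = ridge_fit(X, y, alpha=0.01)
pred = ridge_predict(w, X)
mse = float(np.mean((pred - y) ** 2))
print(round(mse, 6))
```

On this noiseless toy data the fit is near-exact; on real texts one would cross-validate `alpha` and report explained variance, as the paper does with 89.3%.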
Pub Date: 2019-11-01, DOI: 10.1109/IALP48816.2019.9037655
Ying Chen, Jiajing Zhang, Bingying Ye, Chenfang Zhou
This study was designed to explore the prosodic patterns of focus in two dialects of Mandarin: Changchun Mandarin and Nanjing Mandarin. The current paper compares the acoustics of their prosodic realization of focus in a production experiment. Similar to standard Mandarin, which uses in-focus expansion and concomitant post-focus compression (PFC) to code focus, results in the current study indicate that both Changchun and Nanjing speakers produced significant in-focus expansion of pitch, intensity, and duration, and PFC of pitch and intensity, in their Mandarin dialects. Meanwhile, the results show no significant difference in prosodic changes between Changchun and Nanjing Mandarin productions. These results reveal that PFC exists not only in standard Mandarin but also in Mandarin dialects.
{"title":"Prosodic Realization of Focus in Changchun Mandarin and Nanjing Mandarin","authors":"Ying Chen, Jiajing Zhang, Bingying Ye, Chenfang Zhou","doi":"10.1109/IALP48816.2019.9037655","DOIUrl":"https://doi.org/10.1109/IALP48816.2019.9037655","url":null,"abstract":"This study was designed to explore the prosodic patterns of focus in two dialects of Mandarin. One is Changchun Mandarin and the other is Nanjing Mandarin. The current paper compares the acoustics of their prosodic realization of focus in a production experiment. Similar to standard Mandarin, which uses in-focus expansion and concomitantly post-focus compression (PFC) to code focus, results in the current study indicate that both Changchun and Nanjing speakers produced significant in-focus expansion of pitch, intensity and duration and PFC of pitch and intensity in their Mandarin dialects. Meanwhile, the results show no significant difference of prosodic changes between Changchun and Nanjing Mandarin productions. These results reveal that PFC not only exists in standard Mandarin but also in Mandarin dialects.","PeriodicalId":208066,"journal":{"name":"2019 International Conference on Asian Language Processing (IALP)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127673259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2019-11-01, DOI: 10.1109/IALP48816.2019.9037674
Haoyi Cheng, Peifeng Li, Qiaoming Zhu
Event coreference resolution is a challenging task. To address the influence of event-independent information in event mentions and the flexible, diverse sentence structures of Chinese, this paper introduces a GANN (Gated Attention Neural Networks) model for document-level Chinese event coreference resolution. GANN uses a gated attention mechanism to select event-related information from event mentions and filter out noisy information. Moreover, GANN not only uses Cosine distance to calculate the linear distance between two event mentions, but also introduces multiple mechanisms, i.e., Bilinear distance and a Single Layer Network, to further calculate linear and nonlinear distances. Experimental results on the ACE 2005 Chinese corpus illustrate that our GANN model outperforms the state-of-the-art baselines.
{"title":"Employing Gated Attention and Multi-similarities to Resolve Document-level Chinese Event Coreference","authors":"Haoyi Cheng, Peifeng Li, Qiaoming Zhu","doi":"10.1109/IALP48816.2019.9037674","DOIUrl":"https://doi.org/10.1109/IALP48816.2019.9037674","url":null,"abstract":"Event coreference resolution is a challenging task. To address the issues of the influence on event-independent information in event mentions and the flexible and diverse sentence structure in Chinese language, this paper introduces a GANN (Gated Attention Neural Networks) model to document-level Chinese event coreference resolution. GANN introduces a gated attention mechanism to select eventrelated information from event mentions and then filter noisy information. Moreover, GANN not only uses a single Cosine distance to calculate the linear distance between two event mentions, but also introduces multi-mechanisms, i.e., Bilinear distance and Single Layer Network, to further calculate the linear and nonlinear distances. The experimental results on the ACE 2005 Chinese corpus illustrate that our model GANN outperforms the state-of-the-art baselines.","PeriodicalId":208066,"journal":{"name":"2019 International Conference on Asian Language Processing (IALP)","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131571031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
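The three similarity mechanisms named in the abstract can be written down directly. In the sketch below the parameter matrices are random stand-ins for what the model would learn, so only the functional forms are faithful:

```python
import numpy as np

def cosine(u, v):
    """Cosine distance: the linear similarity between two mention vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def bilinear(u, v, M):
    """Bilinear distance u^T M v, where M is a learned matrix."""
    return float(u @ M @ v)

def single_layer(u, v, W, b):
    """Single Layer Network: a tanh nonlinearity over the
    concatenated pair, giving a nonlinear similarity score."""
    return float(np.tanh(W @ np.concatenate([u, v]) + b))

rng = np.random.default_rng(1)
u, v = rng.random(4), rng.random(4)
M = np.eye(4)              # stand-in for a learned bilinear matrix
W, b = rng.random(8), 0.0  # stand-in for learned SLN parameters

print(cosine(u, v), bilinear(u, v, M), single_layer(u, v, W, b))
```

A coreference classifier would concatenate these scores (and the gated-attention representations) before the final decision layer.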
In this paper, we propose a neural network architecture based on a Time-Delay Neural Network (TDNN) and a Bidirectional Gated Recurrent Unit (BiGRU) for small-footprint keyword spotting. Our model consists of three parts: the TDNN, the BiGRU, and an attention mechanism. The TDNN models temporal information, and the BiGRU extracts hidden-layer features of the audio. The attention mechanism generates a fixed-length vector from the hidden-layer features, and the system produces the final score through a linear transformation and the softmax function. We explored the step size and unit size of the TDNN as well as two attention mechanisms. Our model achieves a true positive rate of 99.63% at a 5% false positive rate.
{"title":"An End-to-End Model Based on TDNN-BiGRU for Keyword Spotting","authors":"Shuzhou Chai, Zhenye Yang, Changsheng Lv, Weiqiang Zhang","doi":"10.1109/IALP48816.2019.9037714","DOIUrl":"https://doi.org/10.1109/IALP48816.2019.9037714","url":null,"abstract":"In this paper, we proposed a neural network architecture based on Time-Delay Neural Network (TDNN)Bidirectional Gated Recurrent Unit (BiGRU) for small-footprint keyWord spotting. Our model consists of three parts: TDNN, BiGRU and Attention Mechanism. TDNN models the time information and BiGRU extracts the hidden layer features of the audio. The attention mechanism generates a vector of fixed length with hidden layer features. The system generates the final score through vector linear transformation and softmax function. We explored the step size and unit size of TDNN and two attention mechanisms. Our model has achieved a true positive rate of 99.63% at a 5% false positive rate.","PeriodicalId":208066,"journal":{"name":"2019 International Conference on Asian Language Processing (IALP)","volume":"521 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131869190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
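The fixed-length vector produced by the attention mechanism is a softmax-weighted average of the recurrent hidden states. A minimal sketch, with random matrices standing in for BiGRU outputs and the learned attention query:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()

def attention_pool(H, w):
    """Collapse a (T, d) sequence of hidden states into one
    fixed-length vector: score each frame against the query w,
    softmax over time, then take the weighted sum."""
    scores = H @ w               # (T,) one score per frame
    alpha = softmax(scores)      # attention weights over time
    return alpha @ H, alpha      # context (d,), weights (T,)

rng = np.random.default_rng(2)
H = rng.standard_normal((50, 16))  # stand-in BiGRU outputs: 50 frames
w = rng.standard_normal(16)        # stand-in learned attention query
context, alpha = attention_pool(H, w)
print(context.shape, round(float(alpha.sum()), 6))
```

The resulting context vector is what feeds the final linear transformation and softmax that produce the keyword score.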
Pub Date: 2019-11-01, DOI: 10.1109/IALP48816.2019.9037694
Joseph Marvin Imperial, R. Roxas, Erica Mae Campos, Jemelee Oandasan, Reyniel Caraballo, Ferry Winsley Sabdani, Ani Rosa Almaroi
Reading is an essential part of children’s learning, and identifying the proper readability level of reading materials helps ensure effective comprehension. We present our efforts to develop a baseline model for automatically identifying the readability of children’s and young adults’ books written in Filipino using machine learning algorithms. For this study, we processed 258 picture books published by Adarna House Inc. In contrast to older readability formulas that rely on static attributes such as the number of words, sentences, and syllables, other textual features were explored. Count vectors, Term Frequency-Inverse Document Frequency (TF-IDF), n-grams, and character-level n-grams were extracted to train models using three major machine learning algorithms: Multinomial Naïve Bayes, Random Forest, and K-Nearest Neighbors. A combination of K-Nearest Neighbors and Random Forest via a voting-based classification mechanism yielded the best-performing model, with average training and validation accuracies of 0.822 and 0.74, respectively. Analysis of the top 10 most useful features for each algorithm shows a common signal for identifying readability levels: the use of Filipino stop words. The performance of other classifiers and features was also explored.
{"title":"Developing a machine learning-based grade level classifier for Filipino children’s literature","authors":"Joseph Marvin Imperial, R. Roxas, Erica Mae Campos, Jemelee Oandasan, Reyniel Caraballo, Ferry Winsley Sabdani, Ani Rosa Almaroi","doi":"10.1109/IALP48816.2019.9037694","DOIUrl":"https://doi.org/10.1109/IALP48816.2019.9037694","url":null,"abstract":"Reading is an essential part of children’s learning. Identifying the proper readability level of reading materials will ensure effective comprehension. We present our efforts to develop a baseline model for automatically identifying the readability of children’s and young adult’s books written in Filipino using machine learning algorithms. For this study, we processed 258 picture books published by Adarna House Inc. In contrast to old readability formulas relying on static attributes like number of words, sentences, syllables, etc., other textual features were explored. Count vectors, Term FrequencyInverse Document Frequency (TF-IDF), n-grams, and character-level n-grams were extracted to train models using three major machine learning algorithms–Multinomial Naïve-Bayes, Random Forest, and K-Nearest Neighbors. A combination of K-Nearest Neighbors and Random Forest via voting-based classification mechanism resulted with the best performing model with a high average training accuracy and validation accuracy of 0.822 and 0.74 respectively. Analysis of the top 10 most useful features for each algorithm show that they share common similarity in identifying readability levels–the use of Filipino stop words. Performance of other classifiers and features were also explored.","PeriodicalId":208066,"journal":{"name":"2019 International Conference on Asian Language Processing (IALP)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121464491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
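TF-IDF, one of the feature sets named above, can be computed from scratch in a few lines. The sketch follows the common smoothed variant of the formula (as in scikit-learn's TfidfTransformer); the tokenized snippets are invented examples, not the Adarna House corpus:

```python
import math
from collections import Counter

def tfidf(docs):
    """Smoothed TF-IDF, one {term: weight} dict per document:
    tf(t, d) * (log((1 + N) / (1 + df(t))) + 1)."""
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(d))  # document frequency: count each term once per doc
    out = []
    for d in docs:
        tf = Counter(d)
        out.append({t: tf[t] * (math.log((1 + n) / (1 + df[t])) + 1)
                    for t in tf})
    return out

# Hypothetical tokenized Filipino picture-book snippets.
docs = [["ang", "aso", "ay", "tumakbo"],
        ["ang", "pusa", "ay", "natulog"],
        ["malaki", "ang", "puno"]]
vecs = tfidf(docs)
# "ang" appears in every document, so its idf is the floor value 1.0,
# while rarer content words like "aso" weigh more.
print(round(vecs[0]["ang"], 4), vecs[0]["aso"] > vecs[0]["ang"])
```

This down-weighting of ubiquitous words is the flip side of the paper's finding: for grade-level classification, the stop words themselves carried useful signal.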