
2019 International Conference on Asian Language Processing (IALP): Latest Publications

A Systematic Investigation of Neural Models for Chinese Implicit Discourse Relationship Recognition
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037686
Dejian Li, Man Lan, Yuanbin Wu
Chinese implicit discourse relationship recognition is more challenging than its English counterpart because discourse connectives are absent yet implicit relations occur with high frequency in the text. So far, there has been no systematic investigation of the neural components used for Chinese implicit discourse relationship recognition. To fill this gap, we present a component-based neural framework to systematically study the Chinese implicit discourse relationship. Experimental results show that our proposed neural Chinese implicit discourse parser achieves state-of-the-art (SOTA) performance on the CoNLL-2016 corpus.
Citations: 0
Automatic answer ranking based on sememe vector in KBQA
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037712
Yadi Li, Lingling Mu, Hao Li, Hongying Zan
This paper proposes an answer ranking method for Knowledge Base Question Answering (KBQA) systems. The method first extracts features covering predicate sequence similarity based on sememe vectors, predicates' edit distances, predicates' word co-occurrences, and classification. These features are then used as inputs to the learning-to-rank algorithm Ranking SVM to rank the candidate answers. Experimental results on the data set of the KBQA evaluation task at the 2016 Natural Language Processing & Chinese Computing conference (NLPCC 2016) show that word similarity calculation based on sememe vectors outperforms the word2vec-based method, with an accuracy of 73.88%, a recall of 82.29%, and an average F1 of 75.88%. These results indicate that knowledge-enriched word representations have an important effect on natural language processing.
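As a rough illustration of the ranking setup described above, the sketch below assembles similarity features for candidate answers and trains a pairwise Ranking SVM; the feature functions, toy sememe-style vectors, and candidates are hypothetical stand-ins, not the authors' implementation.

```python
# A minimal sketch: candidate features + pairwise Ranking SVM (illustrative only).
import numpy as np
from difflib import SequenceMatcher
from sklearn.svm import LinearSVC

def cosine(u, v):
    """Cosine similarity between two (sememe-style) vectors."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def edit_similarity(a, b):
    """Normalized edit-distance-style similarity between predicate strings."""
    return SequenceMatcher(None, a, b).ratio()

def features(q_vec, q_pred, cand_vec, cand_pred, cooccur):
    """One candidate's features: vector similarity, edit similarity, co-occurrence count."""
    return np.array([cosine(q_vec, cand_vec), edit_similarity(q_pred, cand_pred), cooccur])

def pairwise_transform(X, y):
    """Reduce ranking to binary classification on feature differences (Ranking SVM)."""
    diffs, labels = [], []
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] != y[j]:
                diffs.append(X[i] - X[j])
                labels.append(1 if y[i] > y[j] else -1)
    return np.array(diffs), np.array(labels)

# Toy data: two candidate predicates for one question; the first is relevant.
rng = np.random.default_rng(0)
q_vec = rng.normal(size=50)
X = np.vstack([
    features(q_vec, "出生地", q_vec + 0.1 * rng.normal(size=50), "出生地点", 3),
    features(q_vec, "出生地", rng.normal(size=50), "职业", 0),
])
y = np.array([1, 0])                       # relevance labels
Xp, yp = pairwise_transform(X, y)
ranker = LinearSVC().fit(Xp, yp)           # linear Ranking SVM
scores = ranker.decision_function(X)       # higher score = better answer
print(scores.argsort()[::-1])              # candidate ranking
```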
Citations: 0
Japanese grammatical simplification with simplified corpus
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037675
Yumeto Inaoka, Kazuhide Yamamoto
We construct a Japanese grammatical simplification corpus and establish automatic simplification methods. We compare the conventional machine translation approach, our proposed method, and a hybrid method through both automatic and manual evaluation. The automatic evaluation shows that the proposed method scores lower than the machine translation approach, whereas the hybrid method achieves the highest score. These results suggest that the machine translation approach and the proposed method simplify different sets of sentences, and that the hybrid method is the most effective for grammatical simplification.
Citations: 0
A Chinese word segment model for energy literature based on Neural Networks with Electricity User Dictionary
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037728
Bochuan Song, Bo Chai, Qiang Zhang, Quanye Jia
Traditional Chinese word segmentation (CWS) methods are based on supervised machine learning such as Conditional Random Fields (CRFs) and Maximum Entropy (ME), whose features are mostly manual features derived from local contexts. Currently, most state-of-the-art methods for Chinese word segmentation are based on neural networks, but these neural networks rarely incorporate a user dictionary. We propose an LSTM-based Chinese word segmentation model that can take advantage of a user dictionary. Experiments show that our model outperforms a popular segmentation tool in the electricity domain, and it achieves better performance when transferred to a new domain using the user dictionary.
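The sketch below illustrates the general idea of combining an LSTM character tagger with user-dictionary features; the BMES tag scheme, the dictionary-match features, and the toy electricity terms are assumptions for illustration, not the paper's actual model.

```python
# A minimal sketch: character BiLSTM segmenter with dictionary-match features (illustrative).
import torch
import torch.nn as nn

USER_DICT = {"变压器", "断路器"}            # hypothetical electricity-domain terms

def dict_features(chars):
    """For each character, flag whether a dictionary word starts here or covers it."""
    feats = torch.zeros(len(chars), 2)
    for i in range(len(chars)):
        for w in USER_DICT:
            if "".join(chars[i:i + len(w)]) == w:
                feats[i, 0] = 1.0                    # a dictionary word starts here
                feats[i:i + len(w), 1] = 1.0         # the word covers these characters
    return feats

class DictLSTMSegmenter(nn.Module):
    """BiLSTM over [character embedding ; dictionary features] -> BMES tag scores."""
    def __init__(self, vocab_size, emb=64, hidden=128, n_tags=4):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb + 2, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, char_ids, dict_feats):
        x = torch.cat([self.emb(char_ids), dict_feats], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h)                           # per-character tag scores

# Toy forward pass on the sentence "变压器故障".
chars = list("变压器故障")
vocab = {c: i for i, c in enumerate(sorted(set(chars)))}
ids = torch.tensor([[vocab[c] for c in chars]])
feats = dict_features(chars).unsqueeze(0)
model = DictLSTMSegmenter(len(vocab))
print(model(ids, feats).shape)                       # (1, 5, 4): one BMES score per character
```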
Citations: 1
Classified Description and Application of Chinese Constitutive Role
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037730
Mengxiang Wang, Cuiyan Ma
The constitutive role is one of the four qualia roles and expresses a constitutive relationship between nouns. Based on the original definition and its descriptive characteristics, this paper divides constitutive roles into two categories: materials and components. Building on previous methods for automatic role extraction, the paper also optimizes the extraction procedure: relying on auxiliary grammatical constructions, we extract noun-noun pairs from a large-scale corpus to obtain descriptive features of constitutive roles, and then classify this descriptive knowledge through manual double-blind proofreading. Finally, we discuss the application of Chinese constitutive roles in word-formation analysis, syntactic analysis, and synonym discrimination.
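As a loose illustration of pattern-based noun-noun pair extraction, the sketch below matches two assumed auxiliary constructions; the patterns and role labels are examples of the kind of template the abstract alludes to, not the authors' actual inventory.

```python
# A minimal sketch: template matching for constitutive noun-noun pairs (assumed patterns).
import re

PATTERNS = [
    (re.compile(r"(\w+)是由(\w+)制成的"), "material"),    # "X is made of Y"
    (re.compile(r"(\w+)由(\w+)组成"), "component"),       # "X consists of Y"
]

def extract_pairs(sentence):
    """Return (head noun, constitutive noun, role category) triples."""
    triples = []
    for pattern, role in PATTERNS:
        for head, part in pattern.findall(sentence):
            triples.append((head, part, role))
    return triples

print(extract_pairs("桌子是由木头制成的"))     # [('桌子', '木头', 'material')]
print(extract_pairs("委员会由七名成员组成"))   # [('委员会', '七名成员', 'component')]
```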
Citations: 0
Construction of Quantitative Index System of Vocabulary Difficulty in Chinese Grade Reading
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037664
Huiping Wang, Lijiao Yang, Huimin Xiao
Graded Chinese reading for children has broad application prospects. In this paper, Chinese textbooks for grades 1 to 6 of primary school published by People’s Education Press are used as the data set, and the texts are divided into 12 successive difficulty levels. We discuss effective lexical indexes for measuring text readability and establish a regression model that measures the lexical difficulty of Chinese texts. The study first collected 30 text-level lexical indexes along three dimensions: lexical richness, semantic transparency, and contextual dependence. It then selected the 7 indexes most relevant to text difficulty using the Pearson correlation coefficient, and finally built regression models to predict text difficulty with Lasso regression, ElasticNet, Ridge regression, and other algorithms. The regression results show that the model fits well: the predicted values explain 89.3% of the total variation in text difficulty, which demonstrates that the quantitative index system of vocabulary difficulty constructed in this paper is effective and can be applied to graded Chinese reading and to automatic grading of Chinese text difficulty.
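A minimal sketch of such a pipeline on synthetic data is shown below: Pearson-based selection of the 7 most correlated indexes followed by Lasso, Ridge, and ElasticNet regression; the 30 simulated lexical indexes and difficulty scores are placeholders, not the paper's measurements.

```python
# A minimal sketch: Pearson feature selection + linear regression models (synthetic data).
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Lasso, Ridge, ElasticNet
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(360, 30))                                    # 30 candidate lexical indexes
y = X[:, :5] @ rng.normal(size=5) + 0.3 * rng.normal(size=360)    # simulated difficulty score

# Keep the 7 indexes most correlated with text difficulty (absolute Pearson r).
r = np.array([abs(pearsonr(X[:, j], y)[0]) for j in range(X.shape[1])])
keep = np.argsort(r)[-7:]
X_sel = X[:, keep]

X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y, random_state=0)
for model in (Lasso(alpha=0.01), Ridge(alpha=1.0), ElasticNet(alpha=0.01)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, round(model.score(X_te, y_te), 3))  # held-out R^2
```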
Citations: 2
Prosodic Realization of Focus in Changchun Mandarin and Nanjing Mandarin
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037655
Ying Chen, Jiajing Zhang, Bingying Ye, Chenfang Zhou
This study was designed to explore the prosodic patterns of focus in two dialects of Mandarin: Changchun Mandarin and Nanjing Mandarin. The current paper compares the acoustics of their prosodic realization of focus in a production experiment. Similar to standard Mandarin, which uses in-focus expansion and concomitant post-focus compression (PFC) to encode focus, the results of the current study indicate that both Changchun and Nanjing speakers produced significant in-focus expansion of pitch, intensity, and duration, as well as PFC of pitch and intensity, in their Mandarin dialects. Meanwhile, the results show no significant difference in prosodic changes between Changchun and Nanjing Mandarin productions. These results reveal that PFC exists not only in standard Mandarin but also in Mandarin dialects.
Citations: 1
Employing Gated Attention and Multi-similarities to Resolve Document-level Chinese Event Coreference
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037674
Haoyi Cheng, Peifeng Li, Qiaoming Zhu
Event coreference resolution is a challenging task. To address the interference of event-independent information in event mentions and the flexible, diverse sentence structures of Chinese, this paper applies a GANN (Gated Attention Neural Network) model to document-level Chinese event coreference resolution. GANN introduces a gated attention mechanism to select event-related information from event mentions and filter out noisy information. Moreover, GANN does not rely on a single Cosine distance to measure the linear distance between two event mentions; it also introduces further mechanisms, namely a Bilinear distance and a Single Layer Network, to calculate additional linear and nonlinear distances. Experimental results on the ACE 2005 Chinese corpus show that our GANN model outperforms the state-of-the-art baselines.
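The following sketch illustrates the three similarity measures the abstract names (Cosine, Bilinear, Single Layer Network) applied to two event-mention vectors; the dimensions and parameterization are assumptions, and the gated attention encoder that would produce the mention vectors is omitted.

```python
# A minimal sketch: cosine, bilinear, and single-layer-network similarities (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MentionSimilarities(nn.Module):
    """Concatenate three similarity scores between two event-mention representations."""
    def __init__(self, dim=128, hidden=64):
        super().__init__()
        self.bilinear = nn.Bilinear(dim, dim, 1)               # e1^T W e2 + b
        self.single_layer = nn.Sequential(                     # nonlinear score on [e1; e2]
            nn.Linear(2 * dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, e1, e2):                                 # each: (batch, dim)
        cos = F.cosine_similarity(e1, e2, dim=-1).unsqueeze(-1)
        bil = self.bilinear(e1, e2)
        sln = self.single_layer(torch.cat([e1, e2], dim=-1))
        return torch.cat([cos, bil, sln], dim=-1)              # (batch, 3)

pairs = MentionSimilarities()
e1, e2 = torch.randn(4, 128), torch.randn(4, 128)
print(pairs(e1, e2).shape)                                     # torch.Size([4, 3])
```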
Citations: 0
An End-to-End Model Based on TDNN-BiGRU for Keyword Spotting
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037714
Shuzhou Chai, Zhenye Yang, Changsheng Lv, Weiqiang Zhang
In this paper, we propose a neural network architecture based on a Time-Delay Neural Network (TDNN) and a Bidirectional Gated Recurrent Unit (BiGRU) for small-footprint keyword spotting. Our model consists of three parts: the TDNN, the BiGRU, and an attention mechanism. The TDNN models temporal information and the BiGRU extracts hidden-layer features of the audio. The attention mechanism condenses the hidden-layer features into a fixed-length vector, and the system generates the final score through a linear transformation and a softmax function. We explored the step size and unit size of the TDNN as well as two attention mechanisms. Our model achieves a true positive rate of 99.63% at a 5% false positive rate.
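Below is a minimal sketch of a TDNN + BiGRU + attention keyword scorer in the spirit of this description; the layer sizes, dilation settings, and attention form are illustrative assumptions rather than the configuration reported in the paper.

```python
# A minimal sketch: dilated-convolution TDNN, BiGRU, and soft attention pooling (illustrative).
import torch
import torch.nn as nn

class TDNNBiGRUKWS(nn.Module):
    def __init__(self, feat_dim=40, tdnn_dim=128, gru_dim=64, n_keywords=2):
        super().__init__()
        # TDNN layers expressed as dilated 1-D convolutions over the time axis.
        self.tdnn = nn.Sequential(
            nn.Conv1d(feat_dim, tdnn_dim, kernel_size=3, dilation=1), nn.ReLU(),
            nn.Conv1d(tdnn_dim, tdnn_dim, kernel_size=3, dilation=2), nn.ReLU())
        self.bigru = nn.GRU(tdnn_dim, gru_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * gru_dim, 1)            # soft attention over frames
        self.out = nn.Linear(2 * gru_dim, n_keywords)

    def forward(self, x):                                # x: (batch, time, feat_dim)
        h = self.tdnn(x.transpose(1, 2)).transpose(1, 2) # (batch, time', tdnn_dim)
        h, _ = self.bigru(h)                             # (batch, time', 2 * gru_dim)
        w = torch.softmax(self.attn(h), dim=1)           # attention weights per frame
        pooled = (w * h).sum(dim=1)                      # fixed-length utterance vector
        return self.out(pooled)                          # keyword scores

model = TDNNBiGRUKWS()
scores = model(torch.randn(8, 100, 40))                  # 8 utterances, 100 frames of 40-d features
print(torch.softmax(scores, dim=-1).shape)               # (8, 2) keyword posteriors
```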
Citations: 2
Developing a machine learning-based grade level classifier for Filipino children’s literature
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037694
Joseph Marvin Imperial, R. Roxas, Erica Mae Campos, Jemelee Oandasan, Reyniel Caraballo, Ferry Winsley Sabdani, Ani Rosa Almaroi
Reading is an essential part of children’s learning, and identifying the proper readability level of reading materials helps ensure effective comprehension. We present our efforts to develop a baseline model for automatically identifying the readability of children’s and young adults’ books written in Filipino using machine learning algorithms. For this study, we processed 258 picture books published by Adarna House Inc. In contrast to older readability formulas that rely on static attributes such as the number of words, sentences, and syllables, other textual features were explored: count vectors, Term Frequency-Inverse Document Frequency (TF-IDF), word n-grams, and character-level n-grams were extracted to train models with three major machine learning algorithms, namely Multinomial Naïve Bayes, Random Forest, and K-Nearest Neighbors. A combination of K-Nearest Neighbors and Random Forest through a voting-based classification mechanism yielded the best-performing model, with an average training accuracy of 0.822 and a validation accuracy of 0.74. Analysis of the top 10 most useful features for each algorithm shows that they share a common cue for identifying readability levels: the use of Filipino stop words. The performance of other classifiers and features was also explored.
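As a rough sketch of the described feature and classifier combination, the snippet below builds word and character n-gram TF-IDF features and a soft-voting ensemble of K-Nearest Neighbors and Random Forest; the tiny placeholder corpus and grade labels are invented for illustration.

```python
# A minimal sketch: n-gram TF-IDF features + KNN/Random Forest voting classifier (toy data).
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier

texts = ["Si Ana ay may pusa.",
         "Ang mga bata ay naglalaro sa parke.",
         "Mahilig magbasa ng mga libro ang batang si Juan.",
         "Naglakbay ang pamilya patungo sa malayong probinsya."]   # placeholder sentences
levels = [1, 1, 2, 2]                                               # placeholder grade labels

features = FeatureUnion([
    ("word_ngrams", TfidfVectorizer(ngram_range=(1, 2))),
    ("char_ngrams", TfidfVectorizer(analyzer="char", ngram_range=(2, 4))),
])
ensemble = VotingClassifier(
    estimators=[("knn", KNeighborsClassifier(n_neighbors=3)),
                ("rf", RandomForestClassifier(n_estimators=100, random_state=0))],
    voting="soft")                         # vote on predicted class probabilities
model = Pipeline([("features", features), ("clf", ensemble)])
model.fit(texts, levels)
print(model.predict(["Kumain ng almusal si Ana."]))   # predicted grade level
```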
Citations: 8