
2019 International Conference on Asian Language Processing (IALP) — Latest Publications

Multiple-source Entity Linking with Incomplete Sources
Pub Date: 2019-11-01 DOI: 10.1109/IALP48816.2019.9037718
Q. Liu, Shui Liu, Lemao Liu, Bo Xiao
This paper introduces a new entity linking task drawn from a well-known online video application in industry, where both entities and mentions are represented by multiple sources, some of which may be missing. To address the issue of incomplete sources, it proposes a novel neural approach to model the linking relationship between an entity and a mention. To verify the proposed approach, it further creates a large-scale dataset of 70k examples. Experiments on this dataset demonstrate that the proposed approach is effective over a baseline and, in particular, is robust to missing sources to some extent.
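The abstract does not spell out the model's internals, so the following sketch only illustrates the general idea of a link score over multiple sources that degrades gracefully when some sources are missing. The cosine-similarity scoring, the source names, and the `link_score` helper are illustrative assumptions, not the authors' neural architecture.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv) if nu and nv else 0.0

def link_score(entity_sources, mention_sources):
    """Average similarity over sources present on BOTH sides.

    Missing sources (value None) are simply skipped, so the score
    degrades gracefully instead of failing when a source is absent.
    """
    scores = []
    for name, e_vec in entity_sources.items():
        m_vec = mention_sources.get(name)
        if e_vec is None or m_vec is None:
            continue  # incomplete source: ignore this view
        scores.append(cosine(e_vec, m_vec))
    return sum(scores) / len(scores) if scores else 0.0

# Toy example: only the "title" source is available on both sides.
entity = {"title": [1.0, 0.0], "description": [0.5, 0.5], "poster": None}
mention = {"title": [1.0, 0.1], "description": None, "poster": [0.2, 0.8]}
print(round(link_score(entity, mention), 3))
```

A learned model would replace the fixed cosine with trained per-source encoders, but the skip-missing-views pattern is the part this sketch is meant to show.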
Citations: 0
Duplicate Question Detection based on Neural Networks and Multi-head Attention
Pub Date: 2019-11-01 DOI: 10.1109/IALP48816.2019.9037671
Heng Zhang, Liangyu Chen
It is well known that a single neural network cannot achieve satisfactory accuracy on Duplicate Question Detection. To break through this dilemma, different neural networks are ensembled serially to strive for better accuracy. However, problems such as vanishing or exploding gradients arise if the depth of the network is increased blindly. Worse, serial integration may perform poorly computationally, since it is less parallelizable and needs more time to train. To solve these problems, we use ensemble learning: we treat different neural networks as individual learners, compute in parallel, and propose a new voting mechanism to obtain better detection accuracy. In addition to classical models based on recurrent or convolutional neural networks, Multi-Head Attention is also integrated to reduce the correlation and the performance gap between models. Experimental results on the Quora question pairs dataset show that the accuracy of our method reaches 89.3%.
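The paper's voting mechanism is only named, not described; as a stand-in, the sketch below combines the parallel learners' outputs with plain weighted majority voting. The labels (1 = duplicate, 0 = distinct) and the weights are illustrative assumptions.

```python
from collections import Counter

def weighted_vote(predictions, weights=None):
    """Combine per-model predictions into one label.

    predictions: one label per individual learner (run in parallel).
    weights: optional per-model weights; unweighted voting is the
    default. This is generic weighted majority voting, not the
    paper's specific mechanism.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    tally = Counter()
    for label, w in zip(predictions, weights):
        tally[label] += w
    return tally.most_common(1)[0][0]

# Three learners (e.g. RNN-, CNN-, and attention-based) disagree:
print(weighted_vote([1, 0, 1]))                   # plain majority
print(weighted_vote([1, 0, 0], [0.9, 0.3, 0.3]))  # one confident model outvotes two weak ones
```

Because each learner votes independently, the ensemble parallelizes trivially, which is exactly the computational advantage the abstract claims over serial integration.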
Citations: 3
[IALP 2019 Front Matter]
Pub Date: 2019-11-01 DOI: 10.1109/ialp48816.2019.9037701
Citations: 0
Improving Question Classification with Hybrid Networks
Pub Date: 2019-11-01 DOI: 10.1109/IALP48816.2019.9037707
Yichao Cao, Miao Li, Tao Feng, Rujing Wang, Yue Wu
Question classification is a fundamental task in natural language processing and has an important influence on question answering. Because question sentences in many specific domains are complicated and contain a large amount of domain-exclusive vocabulary, question classification becomes more difficult in these fields. To address this challenge, we propose a novel hierarchical hybrid deep network for question classification. Specifically, we first take advantage of word2vec and a synonym dictionary to learn distributed representations of words. Then, we exploit bi-directional long short-term memory networks to obtain latent semantic representations of question sentences. Finally, we utilize convolutional neural networks to extract question-sentence features and obtain the classification results through a fully-connected network. Besides, at the beginning of the model, we leverage a self-attention layer to capture more useful features between words, such as potential relationships. Experimental results show that our model outperforms common classifiers such as SVM and CNN, achieving up to 9.37% average accuracy improvement over the baseline method on our agricultural dataset.
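The self-attention layer at the front of the pipeline can be illustrated with a minimal scaled dot-product sketch. Using Q = K = V = X (no learned projections) is a simplifying assumption here; the actual model would first project the word vectors with trained weight matrices.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X):
    """Scaled dot-product self-attention over a list of word vectors.

    Each output row is a weighted mixture of ALL input rows, which is
    how attention surfaces relationships between words before the
    BiLSTM/CNN stages see them.
    """
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out

# Three toy 2-d "word vectors"; each output row is a context-mixed vector.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctx = self_attention(X)
print([round(v, 2) for v in ctx[0]])
```

Since the attention weights sum to 1, every output coordinate stays inside the range of the corresponding input coordinates — a convex mixture of the sentence's word vectors.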
Citations: 0
Effects of English Capitals On Reading Performance of Chinese Learners: Evidence from Eye Tracking
Pub Date: 2019-11-01 DOI: 10.1109/IALP48816.2019.9037698
Yang Wei, Fu Xinyu
Native English speakers need more time to recognize capital letters in reading, yet the influence of capitals on Chinese learners’ reading performance is seldom studied. We conducted an eye-tracker experiment to explore the cognitive features of Chinese learners reading texts containing capital letters; the effect of English proficiency on capital-letter reading is also studied. The results showed that capitals significantly increase the cognitive load in Chinese learners’ reading process, complicate their cognitive processing, and lower their reading efficiency. Chinese learners’ perception of capital letters is found to be an isolated event and may influence the word superiority effect. English majors, who possess relatively stronger English logical-thinking capability than non-English majors, face the same difficulty as non-English majors if they have not practiced reading capitalized text.
Citations: 0
Statistical Machine Learning for Transliteration: Transliterating names between Sinhala, Tamil and English
Pub Date: 2019-11-01 DOI: 10.1109/IALP48816.2019.9037651
H. S. Priyadarshani, M. Rajapaksha, M. M. S. P. Ranasinghe, Kengatharaiyer Sarveswaran, G. Dias
In this paper, we focus on building models for transliteration of personal names between the primary languages of Sri Lanka, namely Sinhala, Tamil and English. Currently, a rule-based system is used to transliterate names between Sinhala and Tamil; however, we found that it fails in several cases. Further, no systems were available to transliterate names into English. In this paper, we present a hybrid approach that uses machine learning and statistical machine translation to perform the transliteration. We built a parallel trilingual corpus of personal names, then trained a machine learner to classify names by ethnicity, which we found to be an influencing factor in transliteration. We then treated transliteration as a translation problem and applied statistical machine translation to generate the most probable transliteration for each personal name. The system shows very promising results compared with the existing rule-based system, giving a BLEU score of 89 across all test cases and a top BLEU score of 93.7 for Sinhala-to-English transliteration.
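The abstract does not say what features the ethnicity classifier uses; character n-grams, shown below, are a common choice for name classification and appear here purely as an illustrative assumption.

```python
def char_ngrams(name, n=2):
    """Character n-grams of a personal name, with boundary markers.

    '^' and '$' mark the start and end of the name so that prefix and
    suffix patterns (often ethnicity-indicative) become features too.
    This is a generic feature extractor, not the paper's actual one.
    """
    padded = f"^{name.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

# Bigrams for one of the author surnames from the paper:
print(char_ngrams("Dias"))
```

A downstream classifier would count these n-grams per name and learn which patterns correlate with each ethnicity before routing the name to the appropriate transliteration model.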
Citations: 3
BERT with Enhanced Layer for Assistant Diagnosis Based on Chinese Obstetric EMRs
Pub Date: 2019-11-01 DOI: 10.1109/IALP48816.2019.9037721
Kunli Zhang, Chuang Liu, Xuemin Duan, Lijuan Zhou, Yueshu Zhao, Hongying Zan
This paper proposes a novel method based on the language representation model BERT (Bidirectional Encoder Representations from Transformers) for obstetric assistant diagnosis on Chinese obstetric EMRs (Electronic Medical Records). To aggregate more information for the final output, an enhanced layer is added to the BERT model; in particular, the enhanced layer is constructed using strategy 1 (the A strategy) and/or strategy 2 (the A-AP strategy). The proposed method is evaluated on two datasets: a Chinese obstetric EMRs dataset and the Arxiv Academic Paper Dataset (AAPD). The experimental results show that the BERT-based method improves the F1 value by 19.58% and 2.71% over state-of-the-art methods on the obstetric EMRs and AAPD datasets respectively, and that adding the enhanced layer with strategy 2 further improves the F1 value by 0.7% and 0.3% (strategy 1: 0.68% and 0.1%) over the method without the enhanced layer.
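The strategies behind the enhanced layer are only named (A, A-AP), so the sketch below shows just one plausible reading of "aggregating more information than the [CLS] vector alone": concatenating [CLS] with the average-pooled token vectors. This is an illustrative assumption, not the authors' exact design.

```python
def enhanced_output(cls_vec, token_vecs):
    """Concatenate the [CLS] vector with average-pooled token vectors.

    BERT's default classification head uses only cls_vec; pooling the
    per-token outputs and concatenating keeps sequence-level signal
    that [CLS] alone may miss. The combined vector would feed the
    final diagnosis classifier.
    """
    d = len(cls_vec)
    avg = [sum(t[j] for t in token_vecs) / len(token_vecs) for j in range(d)]
    return cls_vec + avg  # dimension doubles: [CLS] ++ avg-pool

# Toy 2-d vectors standing in for 768-d BERT outputs.
cls_vec = [0.2, 0.8]
tokens = [[1.0, 0.0], [0.0, 1.0]]
print(enhanced_output(cls_vec, tokens))  # [0.2, 0.8, 0.5, 0.5]
```

An attention-weighted pool (learned weights instead of a uniform average) would be another natural reading of an "A" strategy; the aggregation point in the pipeline is the same either way.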
Citations: 3
Phrase-Based Tibetan-Chinese Statistical Machine Translation
Pub Date: 2019-11-01 DOI: 10.1109/IALP48816.2019.9037691
Yong Cuo, Xiaodon Shi, T. Nyima, Yidong Chen
Statistical machine translation has made great progress in recent years, and there is considerable demand for Tibetan-Chinese machine translation. A phrase-based translation model is suitable for machine translation between Tibetan and Chinese, which exhibit similar morphological changes. This paper studies the key technologies of phrase-based Tibetan-Chinese statistical machine translation, including phrase-translation models and reordering models, and proposes a prototype phrase-based Tibetan-Chinese statistical machine translation system. On the CWMT 2013 development set, the proposed method achieves better accuracy than Moses, the current mainstream system, and shows a substantial performance improvement.
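Phrase-based systems of this kind typically score a candidate translation log-linearly, combining phrase-translation probabilities with a target-side language-model score. The toy phrase table, the weights, and the `phrase_score` helper below are illustrative assumptions, not the paper's actual components.

```python
import math

def phrase_score(phrase_pairs, table, lm_logprob, weights=(1.0, 0.5)):
    """Log-linear score of one candidate segmentation.

    phrase_pairs: (source, target) phrase pairs covering the sentence;
    table: phrase-translation probabilities; lm_logprob: target-side
    language-model log-probability. Real systems tune the weights
    (e.g. with MERT) and add more features such as reordering costs.
    """
    tm = sum(math.log(table[pair]) for pair in phrase_pairs)
    return weights[0] * tm + weights[1] * lm_logprob

# Toy phrase table (made-up probabilities, illustration only).
table = {("phrase-a", "phrase-x"): 0.5, ("phrase-b", "phrase-y"): 0.25}
score = phrase_score([("phrase-a", "phrase-x"), ("phrase-b", "phrase-y")], table, -1.0)
print(round(score, 4))
```

The decoder's job is then search: among all segmentations and reorderings, find the candidate maximizing this score.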
Citations: 4
A Machine Learning Model for the Dating of Ancient Chinese Texts
Pub Date: 2019-11-01 DOI: 10.1109/IALP48816.2019.9037653
Xuejin Yu, W. Huangfu
To address the problem of dating ancient Chinese texts, this paper applies the Long Short-Term Memory network (LSTM) to analyze and process character sequences in ancient Chinese. In this model, each character is transformed into a high-dimensional vector, and the vectors and the non-linear relationships among them are read and analyzed by the LSTM, which finally produces the dating tags. Experimental results show that the LSTM has a strong ability to date ancient texts, with precision reaching about 95% in our experiments. The proposed model thus offers an effective method for dating ancient Chinese texts, and also motivates further improvement of time-consuming analysis tasks in the Chinese NLP field.
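As a minimal illustration of the LSTM machinery the model relies on, here is a single-unit LSTM step in plain Python, run over a toy character-embedding sequence. The scalar weights are arbitrary illustrative values, not learned parameters, and a real model would use weight matrices over high-dimensional vectors.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, W):
    """One step of a 1-unit LSTM cell over scalar input x.

    i/f/o are the input, forget, and output gates; g is the candidate
    cell update. The gates decide how much history (c) to keep, which
    is what lets the model track long-range character dependencies.
    """
    i = sigmoid(W["i"] * x + W["ui"] * h)
    f = sigmoid(W["f"] * x + W["uf"] * h)
    o = sigmoid(W["o"] * x + W["uo"] * h)
    g = math.tanh(W["g"] * x + W["ug"] * h)
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new

# Arbitrary illustrative weights for the single unit.
W = {"i": 0.5, "ui": 0.1, "f": 0.5, "uf": 0.1, "o": 0.5, "uo": 0.1, "g": 1.0, "ug": 0.2}
h = c = 0.0
for x in [0.3, -0.1, 0.8]:  # a toy character-embedding sequence
    h, c = lstm_step(x, h, c, W)
print(round(h, 4))  # the final hidden state would feed a dating classifier
```

In the paper's setting, the final hidden state summarizes the whole character sequence and is mapped to a dating tag by a classification layer.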
Citations: 2
Correlational Neural Network Based Feature Adaptation in L2 Mispronunciation Detection
Pub Date: 2019-11-01 DOI: 10.1109/IALP48816.2019.9037719
Wenwei Dong, Yanlu Xie
Because collecting and annotating second-language (L2) learners’ speech corpora for Computer-Assisted Pronunciation Training (CAPT) is difficult, the traditional mispronunciation detection framework resembles ASR: it trains neural networks on native speakers’ speech and then evaluates non-native speakers’ pronunciation, creating a mismatch in channels, reading style, and speakers. To reduce this influence, this paper proposes a feature adaptation method using a Correlational Neural Network (CorrNet). Before training the acoustic model, we use a small amount of unannotated non-native data to adapt the native acoustic features. On a corpus of Japanese speakers speaking Chinese, the mispronunciation detection accuracy of the CorrNet-based method improves by 3.19% over un-normalized Fbank features and by 1.74% over bottleneck features. The results show the effectiveness of the method.
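CorrNet's training objective includes a term that maximizes the correlation between the hidden representations of two views of the same data (here, native and non-native features). The sketch below computes just that Pearson-correlation term over toy scalar sequences, as an illustration rather than the full model.

```python
import math

def correlation(h1, h2):
    """Pearson correlation between two hidden views of the same data.

    CorrNet maximizes this quantity (alongside per-view reconstruction
    losses) so that the two views' representations align; a trained
    encoder would produce h1 and h2 from the raw features.
    """
    n = len(h1)
    m1, m2 = sum(h1) / n, sum(h2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(h1, h2))
    v1 = math.sqrt(sum((a - m1) ** 2 for a in h1))
    v2 = math.sqrt(sum((b - m2) ** 2 for b in h2))
    return cov / (v1 * v2) if v1 and v2 else 0.0

# Perfectly correlated toy views score 1.0; anti-correlated views score -1.0.
print(round(correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), 3))
```

Training pushes this value toward 1 for paired native/non-native inputs, which is what makes the adapted features transferable across the two conditions.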
Citations: 0