
Latest publications: 2019 International Conference on Asian Language Processing (IALP)

Cross Language Information Retrieval Using Parallel Corpus with Bilingual Mapping Method
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037705
Rinaldi Andrian Rahmanda, M. Adriani, Dipta Tanaya
This study presents an approach to generating a bilingual language model for a cross-language information retrieval (CLIR) task. Language models for Bahasa Indonesia and English are created from a bilingual parallel corpus, and the bilingual language model is then obtained by learning a mapping between the Indonesian and English models with a Multilayer Perceptron. Query expansion is also used to boost retrieval results, using pre-Bilingual Mapping, post-Bilingual Mapping, and hybrid approaches. The experimental results show that the implemented system, with the addition of pre-Bilingual Mapping query expansion, improves the performance of the CLIR task.
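The mapping step can be illustrated with a small sketch: given monolingual word vectors trained on the two sides of the parallel corpus, an MLP is fit to map Indonesian vectors into the English space, after which nearest neighbours serve as query translations. This is a minimal sketch, not the authors' code; the toy vectors, the seed dictionary, and the use of scikit-learn's MLPRegressor are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical inputs: pretrained monolingual embeddings (one space per
# language) and a small seed dictionary of translation pairs.
id_vecs = {"makan": np.random.rand(100), "buku": np.random.rand(100)}  # Indonesian
en_vecs = {"eat": np.random.rand(100), "book": np.random.rand(100)}    # English
pairs = [("makan", "eat"), ("buku", "book")]  # assumed seed dictionary

# Stack the paired vectors into training matrices.
X = np.vstack([id_vecs[s] for s, _ in pairs])
Y = np.vstack([en_vecs[t] for _, t in pairs])

# Learn a nonlinear mapping from the Indonesian space into the English space.
mlp = MLPRegressor(hidden_layer_sizes=(256,), max_iter=500)
mlp.fit(X, Y)

def translate(word, k=5):
    """Map an Indonesian word and return its nearest English neighbours."""
    q = mlp.predict(id_vecs[word].reshape(1, -1))[0]
    sims = {w: q @ v / (np.linalg.norm(q) * np.linalg.norm(v))
            for w, v in en_vecs.items()}
    return sorted(sims, key=sims.get, reverse=True)[:k]

print(translate("makan"))
```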
Citations: 1
Effect of Preprocessing for Distributed Representations: Case Study of Japanese Radiology Reports
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037678
Taro Tada, Kazuhide Yamamoto
A radiology report is a medical document based on an examination image in a hospital. Preparing these reports is a burden on busy physicians, so a system that retrieves past reports to support report writing is needed. In recent years, distributed representations have been used in various NLP tasks and their usefulness has been demonstrated, but there is little research on Japanese medical documents that uses them. In this study, as a first step, we investigate preprocessing for a retrieval system built on distributed representations of radiology reports. We confirmed that in word segmentation with a morphological analyzer and dictionaries, medical terms in radiology reports are handled more effectively as shorter, subword-like nouns than as long nouns. We also confirmed that text segmentation by SentencePiece yields sentence representations that better reflect sentence characteristics. Furthermore, by removing some phrases from the radiology reports based on frequency, we were able to reflect the characteristics of each document and avoid unnecessarily high similarity between documents. Preprocessing was thus confirmed to be effective for this task.
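As a concrete illustration of the SentencePiece segmentation discussed above, the snippet below trains a subword model on raw report text and segments a sentence. This is a sketch rather than the paper's pipeline: the file names and vocabulary size are placeholders.

```python
import sentencepiece as spm

# Train a subword model on raw report text.
# 'reports.txt' and vocab_size=8000 are placeholder assumptions.
spm.SentencePieceTrainer.train(
    input="reports.txt", model_prefix="radiology_sp", vocab_size=8000
)

sp = spm.SentencePieceProcessor(model_file="radiology_sp.model")

# Segment a sentence into subword pieces; long medical terms are split
# into shorter, more reusable units, as the study found effective.
pieces = sp.encode("胸部単純CTで両肺に結節影を認める。", out_type=str)
print(pieces)
```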
Citations: 1
Effect of Music Training on the Production of English Lexical Stress by Chinese English Learners
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037685
Hui Feng, Jie Lian, Ying Zhao
Guided by the Theory of Multiple Intelligences, this study examines whether music training can improve English stress production by Chinese English learners without a music background. The major findings are as follows. (1) Music training has a significant influence on stress production by Chinese English learners. Specifically, after music training, the training group showed evident improvement in using pitch and intensity to distinguish stressed from unstressed syllables in disyllabic pseudowords. The training group's accuracy in producing unfamiliar words also increased by 11.5% on average, whereas the control group showed little change. However, little effect of music training on the duration proportion of stressed syllables was found in this experiment. (2) Chinese English learners' perception of music transfers positively to their production of English lexical stress. These findings provide further evidence for the effect of music training on English lexical stress production and suggest a method for Chinese English learners to improve their pronunciation.
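The acoustic measures this study relies on (pitch, intensity, and duration) can be extracted with standard tools. Below is a minimal sketch using librosa; the file name, pitch range, and the midpoint syllable boundary are assumptions for illustration, not details from the paper (a real study would use forced alignment for syllable boundaries).

```python
import librosa
import numpy as np

# Load a recording of a disyllabic pseudoword (placeholder file name).
y, sr = librosa.load("pseudoword.wav", sr=None)

# Pitch (F0) track via probabilistic YIN; unvoiced frames come back as NaN.
f0, voiced_flag, voiced_probs = librosa.pyin(y, fmin=75, fmax=400, sr=sr)

# Intensity proxy: root-mean-square energy per frame.
rms = librosa.feature.rms(y=y)[0]

# Assume the syllable boundary falls at the midpoint for this sketch.
m_f0, m_rms = len(f0) // 2, len(rms) // 2
print("syll1: mean F0 =", np.nanmean(f0[:m_f0]), "mean RMS =", rms[:m_rms].mean())
print("syll2: mean F0 =", np.nanmean(f0[m_f0:]), "mean RMS =", rms[m_rms:].mean())
```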
Citations: 1
Automated Prediction of Item Difficulty in Reading Comprehension Using Long Short-Term Memory
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037716
Lin Lin, Tao-Hsing Chang, Fu-Yuan Hsu
Standardized tests are an important tool in education. During test preparation, the difficulty of each test item must be defined, which has mostly relied on expert validation or pretesting, both of which require considerable labor and cost. These problems can be overcome by using machines to predict item difficulty. In this study, long short-term memory (LSTM) networks are used to predict item difficulty in reading comprehension. Experimental results show that the proposed method predicts difficulty with a good agreement rate.
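A minimal sketch of such an LSTM-based difficulty predictor follows, using Keras. The vocabulary size, sequence length, and number of difficulty levels are assumptions, since the abstract does not specify the architecture.

```python
import numpy as np
import tensorflow as tf

VOCAB, MAXLEN, LEVELS = 20000, 200, 3  # assumed sizes

# Token-ID sequences of reading items, padded to MAXLEN,
# with an integer difficulty level as the label.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB, 128),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(LEVELS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Dummy forward pass to check shapes; real use would call model.fit(...)
# on tokenized test items with difficulty labels.
dummy = np.random.randint(0, VOCAB, size=(2, MAXLEN))
print(model.predict(dummy).shape)  # (2, LEVELS)
```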
Citations: 6
What affects the difficulty of Chinese syntax?
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037724
Yueming Du, Lijiao Yang
Traditional measures of sentence difficulty focus only on lexical features and neglect syntactic ones. This paper takes 800 sentences from primary school Chinese textbooks published by People's Education Press as the research object and studies their syntactic features. We use a random forest to select the five most important features and then employ an SVM for the classification experiment. The precision, recall, and F-score for the five-level classification are 50.42%, 50.40%, and 50.41% respectively, indicating that the selected features have practical value for related research.
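The two-stage pipeline described above (random-forest feature ranking followed by SVM classification) might look like the sketch below; the feature matrix and labels are random placeholders, and scikit-learn is an assumed tooling choice.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_fscore_support

# Placeholder data: syntactic feature vectors for 800 sentences,
# each labeled with one of 5 difficulty levels.
X = np.random.rand(800, 20)
y = np.random.randint(0, 5, size=800)

# Rank features with a random forest and keep the top five.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top5 = np.argsort(rf.feature_importances_)[::-1][:5]
X_sel = X[:, top5]

# Classify with an SVM and report macro precision / recall / F-score.
pred = cross_val_predict(SVC(kernel="rbf"), X_sel, y, cv=10)
p, r, f, _ = precision_recall_fscore_support(y, pred, average="macro")
print(f"precision={p:.4f} recall={r:.4f} f-score={f:.4f}")
```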
Citations: 1
Improving text simplification by corpus expansion with unsupervised learning
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037567
Akihiro Katsuta, Kazuhide Yamamoto
Automatic sentence simplification aims to reduce the complexity of the vocabulary and expressions in a sentence while retaining its original meaning. Using an unsupervised translation model, we constructed a simplification model that does not require a parallel corpus. To learn simplification in an unsupervised manner, we construct a pseudo-corpus from a web corpus and show that this corpus expansion helps the model output more simplified sentences. In addition, we confirm that the simplification operation can be learned by preparing large-scale pseudo data even when only a non-parallel corpus is available.
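One way to build the kind of pseudo-corpus the abstract mentions is to split a web corpus into "simple" and "complex" pools with a readability heuristic; the two pools then serve as the non-parallel sides for unsupervised translation. The vocabulary file and thresholds below are invented for illustration and are not the authors' criteria.

```python
# Sketch: partition web-corpus sentences into simple/complex pools
# using an (assumed) list of basic vocabulary as a readability proxy.
basic_vocab = set(open("basic_words.txt").read().split())  # placeholder file

def simplicity(sentence):
    """Fraction of tokens that belong to the basic vocabulary."""
    words = sentence.split()
    return sum(w in basic_vocab for w in words) / max(len(words), 1)

simple_pool, complex_pool = [], []
with open("web_corpus.txt") as f:  # placeholder file
    for line in f:
        sent = line.strip()
        # The 0.9 / 0.5 thresholds are arbitrary for this sketch.
        if simplicity(sent) >= 0.9:
            simple_pool.append(sent)
        elif simplicity(sent) <= 0.5:
            complex_pool.append(sent)

# The two pools become the non-parallel training sides of an
# unsupervised translation model (complex -> simple).
print(len(simple_pool), len(complex_pool))
```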
Citations: 8
Carrier Sentence Selection with Word and Context Embeddings
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037727
C. Y. Yeung, J. Lee, Benjamin Ka-Yin T'sou
This paper presents the first data-driven model for selecting carrier sentences with word and context embeddings. In computer-assisted language learning systems, fill-in-the-blank items help users review or learn new vocabulary. A crucial step in automatic generation of fill-in-the-blank items is the selection of carrier sentences that illustrate the usage and meaning of the target word. Previous approaches for carrier sentence selection have mostly relied on features related to sentence length, vocabulary difficulty and word association strength. We train a statistical classifier on a large-scale, automatically constructed corpus of sample carrier sentences for learning Chinese as a foreign language, and use it to predict the suitability of a candidate carrier sentence for a target word. Human evaluation shows that our approach leads to substantial improvement over a word co-occurrence heuristic, and that context embeddings further enhance selection performance.
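A hedged sketch of this kind of selector: concatenate the target word's embedding with an embedding of the candidate sentence's context, and feed the result to a binary classifier that predicts carrier-sentence suitability. The embedding lookup, training pairs, and logistic-regression classifier here are placeholders, not the authors' corpus or model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

DIM = 100
emb = {}  # placeholder: word -> vector lookup from pretrained embeddings

def vec(w):
    # Random stand-in for a real pretrained embedding lookup.
    return emb.setdefault(w, np.random.rand(DIM))

def features(target, sentence):
    # Target word embedding + mean context embedding of the sentence.
    ctx = np.mean([vec(w) for w in sentence.split() if w != target], axis=0)
    return np.concatenate([vec(target), ctx])

# Placeholder training pairs: (target, sentence, is_good_carrier).
data = [("吃", "我 每天 吃 早饭", 1), ("吃", "吃 了 吗", 0)]
X = np.vstack([features(t, s) for t, s, _ in data])
y = np.array([lab for _, _, lab in data])

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict_proba(features("吃", "他 在 食堂 吃 午饭").reshape(1, -1)))
```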
Citations: 0
Acquisition of Knowledge with Time Information from Twitter
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037659
Kohei Yamamoto, Kazutaka Shimada
In this paper, we propose a knowledge acquisition method for non-task-oriented dialogue systems. Such systems need a wide variety of knowledge to generate appropriate and sophisticated responses, but constructing this knowledge is costly. To address this, we focus on the relation between each tweet and its posting time. First, we extract event words, such as verbs, from tweets. Second, we generate frequency distributions over five different time divisions, e.g., a monthly basis. Then we remove burst words on the basis of variance to obtain refined distributions. Inspecting the top-ranked words in each time division, we obtained not only common-sense knowledge, such as "sleep" at night, but also interesting activities, such as "recruit" in April and May (April is the beginning of the recruitment process for the new year in Japan) and "raise the spirits/plow into" around 9 AM, when people motivate themselves at the start of the working day. In addition, the knowledge our method extracts can contribute not only to dialogue systems but also to text mining and behavior analysis of social media data.
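The frequency-distribution and burst-filtering steps can be sketched as follows; the tweet records, the hourly division, and the variance threshold are placeholders for illustration.

```python
from collections import Counter, defaultdict
import numpy as np

# Placeholder records: (event_word, posting_hour) pairs extracted from tweets.
records = [("sleep", 23), ("sleep", 0), ("recruit", 10), ("sleep", 1)]

# Hourly frequency distribution per event word (one of the time divisions).
hourly = defaultdict(Counter)
for word, hour in records:
    hourly[word][hour] += 1

def normalized_dist(counter, bins=24):
    v = np.array([counter[b] for b in range(bins)], dtype=float)
    return v / v.sum()

# Filter on variance: a one-off spike yields a high-variance distribution,
# so such burst words are removed to keep stable time patterns.
THRESHOLD = 0.02  # arbitrary for this sketch
stable = {w: normalized_dist(c) for w, c in hourly.items()
          if normalized_dist(c).var() < THRESHOLD}

for w, dist in stable.items():
    print(w, "peak hour:", int(dist.argmax()))
```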
Citations: 2
Aspect-based Opinion Mining for Code-Mixed Restaurant Reviews in Indonesia
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037689
Andi Suciati, I. Budi
The goal of opinion mining is to extract the sentiment, emotions, or judgement expressed in reviews and classify them. Such reviews are important because they can affect a person's decision-making. In this paper, we conducted aspect-based opinion mining on customer reviews of restaurants in Indonesia, focusing on a code-mixed dataset. The evaluation used four preprocessing scenarios: stopword removal without stemming, stemming without stopword removal, neither stopword removal nor stemming, and both stopword removal and stemming. We compared five algorithms: Random Forest (RF), Multinomial Naive Bayes (NB), Logistic Regression (LR), Decision Tree (DT), and Extra Trees classifier (ET). The models were evaluated with 10-fold cross-validation, and the results show that the best score for each aspect was achieved by a different algorithm. LR achieved the highest scores for the food (81.76%) and ambience (77.29%) aspects, while the highest scores for the price (78.71%) and service (85.07%) aspects were obtained by DT.
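A sketch of this comparison with scikit-learn: one TF-IDF pipeline per classifier, scored with 10-fold cross-validation. The review texts, labels, and TF-IDF features are placeholder assumptions; the paper does not state its feature representation here.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Placeholder code-mixed reviews and food-aspect sentiment labels.
texts = ["makanannya enak banget, so delicious!",
         "pelayanannya slow dan not friendly"] * 50
labels = [1, 0] * 50

classifiers = {
    "RF": RandomForestClassifier(),
    "NB": MultinomialNB(),
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(),
    "ET": ExtraTreesClassifier(),
}

# 10-fold cross-validation per classifier, as in the paper's evaluation.
for name, clf in classifiers.items():
    pipe = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(pipe, texts, labels, cv=10)
    print(f"{name}: {scores.mean():.4f}")
```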
Citations: 12
Using Convolutional Neural Network with BERT for Intent Determination
Pub Date : 2019-11-01 DOI: 10.1109/IALP48816.2019.9037668
Changai He, Sibao Chen, Shilei Huang, Jian Zhang, Xiao Song
We propose an Intent Determination (ID) method combining a single-layer Convolutional Neural Network (CNN) with Bidirectional Encoder Representations from Transformers (BERT). The ID task is usually treated as a classification problem, and user queries are typically short texts; CNNs have proven suitable for short-text classification. We utilize BERT as a sentence encoder, which accurately captures the contextual representation of a sentence. Our method improves ID performance through its strong ability to capture semantic and long-distance dependencies in sentences. Our experimental results demonstrate that our model outperforms the state-of-the-art approach, improving accuracy by 0.67% on the ATIS dataset. On the ground truth of the Chinese dataset, as the intent granularity increases, our method improves accuracy by 15.99%, 4.75%, 4.69%, 6.29%, and 4.12% over the baseline.
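A minimal PyTorch sketch of the combination described above: BERT encodes the query, and a single convolutional layer with global max-pooling classifies intent over the token representations. The checkpoint name, filter size, kernel width, and intent count are assumptions, not the paper's hyperparameters.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class BertCNNIntent(nn.Module):
    def __init__(self, n_intents=26, n_filters=128, kernel=3):  # assumed sizes
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size
        # Single-layer CNN over the token dimension.
        self.conv = nn.Conv1d(hidden, n_filters, kernel_size=kernel)
        self.fc = nn.Linear(n_filters, n_intents)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        x = out.last_hidden_state.transpose(1, 2)  # (B, hidden, seq)
        x = torch.relu(self.conv(x))               # (B, filters, seq-k+1)
        x = x.max(dim=2).values                    # global max pooling
        return self.fc(x)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tok(["flights from boston to denver"], return_tensors="pt")
model = BertCNNIntent()
logits = model(batch["input_ids"], batch["attention_mask"])
print(logits.shape)  # (1, n_intents)
```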
Citations: 18