An Experimental Study of Text Preprocessing Techniques for Automatic Short Answer Grading in Indonesian

U. Hasanah, Tri Astuti, R. Wahyudi, Zanuar Rifai, Rilas Agung Pambudi
Published in: 2018 3rd International Conference on Information Technology, Information System and Electrical Engineering (ICITISEE)
Publication date: 2018-11-01
DOI: 10.1109/ICITISEE.2018.8720957
URL: https://doi.org/10.1109/ICITISEE.2018.8720957
Citations: 17

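Abstract

The preprocessing phase in information retrieval is intended to reduce the size of the text. Previous studies have applied many preprocessing techniques in applications such as clustering, classification, document indexing, summarization, and automatic essay grading. This experimental study measures the effectiveness of preprocessing techniques in Automatic Short Answer Grading (ASAG) using questions and answers in Indonesian. Indonesian has a different morphology from English, and language processing tools for Indonesian are limited, so we work with the techniques that are available: case folding, tokenization, punctuation removal, stopword removal, and stemming. The data consist of 6 questions, each answered by 32 students, with one teacher's answer per question serving as the reference answer. We conducted two experiments. The first applied two preprocessing techniques, punctuation removal and tokenization. The second added three more: case folding, stemming, and stopword removal. We measured the similarity between teacher and student answers using cosine similarity, then calculated correlation values and the Mean Absolute Error to assess the effectiveness of the preprocessing techniques. A paired-samples t-test showed no significant difference between the two experiments.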
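The two preprocessing configurations and the cosine similarity step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stopword list, the identity "stemmer" placeholder, the order of stopword removal and stemming, the use of raw term counts as vector weights, and the example sentences are all assumptions made here for demonstration. A real Indonesian stemmer would be passed in through the hypothetical `stem` parameter.

```python
import math
import string
from collections import Counter

# Hypothetical, tiny stopword list for illustration only (not the study's list).
STOPWORDS = {"yang", "dan", "di", "adalah", "untuk"}

# Map every ASCII punctuation character to a space so adjacent words do not merge.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text, full=False, stem=lambda token: token):
    """Return a token list.

    full=False: punctuation removal + tokenization (first experiment).
    full=True:  additionally case folding, stopword removal, and stemming
                (second experiment). `stem` defaults to the identity function,
                a placeholder for a real Indonesian stemmer.
    """
    if full:
        text = text.lower()                  # case folding
    text = text.translate(_PUNCT_TABLE)      # punctuation removal
    tokens = text.split()                    # whitespace tokenization
    if full:
        tokens = [stem(t) for t in tokens if t not in STOPWORDS]
    return tokens


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists using raw term counts."""
    vec_a, vec_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0                           # an empty answer matches nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Made-up example answers, not taken from the study's data set.
    teacher = "Algoritma adalah urutan langkah untuk menyelesaikan masalah."
    student = "algoritma: urutan langkah menyelesaikan masalah"
    for label, full in (("experiment 1", False), ("experiment 2", True)):
        score = cosine_similarity(preprocess(teacher, full), preprocess(student, full))
        print(f"{label}: similarity = {score:.3f}")
```

In this toy pair the second configuration scores higher because case folding and stopword removal make the two token lists identical. The abstract reports that, across the full data set, the difference between the two experiments was not statistically significant.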