Identifying Optimal Baseline Variant of Unsupervised Term Weighting in Question Classification Based on Bloom Taxonomy

Mendel. Published: 2022-06-30. DOI: 10.13164/mendel.2022.1.008
A. Sangodiah, Tham Jee San, Yong Tien Fui, Lim Ean Heng, R. Ayyasamy, Norazira A Jalil
Citations: 3

Abstract

Examination is one of the common ways to evaluate students' cognitive levels in higher education institutions. Educators label exam questions manually according to the cognitive domain of Bloom's taxonomy (BT). To ease this burden, several past studies have proposed automated question classification based on BT using machine learning techniques. Feature selection, feature extraction and term weighting are common ways to improve the accuracy of question classification. The term weighting methods most commonly used in past work are unsupervised, namely TF and TF-IDF. Several variants of TF and TF-IDF exist, and the optimal variant has yet to be identified for question classification based on BT. This paper therefore studies TF, TF-IDF and normalized TF-IDF variants in order to identify the variant that best improves exam question classification accuracy. Two classifiers were used to investigate the variants: Support Vector Machine (SVM) and Naïve Bayes. With the SVM classifier, the average accuracies achieved by the TF-IDF and normalized TF-IDF variants were 64.3% and 72.4% respectively; with the Naïve Bayes classifier, they were 61.9% and 63.0% respectively. In general, the normalized TF-IDF variants outperformed the TF and TF-IDF variants in both accuracy and F1-measure. Further statistical analysis using the t-test and the Wilcoxon signed-rank test also shows that the differences in accuracy between normalized TF-IDF and TF or TF-IDF are significant. Among the normalized TF-IDF variants, Normalized TF-IDF3 recorded the highest accuracy, 74.0%. The differences in accuracy between Normalized TF-IDF3 and the other normalized variants are also generally significant, so the optimal variant is Normalized TF-IDF3. Normalized TF-IDF3 is therefore an important baseline for benchmarking, against which other term weighting techniques can be compared in future work.
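The abstract contrasts TF, TF-IDF and normalized TF-IDF weighting but does not give the formulas behind its numbered variants (such as TF-IDF3). As a minimal sketch, assuming common textbook definitions rather than the paper's own, the following Python shows several typical choices: raw and log-scaled TF, plain and smoothed IDF, and L2 (cosine) normalization. The function names and the toy questions are hypothetical, and how these choices map onto the paper's variant numbering is not stated here.

```python
import math
from collections import Counter


def tf_raw(tokens):
    """Raw term frequency: how many times each term occurs in one question."""
    return dict(Counter(tokens))


def tf_log(tokens):
    """Log-scaled term frequency: 1 + ln(count), damping repeated terms."""
    return {t: 1.0 + math.log(c) for t, c in Counter(tokens).items()}


def document_frequency(docs):
    """Number of questions in which each term appears at least once."""
    return Counter(t for doc in docs for t in set(doc))


def idf_plain(docs):
    """Plain IDF: ln(N / df). A term present in every question gets weight 0."""
    n = len(docs)
    return {t: math.log(n / d) for t, d in document_frequency(docs).items()}


def idf_smooth(docs):
    """Smoothed IDF: ln((1 + N) / (1 + df)) + 1, so no known term gets weight 0."""
    n = len(docs)
    return {t: math.log((1 + n) / (1 + d)) + 1.0
            for t, d in document_frequency(docs).items()}


def tfidf(tf_vec, idf_map):
    """Multiply each term frequency by its IDF; terms missing from idf_map get 0."""
    return {t: w * idf_map.get(t, 0.0) for t, w in tf_vec.items()}


def l2_normalize(vec):
    """Cosine (L2) normalization: scale the vector to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0.0:
        return dict(vec)  # all-zero vector: nothing to scale
    return {t: w / norm for t, w in vec.items()}


if __name__ == "__main__":
    # Hypothetical toy questions at different Bloom levels (not the paper's data).
    questions = [
        "define the term algorithm".split(),
        "explain how the algorithm works".split(),
        "design an algorithm to sort a list".split(),
    ]
    idf_map = idf_plain(questions)
    weighted = tfidf(tf_raw(questions[1]), idf_map)
    # "algorithm" occurs in all three questions, so its plain-IDF weight is 0.0.
    print(weighted)
    print(l2_normalize(weighted))
```

Normalizing each question vector to unit length removes the influence of question length, so a longer question does not receive larger weights simply because it contains more terms. In a full pipeline, such vectors would then be passed to a classifier such as SVM or Naïve Bayes.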

Source journal: Mendel
Subject category: Decision Sciences: Decision Sciences (miscellaneous)
CiteScore: 2.20
Self-citation rate: 0.00%
Articles published: 7

Latest articles in the journal:
Detecting Outliers Using Modified Recursive PCA Algorithm For Dynamic Streaming Data
Stock and Structured Warrant Portfolio Optimization Using Black-Litterman Model and Binomial Method
Optimized Fixed-Time Synergetic Controller via a modified Salp Swarm Algorithm for Acute and Chronic HBV Transmission System
Initial Coin Offering Prediction Comparison Using Ridge Regression, Artificial Neural Network, Random Forest Regression, and Hybrid ANN-Ridge
Predicting Football Match Outcomes with Machine Learning Approaches