Automated labeling of PDF mathematical exercises with word N-grams VSM classification

Smart Learning Environments (IF 6.7, JCR Q1, Education & Educational Research). Pub Date: 2023-10-18. DOI: 10.1186/s40561-023-00271-9
Taisei Yamauchi, Brendan Flanagan, Ryosuke Nakamoto, Yiling Dai, Kyosuke Takami, Hiroaki Ogata
Abstract

In recent years, smart learning environments have become central to modern education, supporting students and instructors through tools based on prediction and recommendation models. These methods often rely on learning material metadata, such as the knowledge contained in an exercise. Such metadata is usually labeled by domain experts, which is costly and difficult to scale. Automated labeling is recognized as a way to ease the workload on experts, as seen in previous studies that applied automatic classification algorithms to research papers and Japanese mathematical exercises. However, these studies did not address fine-grained labeling. In addition, as the use of materials in the system becomes more widespread, paper materials are converted into PDF format, and text extraction from these files can be incomplete. Previous research has placed little emphasis on labeling the incomplete mathematical sentences that result. This study aims to achieve precise automated classification even from incomplete text inputs. To tackle these challenges, we propose a mathematical exercise labeling algorithm based on word n-grams that can handle detailed labels, even for incomplete sentences, and we compare it with a state-of-the-art word embedding method. The experimental results show that mono-gram (unigram) features with Random Forest models achieved the best performance, with macro F-measures of 92.50% and 61.28% on the 24-class and 297-class labeling tasks, respectively. The contribution of this research is to show that the proposed method, based on traditional simple n-grams, can find context-independent similarities in incomplete sentences and outperforms state-of-the-art word embedding methods in specific tasks such as classifying short and incomplete texts.
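The two ingredients named in the abstract, word n-gram vector space features and the macro F-measure, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the whitespace tokenizer, and the toy English exercise texts are assumptions made for illustration (Japanese text would need a word segmenter first), and the Random Forest classifier itself is left out.

```python
from collections import Counter


def word_ngrams(text, n=1):
    """Return the word n-grams (as tuples) of a whitespace-tokenized text.

    Whitespace tokenization is an assumption for this sketch; it is not
    the tokenization used in the paper.
    """
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def vectorize(texts, n=1):
    """Map each text to a count vector over the n-gram vocabulary of `texts`."""
    counts = [Counter(word_ngrams(t, n)) for t in texts]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[g] for g in vocab] for c in counts]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy exercises; the last one is truncated the way text
# extracted from a PDF might be.
texts = [
    "solve the quadratic equation",
    "find the area of the triangle",
    "solve the equation for",
]
vocab, X = vectorize(texts, n=1)
```

Because each vector only counts which n-grams occur, a truncated exercise still shares most of its nonzero dimensions with complete exercises on the same topic, which is one plausible reading of the "context-independent similarities" the abstract mentions. The macro F-measure weights every class equally, so rare labels count as much as frequent ones when there are many classes.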