Automated labeling of PDF mathematical exercises with word N-grams VSM classification

Smart Learning Environments (IF 6.7, Q1, Education & Educational Research) | Pub Date: 2023-10-18 | DOI: 10.1186/s40561-023-00271-9

Taisei Yamauchi, Brendan Flanagan, Ryosuke Nakamoto, Yiling Dai, Kyosuke Takami, Hiroaki Ogata


Citation count: 0


Automated labeling of PDF mathematical exercises with word N-grams VSM classification

Abstract

In recent years, smart learning environments have become central to modern education, supporting students and instructors through tools based on prediction and recommendation models. These methods often rely on learning material metadata, such as the knowledge contained in an exercise. This metadata is usually labeled by domain experts, which is costly and difficult to scale. Automated labeling is recognized to ease the workload on experts, as seen in previous studies that applied automatic classification algorithms to research papers and Japanese mathematical exercises. However, these studies did not address fine-grained labeling. In addition, as the use of materials in the system becomes more widespread, paper materials are converted into PDF format, which can lead to incomplete text extraction. Previous research has placed little emphasis on labeling incomplete mathematical sentences to address this problem. This study aims to achieve precise automated classification even from incomplete text inputs. To tackle these challenges, we propose a mathematical exercise labeling algorithm that uses word n-grams and can handle detailed labels, even for incomplete sentences, and we compare it with a state-of-the-art word embedding method. The experimental results show that mono-gram (unigram) features with Random Forest models achieved the best performance, with macro F-measures of 92.50% and 61.28% for the 24-class and 297-class labeling tasks, respectively. The contribution of this research is to show that the proposed method, based on traditional simple n-grams, can find context-independent similarities in incomplete sentences and outperforms state-of-the-art word embedding methods in specific tasks such as classifying short and incomplete texts.
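To make the abstract's terms concrete, the following is a minimal sketch of a word n-gram vector space model and of the macro F-measure, written with the Python standard library only. It is not the authors' implementation: the toy exercises, the label names, and all function names are hypothetical, and a 1-nearest-neighbor cosine classifier stands in for the Random Forest models used in the paper, purely to keep the example short.

```python
from collections import Counter
from math import sqrt


def word_ngrams(text, n=1):
    """Split on whitespace and return the word n-grams as tuples."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def vectorize(text, n=1):
    """Represent a text as a sparse count vector over its word n-grams."""
    return Counter(word_ngrams(text, n))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict(text, train, n=1):
    """Label of the most similar training text (1-NN, an assumption;
    the paper reports Random Forest models instead)."""
    query = vectorize(text, n)
    _, label = max(train, key=lambda pair: cosine(query, vectorize(pair[0], n)))
    return label


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: (exercise text, knowledge label).
train = [
    ("solve the quadratic equation x ^ 2 - 5 x + 6 = 0", "quadratic equations"),
    ("find the roots of the quadratic equation", "quadratic equations"),
    ("find the probability of drawing two red balls", "probability"),
    ("a die is rolled twice find the probability", "probability"),
]

# Truncated inputs, imitating incomplete PDF text extraction.
test = [
    ("quadratic equation x ^ 2 + 3 x", "quadratic equations"),
    ("probability that the die shows", "probability"),
]

y_true = [label for _, label in test]
y_pred = [predict(text, train, n=1) for text, _ in test]
print(y_pred, macro_f1(y_true, y_pred))
```

Because each word is counted independently of its neighbors, a truncated sentence still shares most of its unigrams with complete exercises on the same topic, which is one way to read the abstract's "context-independent similarities". Macro averaging gives every class equal weight regardless of its size, which matters when many fine-grained labels have only a few exercises each.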


Source journal