Dataset for Analysis of Russian-Language Reviews on MOOCs Extracted from Stepik

Voprosy Obrazovaniya-Educational Studies Moscow (IF 0.7, Q4, Education, Scientific Disciplines). Pub Date: 2022-01-01. DOI: 10.17323/1814-9545-2022-4-298-321
Y. Dyulicheva
Citations: 1

Abstract

The article reviews datasets and research directions in educational data analysis based on natural language processing methods. The review reveals a lack of datasets for analyzing Russian-language reviews of MOOCs. Reviews scraped from the Stepik platform were compiled into a dataset of 5721 Russian-language reviews of MOOCs in mathematics, programming, biology, chemistry and physics. The reviews were studied using descriptive statistics, frequency analysis of unigrams and bigrams, and sentiment analysis with the dostoevsky Python library, whose sentiment classification achieved a weighted F1-score of 74%. Unigram analysis was used to identify descriptive characteristics of courses with respect to sentiment; bigram analysis was used to identify descriptions of different aspects of the learning content and the difficulties students encountered in MOOCs. The sentiment analysis shows that positive and neutral reviews predominate in the dataset. The dataset is publicly available on Mendeley Data and will be useful to specialists in text data analysis and to developers of learning analytics tools.
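The n-gram frequency analysis and the weighted F1-score mentioned in the abstract can be illustrated with a minimal sketch. This is not the author's code and does not call the dostoevsky library; the tokenizer, the function names (tokenize, ngram_counts, weighted_f1), and the toy reviews and labels are all assumptions made for illustration, using only the Python standard library.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters/digits (Cyrillic included)."""
    return re.findall(r"[^\W_]+", text.lower())


def ngram_counts(reviews, n):
    """Count n-grams (tuples of n consecutive tokens) across all reviews."""
    counts = Counter()
    for review in reviews:
        tokens = tokenize(review)
        # zip over n shifted copies of the token list yields consecutive n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's count in y_true."""
    support = Counter(y_true)
    total = 0.0
    for label, n_true in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        n_pred = sum(p == label for p in y_pred)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * n_true
    return total / len(y_true)


# Hypothetical toy data, not taken from the Stepik dataset.
reviews = [
    "Отличный курс, спасибо",
    "Отличный курс для новичков",
    "Слишком сложные задачи",
]
unigrams = ngram_counts(reviews, 1)
bigrams = ngram_counts(reviews, 2)
print(bigrams.most_common(3))

# Hypothetical gold labels versus classifier output.
y_true = ["positive", "positive", "negative", "neutral"]
y_pred = ["positive", "neutral", "negative", "neutral"]
score = weighted_f1(y_true, y_pred)
print(round(score, 3))
```

Weighting each class's F1 by its support keeps a dominant class (here, positive reviews) from being under-represented in the summary score, which matters when, as the abstract reports, positive and neutral reviews predominate.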