Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant

IF 2.7 | Region 3 (Physics and Astrophysics) | JCR Q2 (PHYSICS, ATOMIC, MOLECULAR & CHEMICAL) | Atomic Data and Nuclear Data Tables | Pub Date: 2023-06-14 | DOI: 10.3390/data8060109
Liliya A. Demidova, E. Andrianova, Peter N. Sovietov, A. Gorchakov
Citations: 3

Abstract

Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant
This paper presents a dataset containing automatically collected source codes solving unique programming exercises of different types. The programming exercises were automatically generated by the Digital Teaching Assistant (DTA) system that automates a massive Python programming course at MIREA—Russian Technological University (RTU MIREA). Source codes of the small programs grouped by the type of the solved task can be used for benchmarking source code classification and clustering algorithms. Moreover, the data can be used for training intelligent program synthesizers or benchmarking mutation testing frameworks, and more applications are yet to be discovered. We describe the architecture of the DTA system, aiming to provide detailed insight regarding how and why the dataset was collected. In addition, we describe the algorithms responsible for source code analysis in the DTA system. These algorithms use vector representations of programs based on Markov chains, compute pairwise Jensen–Shannon divergences of programs, and apply hierarchical clustering algorithms in order to automatically discover high-level concepts used by students while solving unique tasks. The proposed approach can be incorporated into massive programming courses when there is a need to identify approaches implemented by students.
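The analysis pipeline named in the abstract (Markov-chain-based program vectors, pairwise Jensen–Shannon divergences, hierarchical clustering) can be illustrated with a minimal sketch. The details below are assumptions, not the DTA system's implementation: each program is represented by the relative frequencies of parent-to-child transitions between AST node types (a simplified stand-in for the Markov-chain representation), and clustering is plain average-linkage agglomeration with a hypothetical distance `threshold`. All function names are illustrative.

```python
import ast
import math
from collections import Counter
from itertools import combinations


def transition_distribution(source):
    """Relative frequencies of parent -> child AST node-type transitions.

    Assumption: a simplified stand-in for a Markov-chain program vector.
    """
    counts = Counter()
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            counts[(type(parent).__name__, type(child).__name__)] += 1
    total = sum(counts.values())
    return {edge: n / total for edge, n in counts.items()} if total else {}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) of two sparse distributions."""
    result = 0.0
    for key in set(p) | set(q):
        pk, qk = p.get(key, 0.0), q.get(key, 0.0)
        mk = 0.5 * (pk + qk)  # mixture distribution
        if pk > 0:
            result += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            result += 0.5 * qk * math.log2(qk / mk)
    return result


def pairwise_divergences(sources):
    """Map each index pair (i, j) with i < j to the divergence of programs i and j."""
    vectors = [transition_distribution(s) for s in sources]
    return {(i, j): js_divergence(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)}


def cluster(dist, n, threshold):
    """Average-linkage agglomerative clustering; stops merging above `threshold`."""
    clusters = [[i] for i in range(n)]

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[min(i, j), max(i, j)] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda xy: linkage(clusters[xy[0]], clusters[xy[1]]))
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        clusters[x] += clusters.pop(y)  # x < y, so index x stays valid
    return clusters


# Two loop-based solutions (differing only in names) and one generator-based solution.
programs = [
    "def f(xs):\n    total = 0\n    for x in xs:\n        total += x\n    return total\n",
    "def g(items):\n    acc = 0\n    for it in items:\n        acc += it\n    return acc\n",
    "def h(xs):\n    return sum(x * x for x in xs) if xs else 0\n",
]
print(cluster(pairwise_divergences(programs), len(programs), threshold=0.1))
```

Because renaming variables does not change AST node types, the first two programs have identical distributions and fall into one cluster, while the generator-based solution stays separate. In practice a library routine such as SciPy's hierarchical clustering would likely replace the hand-written loop; the standard-library version is used here only to keep the sketch self-contained.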
Source journal
Atomic Data and Nuclear Data Tables (Physics: Nuclear Physics)
CiteScore: 4.50
Self-citation rate: 11.10%
Articles published: 27
Review time: 47 days
Journal description: Atomic Data and Nuclear Data Tables presents compilations of experimental and theoretical information in atomic physics, nuclear physics, and closely related fields. The journal is devoted to the publication of tables and graphs of general usefulness to researchers in both basic and applied areas. Extensive and comprehensive compilations of experimental and theoretical results are featured.