A Support Vector Machine based approach for plagiarism detection in Python code submissions in undergraduate settings

IF 2.4 Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Frontiers in Computer Science Pub Date : 2024-06-13 DOI:10.3389/fcomp.2024.1393723
Nandini Gandhi, Kaushik Gopalan, Prajish Prasad
{"title":"A Support Vector Machine based approach for plagiarism detection in Python code submissions in undergraduate settings","authors":"Nandini Gandhi, Kaushik Gopalan, Prajish Prasad","doi":"10.3389/fcomp.2024.1393723","DOIUrl":null,"url":null,"abstract":"Mechanisms for plagiarism detection play a crucial role in maintaining academic integrity, acting both to penalize wrongdoing while also serving as a preemptive deterrent for bad behavior. This manuscript proposes a customized plagiarism detection algorithm tailored to detect source code plagiarism in the Python programming language. Our approach combines textual and syntactic techniques, employing a support vector machine (SVM) to effectively combine various indicators of similarity and calculate the resulting similarity scores. The algorithm was trained and tested using a sample of code submissions of 4 coding problems each from 45 volunteers; 15 of these were original submissions while the other 30 were plagiarized samples. The submissions of two of the questions was used for training and the other two for testing-using the leave-p-out cross-validation strategy to avoid overfitting. We compare the performance of the proposed method with two widely used tools-MOSS and JPlag—and find that the proposed method results in a small but significant improvement in accuracy compared to JPlag, while significantly outperforming MOSS in flagging plagiarized samples.","PeriodicalId":52823,"journal":{"name":"Frontiers in Computer Science","volume":null,"pages":null},"PeriodicalIF":2.4000,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Computer Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fcomp.2024.1393723","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 0

Abstract

Mechanisms for plagiarism detection play a crucial role in maintaining academic integrity, acting both to penalize wrongdoing while also serving as a preemptive deterrent for bad behavior. This manuscript proposes a customized plagiarism detection algorithm tailored to detect source code plagiarism in the Python programming language. Our approach combines textual and syntactic techniques, employing a support vector machine (SVM) to effectively combine various indicators of similarity and calculate the resulting similarity scores. The algorithm was trained and tested using a sample of code submissions of 4 coding problems each from 45 volunteers; 15 of these were original submissions while the other 30 were plagiarized samples. The submissions of two of the questions was used for training and the other two for testing-using the leave-p-out cross-validation strategy to avoid overfitting. We compare the performance of the proposed method with two widely used tools-MOSS and JPlag—and find that the proposed method results in a small but significant improvement in accuracy compared to JPlag, while significantly outperforming MOSS in flagging plagiarized samples.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于支持向量机的本科生 Python 代码抄袭检测方法
剽窃检测机制在维护学术诚信方面发挥着至关重要的作用,既能惩罚不法行为,又能对不良行为起到先发制人的威慑作用。本手稿提出了一种定制的剽窃检测算法,专门用于检测 Python 编程语言中的源代码剽窃行为。我们的方法结合了文本和语法技术,采用支持向量机(SVM)有效地组合各种相似性指标,并计算由此产生的相似性分数。我们使用 45 名志愿者提交的各 4 个编码问题的代码样本对算法进行了训练和测试,其中 15 个是原创提交,另外 30 个是抄袭样本。其中两个问题的提交用于训练,另外两个问题的提交用于测试--使用 "留空 "交叉验证策略以避免过度拟合。我们将所提方法的性能与两种广泛使用的工具--MOSS 和 JPlag 进行了比较,发现与 JPlag 相比,所提方法在准确性上有微小但显著的提高,同时在标记抄袭样本方面明显优于 MOSS。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Frontiers in Computer Science
Frontiers in Computer Science COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS-
CiteScore
4.30
自引率
0.00%
发文量
152
审稿时长
13 weeks
期刊最新文献
A Support Vector Machine based approach for plagiarism detection in Python code submissions in undergraduate settings Working with agile and crowd: human factors identified from the industry Energy-efficient, low-latency, and non-contact eye blink detection with capacitive sensing Experimenting with D-Wave quantum annealers on prime factorization problems Fuzzy Markov model for the reliability analysis of hybrid microgrids
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1