A Support Vector Machine based approach for plagiarism detection in Python code submissions in undergraduate settings

Frontiers in Computer Science (IF 2.4; JCR Q3, Computer Science, Interdisciplinary Applications). Pub Date: 2024-06-13. DOI: 10.3389/fcomp.2024.1393723

Nandini Gandhi, Kaushik Gopalan, Prajish Prasad



Abstract

Mechanisms for plagiarism detection play a crucial role in maintaining academic integrity, both penalizing wrongdoing and serving as a preemptive deterrent against it. This manuscript proposes a customized plagiarism detection algorithm tailored to source code written in the Python programming language. Our approach combines textual and syntactic techniques, employing a support vector machine (SVM) to combine various indicators of similarity and calculate the resulting similarity scores. The algorithm was trained and tested using code submissions for 4 coding problems from each of 45 volunteers; 15 of these were original submissions while the other 30 were plagiarized samples. The submissions for two of the questions were used for training and those for the other two for testing, using the leave-p-out cross-validation strategy to avoid overfitting. We compare the performance of the proposed method with two widely used tools, MOSS and JPlag, and find that the proposed method results in a small but significant improvement in accuracy compared to JPlag, while significantly outperforming MOSS in flagging plagiarized samples.
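As a rough illustration of the ideas in the abstract, the sketch below computes one textual and one syntactic similarity indicator for a pair of Python submissions and enumerates leave-p-out splits over four questions. The abstract does not list the actual indicators, the SVM configuration, or the value of p, so everything here is an assumption: the function names, the use of `difflib` ratios, the AST node-type comparison, and p = 2 are all hypothetical choices made for illustration. In a full pipeline, feature vectors like these would be passed to an SVM classifier, which is omitted here.

```python
import ast
import difflib
from itertools import combinations


def textual_similarity(code_a, code_b):
    """Character-level similarity ratio in [0, 1] between two source strings."""
    return difflib.SequenceMatcher(None, code_a, code_b, autojunk=False).ratio()


def node_types(code):
    """List of AST node type names; identifier names and literal values are ignored."""
    return [type(node).__name__ for node in ast.walk(ast.parse(code))]


def syntactic_similarity(code_a, code_b):
    """Similarity ratio in [0, 1] between the two AST node-type sequences."""
    matcher = difflib.SequenceMatcher(
        None, node_types(code_a), node_types(code_b), autojunk=False
    )
    return matcher.ratio()


def feature_vector(code_a, code_b):
    """Hypothetical two-feature input that an SVM could be trained on."""
    return [textual_similarity(code_a, code_b), syntactic_similarity(code_a, code_b)]


def leave_p_out_splits(questions, p):
    """Yield (train, test) tuples for every possible choice of p held-out questions."""
    for test in combinations(questions, p):
        train = tuple(q for q in questions if q not in test)
        yield train, test


# Two toy submissions: the second only renames identifiers (an assumed
# example of simple disguised copying, not data from the paper).
original = (
    "def total(xs):\n"
    "    s = 0\n"
    "    for x in xs:\n"
    "        s += x\n"
    "    return s\n"
)
renamed = (
    "def add_all(values):\n"
    "    acc = 0\n"
    "    for v in values:\n"
    "        acc += v\n"
    "    return acc\n"
)

features = feature_vector(original, renamed)
splits = list(leave_p_out_splits(["Q1", "Q2", "Q3", "Q4"], 2))
```

In this toy pair, renaming identifiers lowers the textual indicator but leaves the AST node-type sequence unchanged, which is one reason to combine both kinds of indicator. With four questions and two held out, there are six possible train/test splits.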


