Code Plagiarism Detection Method Based on Code Similarity and Student Behavior Characteristics
Qiubo Huang, Xuezhi Song, Guozheng Fang
2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), 2020-06-01
DOI: https://doi.org/10.1109/ICAICA50127.2020.9182389
Abstract
We propose a plagiarism detection approach for educational settings that combines code similarity with student behavior characteristics. Traditional plagiarism checks rely on the code alone, so students can evade detection by modifying a small amount of code. We argue that suspected plagiarism can be identified more accurately when students' behavior at submission time is also taken into account. Drawing on the idea of the Gini coefficient, we introduce code similarity concentration (SCD), which reflects how the similarity between all the code a student submits and other students' code is distributed. A large SCD value means that a student's submissions are consistently most similar to those of a few particular classmates. We also extract additional features to aid detection. Finally, we frame plagiarism detection as a binary classification problem and use LightGBM to make the decision. Experimental results show an accuracy close to 99% and an F1-score close to 98%.
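The abstract does not give the SCD formula, so the following is only a minimal sketch of the underlying idea, not the authors' definition. It assumes that, for each assignment, we already know which classmate's submission was most similar to the student's, and it applies the standard Gini coefficient to the counts of how often each classmate was that closest match. The function names `gini` and `similarity_concentration` and the input layout are hypothetical.

```python
from collections import Counter
from typing import List


def gini(values: List[float]) -> float:
    """Standard Gini coefficient of non-negative values.

    Returns 0.0 when all values are equal and approaches 1.0 as the
    total concentrates in a single entry.
    """
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    ordered = sorted(values)
    # G = sum_i (2i - n - 1) * x_i / (n * sum(x)), with i = 1..n over sorted x
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(ordered, start=1))
    return weighted / (n * total)


def similarity_concentration(most_similar: List[str], classmates: List[str]) -> float:
    """Hypothetical SCD-like score (an assumption, not the paper's formula).

    most_similar: for each assignment, the classmate whose code was the
                  closest match to this student's submission.
    classmates:   all classmates the student could have been matched with.
    """
    counts = Counter(most_similar)
    return gini([counts.get(name, 0) for name in classmates])


classmates = ["bob", "carol", "dave", "erin"]
# Closest match is the same classmate on every assignment: concentrated.
concentrated = similarity_concentration(["bob", "bob", "bob", "bob"], classmates)
# Closest match is a different classmate each time: evenly spread.
spread = similarity_concentration(["bob", "carol", "dave", "erin"], classmates)
print(concentrated, spread)
```

Under these assumptions, a student whose closest match is always the same classmate scores higher than one whose closest matches are spread across the class, which is the behavior the abstract attributes to a large SCD value. In the paper's setup, such a score would be one feature among several passed to the LightGBM classifier.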