Fuzzy Semantic-Based String Similarity Experiments to Detect Plagiarism in Indonesian Documents
Chonan Firda Odayakana Umareta, Siti Mariyah
2019 3rd International Conference on Informatics and Computational Sciences (ICICoS), published 2019-10-01
DOI: 10.1109/ICICoS48119.2019.8982501 (https://doi.org/10.1109/ICICoS48119.2019.8982501)
Citations: 0
Abstract
Plagiarism is a topic of concern in education. One way to address plagiarism is to compare documents with one another. Because the number of documents is large, an extrinsic plagiarism detection framework is needed to compare documents at scale. Moreover, in intelligent plagiarism, plagiarists try to hide their actions, for example by replacing words with semantically equivalent ones such as synonyms. This study therefore applies an extrinsic plagiarism detection system based on Fuzzy Semantic-Based String Similarity, divided into three stages: Preprocessing, Heuristic Retrieval (HR), and Detailed Analysis (DA). The preprocessing stage removes irrelevant characters, splits the text into sentences, and performs stemming, tokenization, and stopword removal. The HR stage searches for candidate document pairs using fingerprints and Jaccard similarity. The DA stage applies fuzzy semantic-based similarity. The experiments compare the document similarity scores produced by Jaccard similarity in the HR stage with those produced by fuzzy semantic-based similarity in the DA stage, since both stages yield a document-level similarity score. The results show that fuzzy semantic-based similarity performs better than Jaccard similarity because it can detect semantic similarity in the form of synonyms.
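The three-stage pipeline can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The stopword list and synonym dictionary are tiny hypothetical stand-ins, and stemming is omitted. Fingerprints are assumed to be CRC32 hashes of overlapping word 3-grams; the paper may use a different scheme (for example winnowing). The fuzzy step assumes one common formulation of fuzzy semantic-based similarity, because the abstract does not give the exact formula. Under this formulation, identical words have membership 1.0, synonyms a hypothetical 0.5, and unrelated words 0.0. A word's degree of belonging to a sentence is 1 minus the product of (1 - membership) over that sentence's words, and sentence similarity averages this over the words. Document similarity is assumed to be the mean of each suspicious sentence's best match in the source.

```python
import re
import zlib

# Hypothetical toy resources. A real system would use a full Indonesian
# stopword list, an Indonesian stemmer, and a proper thesaurus.
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu", "untuk", "dengan"}
SYNONYMS = {
    "rumah": {"kediaman"},
    "besar": {"agung", "raya"},
}
SYNONYM_MEMBERSHIP = 0.5  # hypothetical fuzzy membership for synonym pairs


def preprocess(text):
    """Split into sentences, keep only letters, tokenize, drop stopwords.
    Stemming is omitted in this sketch."""
    sentences = []
    for raw in re.split(r"[.!?]+", text.lower()):
        tokens = [t for t in re.findall(r"[a-z]+", raw) if t not in STOPWORDS]
        if tokens:
            sentences.append(tokens)
    return sentences


def fingerprints(sentences, k=3):
    """HR stage: CRC32 hashes of every word k-gram (assumed fingerprint scheme)."""
    words = [w for s in sentences for w in s]
    grams = (" ".join(words[i:i + k]) for i in range(len(words) - k + 1))
    return {zlib.crc32(g.encode("utf-8")) for g in grams}


def jaccard(a, b):
    """|A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def word_membership(w1, w2):
    """Assumed fuzzy membership: 1.0 identical, SYNONYM_MEMBERSHIP synonyms, else 0.0."""
    if w1 == w2:
        return 1.0
    if w2 in SYNONYMS.get(w1, set()) or w1 in SYNONYMS.get(w2, set()):
        return SYNONYM_MEMBERSHIP
    return 0.0


def sentence_similarity(s, r):
    """DA stage: mean over words w in s of 1 - prod over v in r of (1 - mu(w, v))."""
    if not s:
        return 0.0
    total = 0.0
    for w in s:
        prod = 1.0
        for v in r:
            prod *= 1.0 - word_membership(w, v)
        total += 1.0 - prod
    return total / len(s)


def fuzzy_document_similarity(susp, src):
    """Mean of each suspicious sentence's best fuzzy match in the source (assumed)."""
    if not susp or not src:
        return 0.0
    best = [max(sentence_similarity(s, r) for r in src) for s in susp]
    return sum(best) / len(best)


if __name__ == "__main__":
    susp = preprocess("Rumah itu besar dan cepat dibangun.")
    src = preprocess("Kediaman itu besar dan cepat dibangun.")
    print(f"Jaccard: {jaccard(fingerprints(susp), fingerprints(src)):.3f}")  # Jaccard: 0.333
    print(f"Fuzzy:   {fuzzy_document_similarity(susp, src):.3f}")           # Fuzzy:   0.875
```

On this toy pair, which differs only by one synonym substitution (rumah/kediaman), the fingerprint Jaccard score drops to 1/3 because the substitution changes one of the two 3-grams. The fuzzy score stays at 0.875 because the synonym still contributes partial membership, which matches the qualitative behavior the abstract reports.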