Waqar Ali, Tanveer Ahmed, Zobia Rehman, A. Rehman, M. Slaman
Detection of Plagiarism in Urdu Text Documents
2018 14th International Conference on Emerging Technologies (ICET), published 2018-11-01
DOI: 10.1109/ICET.2018.8603616 (https://doi.org/10.1109/ICET.2018.8603616)
Citations: 7
Abstract
Plagiarism, intellectual theft, and copyright violation are among the most important problems facing researchers and academic organizations such as universities. Well-known publicly available Plagiarism Detection (PD) tools include Turnitin, APlagramme, Plagscan, and Aplag, which are used to combat plagiarism. However, these tools mainly support English, Persian, and Arabic. Copyrighted and intellectual documents are written in every language of the world, and in many South Asian countries, including Pakistan and India, a huge amount of academic content is available in Urdu. Unfortunately, owing to scarce resources and limited attention from researchers, not enough work has been done on Urdu PD. This paper presents an approach to detecting plagiarism in Urdu. Most existing Urdu PD systems fail to identify paraphrase plagiarism when comparing suspicious and source text documents. The proposed system, in contrast, is able to identify different types of plagiarism, such as sentence reordering, insert/delete inter-textual similarity, and near-copy similarity. It is based on a distance measuring method, a structural alignment algorithm, and a vector space model. System performance is evaluated using machine learning classifiers, namely Support Vector Machine and Naïve Bayes. The experimental results demonstrate that the proposed method performs better than existing models, namely the cosine method and the simple Jaccard measure.
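As a rough illustration of the baselines named in the abstract, the sketch below computes the simple Jaccard measure and cosine similarity over token bags, and adds a token-level edit distance as one possible example of a distance measure. It is not the authors' implementation: the function names, the whitespace tokenizer, and the placeholder tokens are all assumptions made for this example, and real Urdu text would need proper tokenization and normalization.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization; real Urdu text needs a proper tokenizer.
    return text.split()


def jaccard(a, b):
    """Simple Jaccard measure: shared token types over all token types."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    """Cosine similarity between term-frequency vectors (a basic vector space model)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def edit_distance(a, b):
    """Token-level Levenshtein distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


# Placeholder tokens standing in for words of a source and a reordered copy.
source = tokenize("w1 w2 w3 w4")
suspicious = tokenize("w3 w4 w1 w2")

print(jaccard(source, suspicious))
print(cosine(source, suspicious))
print(edit_distance(source, suspicious))
```

Because both bag-of-words measures ignore word order, they score a reordered copy as identical to its source, while the order-sensitive edit distance registers the change. This is one reason a system might combine a vector space model with distance and alignment methods, as the abstract describes.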