Detection of Plagiarism in Urdu Text Documents

Waqar Ali, Tanveer Ahmed, Zobia Rehman, A. Rehman, M. Slaman
2018 14th International Conference on Emerging Technologies (ICET), published 2018-11-01
DOI: 10.1109/ICET.2018.8603616 (https://doi.org/10.1109/ICET.2018.8603616)
Citation count: 7

Abstract

Plagiarism, intellectual theft, and copyright violation are the most important problems for researchers and academic organizations such as universities. Well-known publicly available Plagiarism Detection (PD) tools include Turnitin, APlagramme, Plagscan, and Aplag, which are used to address plagiarism. However, these tools mainly support English, Persian, and Arabic. Copyrighted and intellectual documents are written in every language of the world, and in many South Asian countries, including Pakistan and India, a huge amount of academic content is available in Urdu. Unfortunately, owing to scarce resources and limited attention from researchers, not enough work has been done on Urdu PD. This paper presents a system for detecting plagiarism in Urdu. Most existing Urdu PD systems fail to identify paraphrase plagiarism when comparing suspicious and source text documents. The proposed system, by contrast, is able to identify different types of plagiarism, such as sentence reordering, insertion/deletion, inter-textual similarity, and near-copy similarity. It is based on a distance measuring method, a structural alignment algorithm, and a vector space model. System performance is evaluated using machine learning classifiers, namely Support Vector Machine and Naïve Bayes. The experimental results show that the proposed method performs better than existing models such as the cosine method and the simple Jaccard measure.
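
The abstract names the cosine method and the simple Jaccard measure as the baselines the proposed system is compared against. As a rough illustration of those two baselines only (not the authors' system, whose details the abstract does not give), the sketch below computes both scores over whitespace-separated tokens. The function names and the whitespace tokenizer are assumptions made for this example; real Urdu text would need proper normalization and word segmentation.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: plain whitespace splitting stands in for a real
    # Urdu tokenizer, which this sketch does not attempt to provide.
    return text.split()


def jaccard_similarity(a: str, b: str) -> float:
    """Simple Jaccard measure: shared tokens / all distinct tokens."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between raw term-frequency vectors."""
    tf_a, tf_b = Counter(tokenize(a)), Counter(tokenize(b))
    # Only tokens present in both documents contribute to the dot product.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    source = "w1 w2 w3 w4"
    reordered = "w4 w3 w2 w1"  # same tokens, different order
    print(jaccard_similarity(source, reordered))
    print(cosine_similarity(source, reordered))
```

Both measures treat a document as a bag of tokens, so they ignore word order entirely and score a reordered copy the same as an exact copy. They also drop quickly once words are substituted, which is one plausible reason a system aimed at paraphrase detection would combine them with alignment-based methods.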