Algorithm of the Text Copy Detection Based on Topic Bag

Sen Wang, Yu Wang
Published in: 2010 International Conference on Web Information Systems and Mining
Publication date: 2010-10-23
DOI: 10.1109/WISM.2010.159 (https://doi.org/10.1109/WISM.2010.159)

Abstract

To address the serious problem of academic plagiarism on the web, this article proposes a text copy detection algorithm based on topic bags, drawing on the ideas of semantic clustering and multi-instance learning. First, a paper is represented as a three-layer tree: each leaf node denotes a sentence; each branch node represents a topic bag, formed by semantic clustering of several paragraphs; and the root node is the whole text. Second, the similarity between topic bags is calculated from the similarities of their sentences, and the similarity of two papers is then obtained from the similarities and weights of the topic bags. Experiments show that the proposed algorithm has higher accuracy.
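
The three-layer structure and the two-stage similarity computation described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the sentence similarity measure, the aggregation rule, or the weighting scheme, so the sketch assumes Jaccard word overlap for sentences, a best-match average for topic bags, and weights proportional to the number of sentences in each bag. The semantic clustering step is not shown; topic bags are assumed to be already formed. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TopicBag:
    """Branch node: sentences (leaf nodes) grouped under one topic."""
    sentences: List[str]


@dataclass
class Paper:
    """Root node: a text made up of topic bags."""
    bags: List[TopicBag]


def sentence_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (an assumed stand-in measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def bag_similarity(x: TopicBag, y: TopicBag) -> float:
    """Mean, over the sentences in x, of the best match found in y (assumed rule)."""
    if not x.sentences or not y.sentences:
        return 0.0
    best = [max(sentence_similarity(s, t) for t in y.sentences)
            for s in x.sentences]
    return sum(best) / len(best)


def paper_similarity(p: Paper, q: Paper) -> float:
    """Weighted sum of each bag's best match in q.

    The weight of a bag is its share of the sentences in p (assumed scheme).
    """
    total = sum(len(b.sentences) for b in p.bags)
    if total == 0 or not q.bags:
        return 0.0
    score = 0.0
    for bag in p.bags:
        weight = len(bag.sentences) / total
        score += weight * max(bag_similarity(bag, other) for other in q.bags)
    return score
```

Under these assumptions the measure is asymmetric: paper_similarity(p, q) estimates how much of p is covered by q, which suits the question of whether p copies from q. A symmetric score would need an extra step, such as averaging both directions.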