Hybrid spamicity score approach to web spam detection

S. P. Algur, N. T. Pendari
{"title":"Hybrid spamicity score approach to web spam detection","authors":"S. P. Algur, N. T. Pendari","doi":"10.1109/ICPRIME.2012.6208284","DOIUrl":null,"url":null,"abstract":"Web spamming refers to actions intended to mislead search engines and give some pages higher ranking than they deserve. Fundamentally, Web spam is designed to pollute search engines and corrupt the user experience by driving traffic to particular spammed Web pages, regardless of the merits of those pages. Recently, there is dramatic increase in amount of web spam, leading to a degradation of search results. Most of the existing web spam detection methods are supervised that require a large set of training web pages. The proposed system studies the problem of unsupervised web spam detection. It introduces the notion of spamicity to measure how likely a page is spam. Spamicity is a more flexible measure than the traditional supervised classification methods. In the proposed system link and content spam techniques are used to determine the spamicity score of web page. A threshold is set by empirical analysis which classifies the web page into spam or non spam.","PeriodicalId":148511,"journal":{"name":"International Conference on Pattern Recognition, Informatics and Medical Engineering (PRIME-2012)","volume":"136 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Pattern Recognition, Informatics and Medical Engineering (PRIME-2012)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPRIME.2012.6208284","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 14

Abstract

Web spamming refers to actions intended to mislead search engines and give some pages higher ranking than they deserve. Fundamentally, Web spam is designed to pollute search engines and corrupt the user experience by driving traffic to particular spammed Web pages, regardless of the merits of those pages. Recently, there is dramatic increase in amount of web spam, leading to a degradation of search results. Most of the existing web spam detection methods are supervised that require a large set of training web pages. The proposed system studies the problem of unsupervised web spam detection. It introduces the notion of spamicity to measure how likely a page is spam. Spamicity is a more flexible measure than the traditional supervised classification methods. In the proposed system link and content spam techniques are used to determine the spamicity score of web page. A threshold is set by empirical analysis which classifies the web page into spam or non spam.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
混合垃圾邮件得分方法的网络垃圾邮件检测
网络垃圾邮件指的是误导搜索引擎的行为,并给予一些页面比他们应得的更高的排名。从根本上说,Web垃圾邮件的目的是通过将流量驱动到特定的垃圾邮件Web页面来污染搜索引擎并破坏用户体验,而不考虑这些页面的优点。最近,网络垃圾邮件的数量急剧增加,导致搜索结果的退化。现有的垃圾邮件检测方法大多是监督式的,需要大量的训练网页。该系统研究了无监督网络垃圾邮件检测问题。它引入了垃圾信息的概念来衡量一个页面是垃圾邮件的可能性。垃圾信息是一种比传统的监督分类方法更灵活的度量方法。在该系统中,链接垃圾邮件和内容垃圾邮件技术被用来确定网页的垃圾邮件得分。通过实证分析设置阈值,将网页分为垃圾网页和非垃圾网页。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
An optimized cluster based approach for multi-source multicast routing protocol in mobile ad hoc networks with differential evolution Increasing cluster uniqueness in Fuzzy C-Means through affinity measure Rule extraction from neural networks — A comparative study Text extraction from digital English comic image using two blobs extraction method A novel approach for Kannada text extraction
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1