Hybrid spamicity score approach to web spam detection
S. P. Algur, N. T. Pendari
International Conference on Pattern Recognition, Informatics and Medical Engineering (PRIME-2012), 2012-03-21
DOI: 10.1109/ICPRIME.2012.6208284 (https://doi.org/10.1109/ICPRIME.2012.6208284)
Citations: 14
Abstract
Web spamming refers to actions intended to mislead search engines and give some pages a higher ranking than they deserve. Fundamentally, web spam is designed to pollute search engines and corrupt the user experience by driving traffic to particular spammed web pages, regardless of the merits of those pages. Recently, there has been a dramatic increase in the amount of web spam, leading to a degradation of search results. Most existing web spam detection methods are supervised and require a large set of training web pages. The proposed system studies the problem of unsupervised web spam detection. It introduces the notion of spamicity to measure how likely a page is to be spam. Spamicity is a more flexible measure than traditional supervised classification. In the proposed system, link and content spam techniques are used to determine the spamicity score of a web page. A threshold set by empirical analysis then classifies the web page as spam or non-spam.
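
The scoring-and-threshold idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not say which link or content features are used, how they are combined, or what threshold value was chosen. The names `PageSignals`, `spamicity`, and `classify`, the weighted average, and the default weight and threshold are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class PageSignals:
    """Hypothetical per-page evidence, each value already scaled to [0, 1]."""
    content_score: float  # assumed output of some content-spam check
    link_score: float     # assumed output of some link-spam check


def spamicity(signals: PageSignals, content_weight: float = 0.5) -> float:
    """Blend content and link evidence into one score in [0, 1].

    The weighted average is an assumption; the abstract does not state
    how the two kinds of evidence are combined.
    """
    if not 0.0 <= content_weight <= 1.0:
        raise ValueError("content_weight must be in [0, 1]")
    for value in (signals.content_score, signals.link_score):
        if not 0.0 <= value <= 1.0:
            raise ValueError("signal values must be in [0, 1]")
    link_weight = 1.0 - content_weight
    return content_weight * signals.content_score + link_weight * signals.link_score


def classify(signals: PageSignals, threshold: float = 0.6,
             content_weight: float = 0.5) -> str:
    """Label a page by comparing its spamicity to a threshold.

    The default threshold is a placeholder; the paper sets its threshold
    by empirical analysis and the abstract does not report the value.
    """
    score = spamicity(signals, content_weight)
    return "spam" if score >= threshold else "non-spam"


if __name__ == "__main__":
    page = PageSignals(content_score=0.75, link_score=0.25)
    print(spamicity(page), classify(page))
```

Because the score is continuous, the same page can be ranked against others or relabeled by moving the threshold, which is one way to read the abstract's claim that spamicity is more flexible than a fixed class label.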