A new enhanced technique for link farm detection

D. Saraswathi, A. V. Kathiravan, R. Kavitha
International Conference on Pattern Recognition, Informatics and Medical Engineering (PRIME-2012)
Published: 2012-03-21
DOI: 10.1109/ICPRIME.2012.6208290 (https://doi.org/10.1109/ICPRIME.2012.6208290)
Citations: 1

Abstract

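Search engine spam is a web page designed to artificially inflate its search engine ranking. Such spam has increased dramatically in recent years and causes problems for both search engines and web users: it degrades search results, occupies more memory and takes more time during index construction, and frustrates users with irrelevant results. Search engines have tried many techniques to filter out spam pages before they appear on the query results page. Spammers try to increase the PageRank of certain spam pages by creating a large number of links pointing to them. We have designed and developed a system, based on a spamcity score, that detects spam hosts or pages on the Web. All results reported here were obtained on the Web Spam UK 2007 data set, a public web spam data set annotated at the host level. The system uses the key features of popular link-based algorithms to improve spam detection. This paper reviews the various ways of creating spam pages and the methods currently used to detect spam, and presents a new approach to building a tool that improves link spam detection using the spamcity score of term spam. The approach uses the SVMLight tool to detect link spam, taking into account both the link structure of the Web and page contents. These statistical features are used to build a classifier that is tested on a large collection of Web link spam. Link farms can be identified from the Web graph, through classification with the SVMLight tool, and through degree-based measures, PageRank, TrustRank, and Truncated PageRank. The spam classifier uses the WordNet word database and the SVMLight tool to classify web links as spam or not spam. The features relate not only to quantitative data extracted from Web pages but also to qualitative properties, mainly of the page links.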
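To make the link-based features named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It computes in-degree, PageRank, and a TrustRank-style score (PageRank whose random jump is restricted to trusted seeds) on a toy host graph that contains a small link farm. The graph, the host names, the damping factor of 0.85, the choice of "home" as the trusted seed, and the helper names are all assumptions made for illustration. The last helper writes one feature vector in the sparse `label index:value` text format that SVMlight reads; training the classifier itself is not shown.

```python
from collections import defaultdict


def in_degrees(graph):
    """Count incoming links for every node of {node: [out-neighbours]}."""
    deg = defaultdict(int)
    for src, outs in graph.items():
        deg[src] += 0  # make sure nodes without in-links are present
        for dst in outs:
            deg[dst] += 1
    return dict(deg)


def pagerank(graph, teleport=None, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    `teleport` is the random-jump distribution (must sum to 1).
    Uniform teleport gives plain PageRank; restricting it to trusted
    seed hosts gives a TrustRank-style score.
    """
    nodes = sorted(set(graph) | {v for outs in graph.values() for v in outs})
    if teleport is None:
        teleport = {n: 1.0 / len(nodes) for n in nodes}
    rank = {n: teleport.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * teleport.get(n, 0.0) for n in nodes}
        for n in nodes:
            outs = graph.get(n, [])
            if outs:
                share = damping * rank[n] / len(outs)
                for v in outs:
                    nxt[v] += share
            else:
                # Dangling node: hand its mass back via the teleport vector.
                for m in nodes:
                    nxt[m] += damping * rank[n] * teleport.get(m, 0.0)
        rank = nxt
    return rank


def to_svmlight_line(label, features):
    """Format one example as '<label> 1:<v1> 2:<v2> ...'."""
    pairs = " ".join(f"{i}:{v:g}" for i, v in enumerate(features, 1))
    return f"{label:+d} {pairs}"


# Hypothetical host graph: three ordinary hosts plus a link farm in which
# f1..f4 exist only to point at "target".
graph = {
    "home": ["news", "blog"],
    "news": ["home", "blog"],
    "blog": ["home"],
    "target": ["f1", "f2", "f3", "f4"],
    "f1": ["target"], "f2": ["target"], "f3": ["target"], "f4": ["target"],
}

if __name__ == "__main__":
    indeg = in_degrees(graph)
    pr = pagerank(graph)
    tr = pagerank(graph, teleport={"home": 1.0})  # "home" is the trusted seed
    for host in sorted(graph):
        label = -1 if host == "target" or host.startswith("f") else +1  # toy labels
        print(host, to_svmlight_line(label, [indeg[host], pr[host], tr[host]]))
```

In this toy graph the farm target collects the highest in-degree and a higher PageRank than any ordinary host, while its TrustRank-style score stays at zero because no trusted host links into the farm. A gap of that kind between PageRank and trust is the sort of signal a classifier could learn from such features.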