The connectivity sonar: detecting site functionality by structural patterns

E. Amitay, David Carmel, Adam Darlow, R. Lempel, A. Soffer
Journal: J. Digit. Inf.
Published: 2003-08-26
DOI: 10.1145/900051.900060 (https://doi.org/10.1145/900051.900060)
Citations: 154

Abstract

Web sites today serve many different functions, such as corporate sites, search engines, e-stores, and so forth. As sites are created for different purposes, their structure and connectivity characteristics vary. However, this research argues that sites of similar role exhibit similar structural patterns, as the functionality of a site naturally induces a typical hyperlinked structure and typical connectivity patterns to and from the rest of the Web. Thus, the functionality of Web sites is reflected in a set of structural and connectivity-based features that form a typical signature. In this paper, we automatically categorize sites into eight distinct functional classes, and highlight several search-engine related applications that could make immediate use of such technology. We purposely limit our categorization algorithms by tapping connectivity and structural data alone, making no use of any content analysis whatsoever. When applying two classification algorithms to a set of 202 sites of the eight defined functional categories, the algorithms correctly classified between 54.5% and 59% of the sites. On some categories, the precision of the classification exceeded 85%. An additional result of this work indicates that the structural signature can be used to detect spam rings and mirror sites, by clustering sites with almost identical signatures.
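
The last point of the abstract, grouping sites whose signatures are almost identical, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the three features, the sample values, the `.example` host names, the Euclidean distance, and the `threshold` value are hypothetical and are not the feature set or clustering method of the paper, which the abstract does not specify.

```python
from itertools import combinations
from math import sqrt

# Hypothetical structural signatures: feature names and values are invented
# for illustration and are not taken from the paper.
# Order: (avg. out-links per page, fraction of links leaving the site, avg. page depth)
signatures = {
    "site-a.example": (12.0, 0.10, 3.2),
    "site-a-mirror.example": (12.0, 0.10, 3.2),
    "shop.example": (35.0, 0.02, 4.5),
    "portal.example": (80.0, 0.65, 1.8),
}


def distance(u, v):
    """Euclidean distance between two signature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def near_identical_groups(sigs, threshold=0.05):
    """Group sites whose signatures lie within `threshold` of each other.

    A small union-find merges chains of near-identical sites into one group.
    The threshold is an arbitrary assumption, not a value from the paper.
    """
    parent = {name: name for name in sigs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(sigs, 2):
        if distance(sigs[a], sigs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in sigs:
        groups.setdefault(find(name), []).append(name)
    # Only groups with more than one member are candidate mirrors or rings.
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(near_identical_groups(signatures))
```

In this sketch the features are compared on their raw scales, so a feature with a large range dominates the distance. A real pipeline would likely normalize each feature first; that step is left out to keep the example short.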