A term-based algorithm for hierarchical clustering of Web documents

A. Schenker, Mark Last, A. Kandel
Proceedings Joint 9th IFSA World Congress and 20th NAFIPS International Conference (Cat. No. 01TH8569), published 2001-07-25
DOI: 10.1109/NAFIPS.2001.943719 (https://doi.org/10.1109/NAFIPS.2001.943719)
Citations: 12

Abstract

In this paper we introduce a novel algorithm, the class hierarchy construction algorithm (CHCA), for creating hierarchical clusterings of Web documents. Unlike most clustering methods, CHCA operates on nominal data (the words occurring in each document). It also differs from other hierarchical clustering techniques in that it uses the object-oriented concept of inheritance to create the parent/child relationships between clusters. We have developed a prototype system that uses CHCA to create cluster hierarchies from Web search results returned by conventional search engines. Without any guidance, CHCA creates term-based clusters from the contents of the retrieved pages and assigns each page to a cluster; the clusters correspond to topics and sub-topics in the investigated domain. We compare the performance of our system with that of a similar Web search clustering system (Vivisimo).
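
The inheritance idea can be illustrated with a minimal Python sketch. This is not the authors' CHCA: the abstract does not specify how the hierarchy is built, so the hierarchy here is constructed by hand, and the `Cluster` class, the `assign` function, and the toy pages are all hypothetical names and data chosen for illustration. The sketch only shows two points from the abstract: documents are treated as nominal data (sets of words), and a child cluster inherits every term of its parent while adding terms of its own.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A term-based cluster; `terms` includes everything inherited from the parent."""
    terms: frozenset
    children: list = field(default_factory=list)
    pages: list = field(default_factory=list)

    def add_child(self, extra_terms):
        # Inheritance: the child keeps every parent term and adds its own.
        child = Cluster(self.terms | frozenset(extra_terms))
        self.children.append(child)
        return child


def assign(page_id, page_terms, cluster):
    """Place a page in the most specific cluster whose terms it all contains."""
    for child in cluster.children:
        if child.terms <= page_terms:
            return assign(page_id, page_terms, child)
    cluster.pages.append(page_id)
    return cluster


# Hypothetical toy data: each page is reduced to the set of words it contains.
pages = {
    "p1": {"jaguar", "car", "engine"},
    "p2": {"jaguar", "cat", "habitat"},
    "p3": {"jaguar", "car", "dealer"},
    "p4": {"jaguar"},
}

# Hand-built hierarchy (an assumption; CHCA derives it without guidance).
root = Cluster(frozenset({"jaguar"}))
cars = root.add_child({"car"})
cats = root.add_child({"cat"})

for page_id, terms in pages.items():
    assign(page_id, terms, root)
```

In this toy setup, a page that matches no child stays in the parent cluster, which mirrors the topic/sub-topic reading of the hierarchy: the parent holds pages about the general topic, and each child holds pages about a narrower one.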