Spectral analysis of text collection for similarity-based clustering

Proceedings. 20th International Conference on Data Engineering. Pub Date: 2004-03-30. DOI: 10.1109/ICDE.2004.1320064

Wenyuan Li, W. Ng, Ee-Peng Lim

Citations: 7

Abstract

Clustering of text collections is generally difficult due to their high dimensionality, heterogeneity, and large size. These characteristics compound the problem of determining the appropriate similarity space for clustering algorithms. Here, we propose to use the spectral analysis of the similarity space of a text collection to predict clustering behavior before actual clustering is performed. Spectral analysis is a technique that has been adopted across different domains to analyze the key encoding information of a system. Using spectral analysis for prediction is useful in first determining the quality of the similarity space and discovering any possible problems the selected feature set may present.
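The abstract does not spell out the procedure, so the following is only an illustrative sketch of what inspecting the spectrum of a similarity space before clustering can look like. It assumes cosine similarity over nonnegative term-count vectors and the symmetric normalization D^{-1/2} S D^{-1/2}; both choices, the function name `similarity_spectrum`, and the toy data are assumptions made for illustration, not the authors' stated method. For a nonnegative similarity matrix normalized this way, the number of eigenvalues equal to 1 matches the number of disconnected groups of documents, so a count of eigenvalues near 1 gives a rough hint of how cleanly the collection separates.

```python
import numpy as np

def similarity_spectrum(term_vectors):
    """Return eigenvalues (descending) of the normalized cosine-similarity matrix.

    Hypothetical helper: the similarity measure and the normalization are
    assumptions chosen for illustration, not taken from the paper.
    """
    X = np.asarray(term_vectors, dtype=float)

    # Scale each document vector to unit length (skip all-zero rows).
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    U = X / norms

    # Pairwise cosine similarity between documents.
    S = U @ U.T

    # Symmetric normalization D^{-1/2} S D^{-1/2}, where D holds the row sums.
    d = S.sum(axis=1)
    d[d == 0] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(d)
    N = S * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # N is symmetric, so eigvalsh applies; it returns ascending order.
    return np.linalg.eigvalsh(N)[::-1]

# Toy collection: two pairs of documents with disjoint vocabularies.
docs = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 3],
    [0, 0, 3, 1],
]

spectrum = similarity_spectrum(docs)
near_one = int(np.sum(spectrum > 0.99))

print(np.round(spectrum, 3))  # approximately [1.0, 1.0, 0.25, 0.111]
print(near_one)               # 2 eigenvalues near 1, matching the two groups
```

On real collections the groups are rarely fully disconnected, so the leading eigenvalues fall below 1 and the gap between them and the rest of the spectrum becomes the quantity to look at; where to place the threshold is a judgment call not settled by this sketch.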
