Practical solutions to the problem of diagonal dominance in kernel document clustering

Proceedings of the 23rd International Conference on Machine Learning. Pub Date: 2006-06-25. DOI: 10.1145/1143844.1143892

Derek Greene, P. Cunningham

Citations: 484

Abstract

In supervised kernel methods, it has been observed that the performance of the SVM classifier is poor in cases where the diagonal entries of the Gram matrix are large relative to the off-diagonal entries. This problem, referred to as diagonal dominance, often occurs when certain kernel functions are applied to sparse high-dimensional data, such as text corpora. In this paper we investigate the implications of diagonal dominance for unsupervised kernel methods, specifically in the task of document clustering. We propose a selection of strategies for addressing this issue, and evaluate their effectiveness in producing more accurate and stable clusterings.
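To make the notion of diagonal dominance concrete, the following minimal sketch compares a Gram matrix built from sparse toy "documents" with one built from dense ones. It is an illustration only and is not taken from the paper: the cosine kernel, the synthetic corpora, and the helper names `cosine_gram`, `diagonal_dominance_ratio`, and `random_corpus` are all assumptions chosen for this example.

```python
import math
import random

def cosine_gram(docs):
    """Gram matrix of the cosine kernel for documents given as sets of term ids
    (equivalent to binary term vectors)."""
    return [[len(a & b) / math.sqrt(len(a) * len(b)) for b in docs] for a in docs]

def diagonal_dominance_ratio(K):
    """Mean diagonal entry divided by mean off-diagonal entry of K."""
    n = len(K)
    diag_sum = sum(K[i][i] for i in range(n))
    off_sum = sum(K[i][j] for i in range(n) for j in range(n) if i != j)
    return (diag_sum / n) / (off_sum / (n * (n - 1)))

def random_corpus(n_docs, n_terms, terms_per_doc, rng):
    """Hypothetical toy corpus: each document is a random set of term ids."""
    return [set(rng.sample(range(n_terms), terms_per_doc)) for _ in range(n_docs)]

rng = random.Random(0)
# Sparse case: each document uses 1% of a 5000-term vocabulary.
sparse_docs = random_corpus(50, 5000, 50, rng)
# Dense case: each document uses 50% of the same vocabulary.
dense_docs = random_corpus(50, 5000, 2500, rng)

ratio_sparse = diagonal_dominance_ratio(cosine_gram(sparse_docs))
ratio_dense = diagonal_dominance_ratio(cosine_gram(dense_docs))
print(f"sparse ratio: {ratio_sparse:.1f}, dense ratio: {ratio_dense:.1f}")
```

With the cosine kernel every diagonal entry equals 1, so the ratio is driven entirely by how small the off-diagonal similarities are. On sparse data most document pairs share few or no terms, which is why the sparse ratio comes out far larger than the dense one in this toy setting.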


