Edge Weight Regularization over Multiple Graphs for Similarity Learning

2010 IEEE International Conference on Data Mining. Pub Date: 2010-12-13. DOI: 10.1109/ICDM.2010.156

Pradeep Muthukrishnan, Dragomir R. Radev, Q. Mei


Citations: 27

Abstract

The growth of the web has directly influenced the increase in the availability of relational data. One of the key problems in mining such data is computing the similarity between objects with heterogeneous feature types. For example, publications have many heterogeneous features like text, citations, authorship information, venue information, etc. In most approaches, similarity is estimated using each feature type in isolation and then combined in a linear fashion. However, this approach does not take advantage of the dependencies between the different feature spaces. In this paper, we propose a novel approach to combine the different sources of similarity using a regularization framework over edges in multiple graphs. We show that the objective function induced by the framework is convex. We also propose an efficient algorithm using coordinate descent [1] to solve the optimization problem. We extrinsically evaluate the performance of the proposed unified similarity measure on two different tasks, clustering and classification. The proposed similarity measure outperforms three baselines and a state-of-the-art classification algorithm on a variety of standard, large data sets.
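The baseline that the abstract contrasts against, where similarity is computed per feature type in isolation and then combined linearly, can be sketched as follows. This is only an illustration of that baseline, not the paper's regularization framework. The feature names, the choice of cosine similarity, the example vectors, and the weights are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_u * norm_v)


def combined_similarity(obj_a, obj_b, weights):
    """Weighted sum of per-feature-type similarities (weights are normalized)."""
    total = sum(weights.values())
    return sum(
        (w / total) * cosine(obj_a[name], obj_b[name])
        for name, w in weights.items()
    )


# Hypothetical publications, each described by two feature types.
paper_a = {"text": [1.0, 0.0, 1.0], "citations": [1.0, 1.0]}
paper_b = {"text": [1.0, 0.0, 1.0], "citations": [1.0, 0.0]}

# Hypothetical fixed weights; each feature type is scored independently.
weights = {"text": 0.5, "citations": 0.5}
score = combined_similarity(paper_a, paper_b, weights)
```

Because each feature type is scored on its own, evidence in one feature space (for example, shared citations) cannot influence the similarity computed in another (for example, text). That independence is the limitation the abstract points to.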


