Fast and Robust Graph-based Transductive Learning via Minimum Tree Cut

2011 IEEE 11th International Conference on Data Mining. Pub Date: 2011-12-11. DOI: 10.1109/ICDM.2011.66

Yanming Zhang, Kaizhu Huang, Cheng-Lin Liu

Citations: 22

Abstract

In this paper, we propose an efficient and robust algorithm for graph-based transductive classification. After approximating a graph with a spanning tree, we develop a linear-time algorithm to label the tree such that the cut size of the tree is minimized. This significantly improves on typical graph-based methods, which have either cubic time complexity (for a dense graph) or $O(kn^2)$ complexity (for a sparse graph, with $k$ denoting the node degree). Furthermore, our method is robust to the graph construction, both theoretically and empirically, which overcomes another major problem of traditional graph-based methods. In addition to its good scalability and robustness, the proposed algorithm demonstrates high accuracy. In particular, on a graph with 400,000 nodes (of which 10,000 are labeled) and 10,455,545 edges, our algorithm achieves the highest accuracy, 99.6%, while taking less than 10 seconds to label all the unlabeled data.
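The abstract does not spell out the labeling procedure, so the following is only an illustrative sketch of how a minimum-cut labeling of a tree can be computed in time linear in the number of nodes (for a fixed number of classes). It uses a standard bottom-up dynamic program followed by a top-down pass; it is not claimed to be the authors' algorithm. The function name `min_tree_cut_labels`, its parameters, and the assumption that the spanning tree is already given as a weighted edge list are all hypothetical choices made for this example.

```python
import math

def min_tree_cut_labels(n, edges, labeled, num_classes):
    """Label a tree so that the total weight of cut edges is minimal.

    Hypothetical interface (not from the paper):
      n           -- number of nodes, indexed 0..n-1
      edges       -- list of (u, v, weight) tuples forming a tree
      labeled     -- dict mapping node -> known class (kept fixed)
      num_classes -- number of classes
    Returns (labels, cut_weight).
    """
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    # Iterative DFS from node 0: 'order' lists every parent before its children.
    parent = [-1] * n
    parent_w = [0.0] * n
    seen = [False] * n
    seen[0] = True
    order, stack = [], [0]
    while stack:
        u = stack.pop()
        order.append(u)
        for v, w in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v], parent_w[v] = u, w
                stack.append(v)

    # cost[v][c]: minimal cut weight inside v's subtree if v takes class c.
    cost = [[0.0] * num_classes for _ in range(n)]
    for v, c in labeled.items():
        cost[v] = [0.0 if k == c else math.inf for k in range(num_classes)]

    best = [0.0] * n   # cheapest cost over all classes for v's subtree
    best_c = [0] * n   # class achieving that cheapest cost
    for v in reversed(order):  # bottom-up: children before parents
        best_c[v] = min(range(num_classes), key=lambda k: cost[v][k])
        best[v] = cost[v][best_c[v]]
        p = parent[v]
        if p >= 0:
            for k in range(num_classes):
                # Either v agrees with its parent, or the edge (p, v) is cut.
                cost[p][k] += min(cost[v][k], best[v] + parent_w[v])

    labels = [0] * n
    for v in order:  # top-down: parents before children
        p = parent[v]
        if p >= 0 and cost[v][labels[p]] <= best[v] + parent_w[v]:
            labels[v] = labels[p]
        else:
            labels[v] = best_c[v]
    return labels, best[0]

# Path 0 - 1 - 2 - 3 with end nodes labeled 0 and 1; the lightest edge is cut.
labels, cut = min_tree_cut_labels(
    4, [(0, 1, 1.0), (1, 2, 0.5), (2, 3, 2.0)], {0: 0, 3: 1}, 2
)
print(labels, cut)  # [0, 0, 1, 1] 0.5
```

Each node is visited a constant number of times and does work proportional to `num_classes`, which is where the linear dependence on the number of nodes comes from in this sketch. How the spanning tree itself is chosen is left outside the example, since the abstract does not specify it.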
