Discriminative frequent subgraph mining with optimality guarantees

Journal: Statistical Analysis and Data Mining (IF 3.6; Zone 4, Mathematics; Q3, Computer Science, Artificial Intelligence). Published: 2010-10-01. DOI: 10.1002/SAM.V3:5

Marisa Thoma, Hong Cheng, A. Gretton, Jiawei Han, H. Kriegel, Alex Smola, Le Song, Philip S. Yu, Xifeng Yan, Karsten M. Borgwardt

Citations: 23

Abstract

The goal of frequent subgraph mining is to detect subgraphs that occur frequently in a dataset of graphs. In classification settings, one is often interested in discovering discriminative frequent subgraphs, whose presence or absence indicates the class membership of a graph. In this article, we propose an approach to feature selection on frequent subgraphs, called CORK, that combines two central advantages. First, it optimizes a submodular quality criterion, which means that greedy feature selection yields a near-optimal solution. Second, this submodular criterion can be integrated into gSpan, the state-of-the-art tool for frequent subgraph mining, where it helps prune the search space for discriminative frequent subgraphs while the frequent subgraphs are still being mined. Copyright © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 302-318, 2010
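The abstract names two mechanisms: greedy selection under a submodular criterion, and pruning of the subgraph search. The abstract does not define CORK's criterion, so the sketch below assumes a correspondence-counting quality: the negative number of pairs of graphs from different classes whose presence/absence vectors on the selected subgraphs are identical. This function is monotone and submodular, and for such functions the classic greedy result guarantees a (1 - 1/e) fraction of the optimal improvement over the empty selection. The data model (each graph reduced to a set of hypothetical subgraph IDs), the function names, and the `extension_bound` formula are illustrative assumptions derived for this sketch, not the paper's implementation; gSpan itself is not implemented.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical data model (not gSpan's): each graph is reduced to the set of
# candidate-subgraph IDs it contains, plus a class label in {0, 1}.
Features = Sequence[Set[int]]
Labels = Sequence[int]


def _groups(features: Features, selected: Sequence[int]) -> List[List[int]]:
    """Partition graph indices by their indicator vector on the selected features."""
    groups: Dict[Tuple[bool, ...], List[int]] = defaultdict(list)
    for i, feats in enumerate(features):
        groups[tuple(s in feats for s in selected)].append(i)
    return list(groups.values())


def correspondences(features: Features, labels: Labels, selected: Sequence[int]) -> int:
    """Count cross-class pairs of graphs that the selected features cannot tell apart."""
    total = 0
    for members in _groups(features, selected):
        n1 = sum(labels[i] for i in members)
        total += n1 * (len(members) - n1)
    return total


def quality(features: Features, labels: Labels, selected: Sequence[int]) -> int:
    """Assumed correspondence-based criterion: higher is better, 0 is the maximum."""
    return -correspondences(features, labels, selected)


def greedy_select(features: Features, labels: Labels,
                  candidates: Sequence[int], k: int) -> List[int]:
    """Add up to k candidates, each time taking the one with the largest quality gain."""
    selected: List[int] = []
    for _ in range(k):
        current = quality(features, labels, selected)
        best, best_gain = None, 0
        for f in candidates:
            if f in selected:
                continue
            gain = quality(features, labels, selected + [f]) - current
            if gain > best_gain:
                best, best_gain = f, gain
        if best is None:  # no remaining candidate removes a correspondence
            break
        selected.append(best)
    return selected


def extension_bound(features: Features, labels: Labels,
                    selected: Sequence[int], f: int) -> int:
    """Upper bound on the gain of any supergraph of pattern f (derived for this sketch).

    A supergraph occurs only in graphs that contain f. In a group with n0/n1 graphs
    of class 0/1, of which c0/c1 contain f, a supergraph covering a <= c0 and b <= c1
    graphs gains a*(n1 - b) + b*(n0 - a). This is bilinear in (a, b), so its maximum
    over the box [0, c0] x [0, c1] lies at one of the four corners.
    """
    bound = 0
    for members in _groups(features, selected):
        n1 = sum(labels[i] for i in members)
        n0 = len(members) - n1
        c1 = sum(1 for i in members if labels[i] == 1 and f in features[i])
        c0 = sum(1 for i in members if labels[i] == 0 and f in features[i])
        bound += max(0, c0 * n1, n0 * c1, c0 * (n1 - c1) + c1 * (n0 - c0))
    return bound


if __name__ == "__main__":
    # Toy data: subgraph 0 occurs only in class-0 graphs, so it separates all pairs.
    features = [{0}, {0, 1}, {1}, {1, 2}]
    labels = [0, 0, 1, 1]
    print(greedy_select(features, labels, [0, 1, 2], k=2))  # [0]
    print(extension_bound(features, labels, [0], 2))  # 0
```

In a gSpan-style depth-first search, a pattern whose `extension_bound` does not exceed the best gain found so far could have all of its extensions skipped, because every supergraph occurs in a subset of the graphs that contain the pattern. Recomputing the partition for every candidate keeps the sketch short; a practical version would maintain group counts incrementally.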

Source journal: Statistical Analysis and Data Mining (Computer Science, Artificial Intelligence; Computer Science, Interdisciplinary Applications)

CiteScore: 3.20

Self-citation rate: 7.70%

Journal description: Statistical Analysis and Data Mining addresses the broad area of data analysis, including statistical approaches, machine learning, data mining, and applications. Topics include statistical and computational approaches for analyzing massive and complex datasets, novel statistical and/or machine learning methods and theory, and state-of-the-art applications with high impact. Of special interest are articles that describe innovative analytical techniques and discuss their application to real problems in a way that makes them accessible and beneficial to domain experts across science, engineering, and commerce. The journal focuses on papers that satisfy one or more of the following criteria:

- Solve data analysis problems associated with massive, complex datasets.
- Develop innovative statistical approaches, machine learning algorithms, or methods integrating ideas across disciplines (e.g., statistics, computer science, electrical engineering, operations research).
- Formulate and solve high-impact real-world problems that challenge existing paradigms via new statistical and/or computational models.
- Provide surveys of prominent research topics.