"iTopicModel: Information Network-Integrated Topic Modeling" — Yizhou Sun, Jiawei Han, Jing Gao, Yintao Yu (doi:10.1109/ICDM.2009.43)

Document networks, i.e., networks whose nodes carry text, are becoming increasingly common due to the ubiquity of Web documents, blogs, and other online data. In this paper, we propose a novel topic modeling framework for document networks that builds a unified generative topic model able to exploit both the text and the structure of the documents. A graphical model is proposed to describe the generative process. On the top layer of this graphical model, we define a novel multivariate Markov Random Field over the per-document topic-distribution random variables, to model the dependencies among documents induced by the network structure. On the bottom layer, we follow the traditional topic model to model the generation of each document's text. This yields a joint distribution over both the text and the structure of the documents, and we estimate the model by maximizing the joint log-likelihood. We also discuss important practical issues that arise in real applications, including how to choose the number of topics and how to select a good network structure. We apply the model to two real datasets, DBLP and Cora, and the experiments show that it is more effective than state-of-the-art topic modeling algorithms.
"ν-Anomica: A Fast Support Vector Based Novelty Detection Technique" — Santanu Das, Kanishka Bhaduri, N. Oza, A. Srivastava (doi:10.1109/ICDM.2009.42)

In this paper we propose ν-Anomica, a novel anomaly detection technique that can be trained on huge data sets in a much shorter running time than the benchmark one-class Support Vector Machines algorithm. The idea behind ν-Anomica is to train the machine so that it closely approximates the exact decision plane using fewer training points, without losing much of the generalization performance of the classical approach. We have tested the proposed algorithm on a variety of continuous data sets under different conditions. We show that under all test conditions the proposed procedure closely preserves the accuracy of standard one-class Support Vector Machines while reducing both the training time and the test time by a factor of 5 to 20.
"A Walk from 2-Norm SVM to 1-Norm SVM" — J. Kujala, T. Aho, Tapio Elomaa (doi:10.1109/ICDM.2009.100)

This paper studies how useful the standard 2-norm regularized SVM is for approximating the 1-norm SVM problem. To this end, we examine a general method based on iteratively re-weighting the features and solving a 2-norm optimization problem. The convergence rate of this method is unknown, and previous work indicates that it might require an excessive number of iterations. We study how well we can do with just a small number of iterations. In theory the convergence is fast, except at coordinates of the current solution that are close to zero; our empirical experiments confirm this. In many problems with irrelevant features, a single iteration is often enough to produce accuracy as good as or better than that of the 1-norm SVM. Hence, in these problems we apparently do not need to converge to the 1-norm SVM solution at near-zero coordinates. The benefit of this approach is that we can build an approximate 1-norm regularized solver on top of any 2-norm regularized solver. This is quick to implement, and the solution inherits the good qualities of the underlying solver, such as scalability and stability.
"A Local Scalable Distributed Expectation Maximization Algorithm for Large Peer-to-Peer Networks" — Kanishka Bhaduri, A. Srivastava (doi:10.1109/ICDM.2009.45)

This paper describes a local and distributed expectation maximization algorithm for learning the parameters of Gaussian mixture models (GMMs) in large peer-to-peer (P2P) environments. The algorithm can be used for a variety of well-known data mining tasks in distributed environments, such as clustering, anomaly detection, target tracking, and density estimation, which are needed by many emerging P2P applications in bioinformatics, Web mining, and sensor networks. Centralizing all or some of the data to build global models is impractical in such P2P environments because of the large number of data sources, the asynchrony of P2P networks, and the dynamic nature of the data and the network. The proposed algorithm takes a two-phase approach. In the monitoring phase, the algorithm checks whether the model 'quality' is still acceptable using an efficient local algorithm; when the model is found to be outdated, this feedback triggers the second phase, which samples data from the network and rebuilds the GMM. We present thorough experimental results to verify our theoretical claims.
"On K-Means Cluster Preservation Using Quantization Schemes" — D. Turaga, M. Vlachos, O. Verscheure (doi:10.1109/ICDM.2009.12)

This work examines under what conditions compression methodologies can retain the outcome of clustering operations. We focus on the popular k-Means clustering algorithm and demonstrate how a properly constructed compression scheme, based on post-clustering quantization, is capable of maintaining the global cluster structure. Our analytical derivations indicate that a 1-bit moment-preserving quantizer per cluster is sufficient to retain the original data clusters. Merits of the proposed compression technique include: a) reduced storage requirements with clustering guarantees, b) privacy of the original data values, and c) shape preservation for data visualization purposes. We evaluate the quantization scheme on various high-dimensional datasets, including 1-dimensional and 2-dimensional time series (shape datasets), and demonstrate the cluster preservation property. We also compare against previously proposed simplification techniques from the time-series literature and show significant improvements in both the clustering and the shape preservation of the compressed datasets.
"Efficient Algorithm for Computing Link-Based Similarity in Real World Networks" — Yuanzhe Cai, G. Cong, Xu Jia, Hongyan Liu, Jun He, Jiaheng Lu, Xiaoyong Du (doi:10.1109/ICDM.2009.136)

Similarity calculation has many applications, such as information retrieval and collaborative filtering. Link-based similarity measures, such as SimRank, have been shown to be very effective at characterizing object similarities in networks such as the Web by exploiting object-to-object relationships. Unfortunately, computing link-based similarity is prohibitively expensive in a relatively large graph. In this paper, based on the observation that the link-based similarity scores of real-world graphs follow a power-law distribution, we propose a new approximate algorithm, Power-SimRank, with a guaranteed error bound, to compute link-based similarity efficiently. We also prove the convergence of the proposed algorithm. Extensive experiments on real-world and synthetic datasets show that the proposed algorithm is four to five times faster than SimRank, while the error introduced by the approximation is small.
"GSML: A Unified Framework for Sparse Metric Learning" — Kaizhu Huang, Yiming Ying, C. Campbell (doi:10.1109/ICDM.2009.22)

There has been significant recent interest in sparse metric learning (SML), in which one simultaneously learns both a good distance metric and a low-dimensional representation. Unfortunately, the performance of existing sparse metric learning approaches is usually limited because they rely on certain relaxations of the problem or target the SML objective only indirectly. In this paper, we propose a Generalized Sparse Metric Learning method (GSML). This novel framework offers a unified view for understanding many popular sparse metric learning algorithms, including a previously proposed sparse metric learning framework, Large Margin Nearest Neighbor (LMNN), and the D-ranking Vector Machine (D-ranking VM). GSML also establishes a close relationship with the Pairwise Support Vector Machine. Furthermore, the proposed framework is capable of extending many current non-sparse metric learning models, such as Relevant Component Analysis (RCA) and another state-of-the-art method, into sparse versions. We present the detailed framework, provide theoretical justifications, build various connections with other models, and propose a practical iterative optimization method, making the framework both theoretically important and practically scalable to medium or large datasets. A series of experiments shows that the proposed approach can outperform previous methods in terms of both test accuracy and dimensionality reduction on six real-world benchmark datasets.
"On the (In)Security and (Im)Practicality of Outsourcing Precise Association Rule Mining" — Ian Molloy, Ninghui Li, Tiancheng Li (doi:10.1109/ICDM.2009.122)

The recent interest in outsourcing IT services to the cloud raises two main concerns: security and cost. One task that could be outsourced is data mining. In VLDB 2007, Wong et al. proposed an approach for outsourcing association rule mining. Their approach maps the set of real items into a set of pseudo items and then maps each transaction non-deterministically. This paper analyzes both the security and the costs associated with outsourcing association rule mining. We show how to break the encoding scheme of Wong et al. without using context-specific information, reducing its security to that of a one-to-one mapping. We then present a stricter notion of security than the one used by Wong et al. and consider the practicality of outsourcing association rule mining. Our results indicate that outsourcing association rule mining may not be practical if the data owner is concerned with data confidentiality.
"CoFKM: A Centralized Method for Multiple-View Clustering" — G. Cleuziou, M. Exbrayat, Lionel Martin, Jacques-Henri Sublemontier (doi:10.1109/ICDM.2009.138)

This paper deals with clustering multi-view data, i.e., objects described by several sets of variables or proximity matrices. Many important domains and applications, such as information retrieval, biology, chemistry, and marketing, face this problem. The aim of this data mining research field is to find clustering patterns that form a consensus among the patterns obtained from the different views. This requires merging information from the views through a fusion process that identifies agreement between them and resolves conflicts. Various fusion strategies can be applied, occurring before, after, or during the clustering process. Drawing inspiration from existing algorithms based on a centralized strategy, we propose a fuzzy clustering approach that generalizes the three fusion strategies and outperforms the main existing multi-view clustering algorithm on both synthetic and real datasets.
"Constraint-Based Pattern Mining in Dynamic Graphs" — C. Robardet (doi:10.1109/ICDM.2009.99)

Dynamic graphs represent relationships between entities that evolve over time. Meaningful patterns in such structured data must capture strong interactions and their evolution over time. In social networks, such patterns can be seen as dynamic community structures, i.e., sets of individuals who interact strongly and repeatedly. In this paper, we propose a constraint-based mining approach to uncover such evolving patterns. We mine dense and isolated subgraphs defined by two user-parameterized constraints, and capture the temporal evolution of these patterns by associating a temporal event type with each identified subgraph. We consider five basic temporal events: the formation, dissolution, growth, diminution, and stability of subgraphs from one time stamp to the next. We propose an algorithm that finds such subgraphs in a time series of graphs processed incrementally. The extraction is made feasible by efficient pattern and data pruning strategies. We demonstrate the applicability of our method on several real-world dynamic graphs and extract meaningful evolving communities.