Summary form only given. Cooperation and trust play an increasingly important role in today's information systems. For instance, peer-to-peer systems like BitTorrent, Sopcast, and Skype are powered by resource contributions from participating users; federated systems like the Internet have to respect the interests, policies, and laws of participating organizations and countries; and in the Cloud, users entrust their data and computation to third-party infrastructure. In this talk, we consider accountability as a way to facilitate transparency and trust in cooperative systems. We look at practical techniques to account for the integrity of distributed, cooperative computations, and at some of the difficulties and open problems in accountability.
{"title":"Accountability and Trust in Cooperative Information Systems","authors":"P. Druschel","doi":"10.1109/ICDE.2012.152","DOIUrl":"https://doi.org/10.1109/ICDE.2012.152","url":null,"abstract":"Summary form only given. Cooperation and trust play an increasingly important role in today's information systems. For instance, peer-to-peer systems like BitTorrent, Sopcast and Skype are powered by resource contributions from participating users, federated systems like the Internet have to respect the interests, policies and laws of participating organizations and countries; in the Cloud, users entrust their data and computation to third-part infrastructure. In this talk, we consider accountability as a way to facilitate transparency and trust in cooperative systems. We look at practical techniques to account for the integrity of distributed, cooperative computations, and look at some of the difficulties and open problems in accountability.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124784481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
F. Afrati, A. Sarma, David Menestrina, Aditya G. Parameswaran, J. Ullman
Fuzzy/similarity joins have been widely studied in the research community and extensively used in real-world applications. This paper proposes and evaluates several algorithms for finding all pairs of elements from an input set that meet a similarity threshold. The computation model is a single MapReduce job. Because we allow only one MapReduce round, the Reduce function must be designed so that a given output pair is produced by only one task; for many algorithms, satisfying this condition is one of the biggest challenges. We break the cost of an algorithm into three components: the execution cost of the mappers, the execution cost of the reducers, and the communication cost from the mappers to the reducers. The algorithms are presented first in terms of Hamming distance, but extensions to edit distance and Jaccard distance are shown as well. We find that there are many different approaches to the similarity-join problem using MapReduce, and none dominates the others when both communication and reducer costs are considered. Our cost analyses enable applications to pick the optimal algorithm based on their communication, memory, and cluster requirements.
{"title":"Fuzzy Joins Using MapReduce","authors":"F. Afrati, A. Sarma, David Menestrina, Aditya G. Parameswaran, J. Ullman","doi":"10.1109/ICDE.2012.66","DOIUrl":"https://doi.org/10.1109/ICDE.2012.66","url":null,"abstract":"Fuzzy/similarity joins have been widely studied in the research community and extensively used in real-world applications. This paper proposes and evaluates several algorithms for finding all pairs of elements from an input set that meet a similarity threshold. The computation model is a single MapReduce job. Because we allow only one MapReduce round, the Reduce function must be designed so a given output pair is produced by only one task, for many algorithms, satisfying this condition is one of the biggest challenges. We break the cost of an algorithm into three components: the execution cost of the mappers, the execution cost of the reducers, and the communication cost from the mappers to reducers. The algorithms are presented first in terms of Hamming distance, but extensions to edit distance and Jaccard distance are shown as well. We find that there are many different approaches to the similarity-join problem using MapReduce, and none dominates the others when both communication and reducer costs are considered. Our cost analyses enable applications to pick the optimal algorithm based on their communication, memory, and cluster requirements.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114954002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Meiyu Lu, S. Bangalore, Graham Cormode, Marios Hadjieleftheriou, D. Srivastava
A key step in validating a proposed idea or system is to evaluate it over a suitable dataset. However, to date there have been no useful tools for researchers to understand which datasets have been used for what purpose, or in what prior work. Instead, they have to manually browse through papers to find suitable datasets and their corresponding URLs, which is laborious and inefficient. To better aid the dataset discovery process, and to provide a better understanding of how and where datasets have been used, we propose a framework to effectively identify datasets within the scientific corpus. The key technical challenges are the identification of datasets and the discovery of the association between a dataset and the URLs where it can be accessed. Based on this, we have built a user-friendly web-based search interface for users to conveniently explore dataset-paper relationships and find relevant datasets and their properties.
{"title":"A Dataset Search Engine for the Research Document Corpus","authors":"Meiyu Lu, S. Bangalore, Graham Cormode, Marios Hadjieleftheriou, D. Srivastava","doi":"10.1109/ICDE.2012.80","DOIUrl":"https://doi.org/10.1109/ICDE.2012.80","url":null,"abstract":"A key step in validating a proposed idea or system is to evaluate over a suitable dataset. However, to this date there have been no useful tools for researchers to understand which datasets have been used for what purpose, or in what prior work. Instead, they have to manually browse through papers to find the suitable datasets and their corresponding URLs, which is laborious and inefficient. To better aid the dataset discovery process, and provide a better understanding of how and where datasets have been used, we propose a framework to effectively identify datasets within the scientific corpus. The key technical challenges are identification of datasets, and discovery of the association between a dataset and the URLs where they can be accessed. Based on this, we have built a user friendly web-based search interface for users to conveniently explore the dataset-paper relationships, and find relevant datasets and their properties.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123966010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Steffen Hirte, Andreas Seifert, S. Baumann, Daniel Klan, K. Sattler
Motion sensing input devices like Microsoft's Kinect offer an alternative to traditional computer input devices like keyboards and mice. New applications using this interface appear daily, and most of them implement their own gesture detection. In our demonstration we show a new approach using the data stream engine AnduIN: gesture detection is done with AnduIN's complex event processing functionality. This way we build a system that allows new and complex gestures to be defined through a declarative programming interface. On this basis, our demonstration Data3 provides a basic natural-interaction OLAP interface for a sample star schema database using Microsoft's Kinect.
{"title":"Data3 -- A Kinect Interface for OLAP Using Complex Event Processing","authors":"Steffen Hirte, Andreas Seifert, S. Baumann, Daniel Klan, K. Sattler","doi":"10.1109/ICDE.2012.131","DOIUrl":"https://doi.org/10.1109/ICDE.2012.131","url":null,"abstract":"Motion sensing input devices like Microsoft's Kinect offer an alternative to traditional computer input devices like keyboards and mouses. Daily new applications using this interface appear. Most of them implement their own gesture detection. In our demonstration we show a new approach using the data stream engine Andu IN. The gesture detection is done based on Andu IN's complex event processing functionality. This way we build a system that allows to define new and complex gestures on the basis of a declarative programming interface. On this basis our demonstration data3 provides a basic natural interaction OLAP interface for a sample star schema database using Microsoft's Kinect.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125264986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shenoda Guirguis, M. Sharaf, Panos K. Chrysanthis, Alexandros Labrinidis
Aggregate Continuous Queries (ACQs) are a very popular class of Continuous Queries (CQs) with a potentially high execution cost. As such, optimizing the processing of ACQs is imperative for Data Stream Management Systems (DSMSs) to reach their full potential in supporting (critical) monitoring applications. For multiple ACQs that vary in window specifications and pre-aggregation filters, existing multiple-ACQ optimization schemes assume a processing model where each ACQ is computed as a final aggregation of a sub-aggregation. In this paper, we propose a novel processing model for ACQs, called TriOps, with the goal of minimizing the repetition of operator execution at the sub-aggregation level. We also propose TriWeave, a TriOps-aware multi-query optimizer. We analytically and experimentally demonstrate the performance gains of the proposed schemes, showing their superiority over alternative schemes. Finally, we generalize TriWeave to incorporate classical subsumption-based multi-query optimization techniques.
{"title":"Three-Level Processing of Multiple Aggregate Continuous Queries","authors":"Shenoda Guirguis, M. Sharaf, Panos K. Chrysanthis, Alexandros Labrinidis","doi":"10.1109/ICDE.2012.112","DOIUrl":"https://doi.org/10.1109/ICDE.2012.112","url":null,"abstract":"Aggregate Continuous Queries (ACQs) are both a very popular class of Continuous Queries (CQs) and also have a potentially high execution cost. As such, optimizing the processing of ACQs is imperative for Data Stream Management Systems (DSMSs) to reach their full potential in supporting (critical) monitoring applications. For multiple ACQs that vary in window specifications and pre-aggregation filters, existing multiple ACQs optimization schemes assume a processing model where each ACQ is computed as a final-aggregation of a sub-aggregation. In this paper, we propose a novel processing model for ACQs, called Tri Ops, with the goal of minimizing the repetition of operator execution at the sub-aggregation level. We also propose Tri Weave, a Tri Ops-aware multi-query optimizer. We analytically and experimentally demonstrate the performance gains of our proposed schemes which shows their superiority over alternative schemes. Finally, we generalize Tri Weave to incorporate the classical subsumption-based multi-query optimization techniques.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120959231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Bergamaschi, Matteo Interlandi, Mario Longo, Laura Po, M. Vincini
The adoption of business intelligence technology in industry is growing rapidly. Business managers are no longer satisfied with ad hoc and static reports, and they ask for more flexible and easy-to-use data analysis tools. Recently, application interfaces that expand the range of operations available to the user, while hiding the underlying complexity, have been developed. The paper presents eLog, a business intelligence solution designed and developed in collaboration between the database group of the University of Modena and Reggio Emilia and eBilling, an Italian SME supplier of solutions for the design, production, and automation of documentary processes for top Italian companies. eLog enables business managers to define OLAP reports by means of a web interface and to customize analysis indicators using a simple meta-language. The framework translates the user's reports into MDX queries and is able to automatically select the data cube suitable for each query. Over 140 medium and large companies have used the technological services of eBilling S.p.A. to manage their document flows. In particular, eLog services have been used by the major Italian media and telecommunications companies and their foreign subsidiaries, such as Sky, Mediaset, H3G, and TIM Brazil. The largest customer alone can generate up to 30 million mail pieces within 6 months (about 200 GB of data in the relational DBMS). Over a period of 18 months, eLog may have to handle up to 150 million mail pieces (1 TB of data).
{"title":"A Meta-language for MDX Queries in eLog Business Solution","authors":"S. Bergamaschi, Matteo Interlandi, Mario Longo, Laura Po, M. Vincini","doi":"10.1109/ICDE.2012.100","DOIUrl":"https://doi.org/10.1109/ICDE.2012.100","url":null,"abstract":"The adoption of business intelligence technology in industries is growing rapidly. Business managers are not satisfied with ad hoc and static reports and they ask for more flexible and easy to use data analysis tools. Recently, application interfaces that expand the range of operations available to the user, hiding the underlying complexity, have been developed. The paper presents eLog, a business intelligence solution designed and developed in collaboration between the database group of the University of Modena and Reggio Emilia and eBilling, an Italian SME supplier of solutions for the design, production and automation of documentary processes for top Italian companies. eLog enables business managers to define OLAP reports by means of a web interface and to customize analysis indicators adopting a simple meta-language. The framework translates the user's reports into MDX queries and is able to automatically select the data cube suitable for each query. Over 140 medium and large companies have exploited the technological services of eBilling S.p.A. to manage their documents flows. In particular, eLog services have been used by the major media and telecommunications Italian companies and their foreign annex, such as Sky, Media set, H3G, Tim Brazil etc. The largest customer can provide up to 30 millions mail pieces within 6 months (about 200 GB of data in the relational DBMS). In a period of 18 months, eLog could reach 150 millions mail pieces (1 TB of data) to handle.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128879788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hyo-Sang Lim, Gabriel Ghinita, E. Bertino, Murat Kantarcioglu
Sensor networks are being increasingly deployed in many application domains ranging from environment monitoring to supervising critical infrastructure systems (e.g., the power grid). Due to their ability to continuously collect large amounts of data, sensor networks represent a key component in decision-making, enabling timely situation assessment and response. However, sensors deployed in hostile environments may be subject to attacks by adversaries who intend to inject false data into the system. In this context, data trustworthiness is an important concern, as false readings may result in wrong decisions with serious consequences (e.g., large-scale power outages). To defend against this threat, it is important to establish trust levels for sensor nodes and adjust node trustworthiness scores to account for malicious interferences. In this paper, we develop a game-theoretic defense strategy to protect sensor nodes from attacks and to guarantee a high level of trustworthiness for sensed data. We use a discrete time model, and we consider that there is a limited attack budget that bounds the capability of the attacker in each round. The defense strategy objective is to ensure that sufficient sensor nodes are protected in each round such that the discrepancy between the accepted value and the truthful sensed value is below a certain threshold. We model the attack-defense interaction as a Stackelberg game, and we derive the Nash equilibrium condition that is sufficient to ensure that the sensed data are truthful within a nominal error bound. We implement a prototype of the proposed strategy and we show through extensive experiments that our solution provides an effective and efficient way of protecting sensor networks from attacks.
{"title":"A Game-Theoretic Approach for High-Assurance of Data Trustworthiness in Sensor Networks","authors":"Hyo-Sang Lim, Gabriel Ghinita, E. Bertino, Murat Kantarcioglu","doi":"10.1109/ICDE.2012.78","DOIUrl":"https://doi.org/10.1109/ICDE.2012.78","url":null,"abstract":"Sensor networks are being increasingly deployed in many application domains ranging from environment monitoring to supervising critical infrastructure systems (e.g., the power grid). Due to their ability to continuously collect large amounts of data, sensor networks represent a key component in decisionmaking, enabling timely situation assessment and response. However, sensors deployed in hostile environments may be subject to attacks by adversaries who intend to inject false data into the system. In this context, data trustworthiness is an important concern, as false readings may result in wrong decisions with serious consequences (e.g., large-scale power outages). To defend against this threat, it is important to establish trust levels for sensor nodes and adjust node trustworthiness scores to account for malicious interferences. In this paper, we develop a game-theoretic defense strategy to protect sensor nodes from attacks and to guarantee a high level of trustworthiness for sensed data. We use a discrete time model, and we consider that there is a limited attack budget that bounds the capability of the attacker in each round. The defense strategy objective is to ensure that sufficient sensor nodes are protected in each round such that the discrepancy between the value accepted and the truthful sensed value is below a certain threshold. We model the attack-defense interaction as a Stackelberg game, and we derive the Nash equilibrium condition that is sufficient to ensure that the sensed data are truthful within a nominal error bound. We implement a prototype of the proposed strategy and we show through extensive experiments that our solution provides an effective and efficient way of protecting sensor networks from attacks.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128515715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The problem of community detection in social media has been widely studied in the social networking community, in the context of the structure of the underlying graphs. Most community detection algorithms use the links between nodes to determine the dense regions in the graph; these dense regions are the communities of social media in the graph. Such methods are typically based purely on the linkage structure of the underlying social media network. However, in many recent applications, edge content is available and can provide better supervision to the community detection process. Many natural forms of social interaction, such as shared images and videos, user tags, and comments, naturally associate content with the edges. While some work has been done on utilizing node content for community detection, the presence of edge content presents unprecedented opportunities and flexibility for the community detection process. We show that such edge content can be leveraged to greatly improve the effectiveness of community detection in social media networks, and we present experimental results illustrating the effectiveness of our approach.
{"title":"Community Detection with Edge Content in Social Media Networks","authors":"Guo-Jun Qi, C. Aggarwal, Thomas S. Huang","doi":"10.1109/ICDE.2012.77","DOIUrl":"https://doi.org/10.1109/ICDE.2012.77","url":null,"abstract":"The problem of community detection in social media has been widely studied in the social networking community in the context of the structure of the underlying graphs. Most community detection algorithms use the links between the nodes in order to determine the dense regions in the graph. These dense regions are the communities of social media in the graph. Such methods are typically based purely on the linkage structure of the underlying social media network. However, in many recent applications, edge content is available in order to provide better supervision to the community detection process. Many natural representations of edges in social interactions such as shared images and videos, user tags and comments are naturally associated with content on the edges. While some work has been done on utilizing node content for community detection, the presence of edge content presents unprecedented opportunities and flexibility for the community detection process. We will show that such edge content can be leveraged in order to greatly improve the effectiveness of the community detection process in social media networks. We present experimental results illustrating the effectiveness of our approach.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"7 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116844635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Given the ubiquitous nature and sheer scale of data collection, data summarization is critical for effective data management. Classical matrix decomposition techniques have often been used for this purpose and have been the subject of much study. In recent years, several other forms of decomposition, including Boolean Matrix Decomposition, have become of significant practical interest. Since much of the data collected is categorical in nature, it can be viewed in terms of a Boolean matrix. Boolean matrix decomposition (BMD), wherein a Boolean matrix is expressed as the product of two Boolean matrices, can be used to provide concise and interpretable representations of Boolean data sets. The decomposed matrices give a set of meaningful concepts and their combinations, which can be used to reconstruct the original data. Such decompositions are useful in a number of application domains, including role engineering, text mining, and knowledge discovery from databases. In this seminar, we look at the theory underlying the BMD problem, study some of its variants and solutions, and examine different practical applications.
{"title":"Boolean Matrix Decomposition Problem: Theory, Variations and Applications to Data Engineering","authors":"Jaideep Vaidya","doi":"10.1109/ICDE.2012.144","DOIUrl":"https://doi.org/10.1109/ICDE.2012.144","url":null,"abstract":"With the ubiquitous nature and sheer scale of data collection, the problem of data summarization is most critical for effective data management. Classical matrix decomposition techniques have often been used for this purpose, and have been the subject of much study. In recent years, several other forms of decomposition, including Boolean Matrix Decomposition have become of significant practical interest. Since much of the data collected is categorical in nature, it can be viewed in terms of a Boolean matrix. Boolean matrix decomposition (BMD), wherein a boolean matrix is expressed as a product of two Boolean matrices, can be used to provide concise and interpretable representations of Boolean data sets. The decomposed matrices give the set of meaningful concepts and their combination which can be used to reconstruct the original data. Such decompositions are useful in a number of application domains including role engineering, text mining as well as knowledge discovery from databases. In this seminar, we look at the theory underlying the BMD problem, study some of its variants and solutions, and examine different practical applications.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115580236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cliques are topological structures that usually provide important information for understanding the structure of a graph or network. However, detecting and extracting cliques efficiently is known to be very hard. In this paper, we define and introduce the notion of a Triangle K-Core, a simpler and more tractable topological structure that can be used as a proxy for extracting clique-like structures from large graphs. Based on this definition, we first develop a localized algorithm for extracting Triangle K-Cores from large graphs. Subsequently, we extend the simple algorithm to accommodate dynamic graphs (where edges can be dynamically added and deleted). Finally, we extend the basic definition to support various template pattern cliques, with applications to network visualization and event detection on graphs and networks. Our empirical results reveal the efficiency and efficacy of the proposed methods on many real-world datasets.
{"title":"Extracting Analyzing and Visualizing Triangle K-Core Motifs within Networks","authors":"Yang Zhang, S. Parthasarathy","doi":"10.1109/ICDE.2012.35","DOIUrl":"https://doi.org/10.1109/ICDE.2012.35","url":null,"abstract":"Cliques are topological structures that usually provide important information for understanding the structure of a graph or network. However, detecting and extracting cliques efficiently is known to be very hard. In this paper, we define and introduce the notion of a Triangle K-Core, a simpler topological structure and one that is more tractable and can moreover be used as a proxy for extracting clique-like structure from large graphs. Based on this definition we first develop a localized algorithm for extracting Triangle K-Cores from large graphs. Subsequently we extend the simple algorithm to accommodate dynamic graphs (where edges can be dynamically added and deleted). Finally, we extend the basic definition to support various template pattern cliques with applications to network visualization and event detection on graphs and networks. Our empirical results reveal the efficiency and efficacy of the proposed methods on many real world datasets.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114278773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}