Summary form only given. Cooperation and trust play an increasingly important role in today's information systems. For instance, peer-to-peer systems like BitTorrent, Sopcast, and Skype are powered by resource contributions from participating users; federated systems like the Internet have to respect the interests, policies, and laws of participating organizations and countries; and in the Cloud, users entrust their data and computation to third-party infrastructure. In this talk, we consider accountability as a way to facilitate transparency and trust in cooperative systems. We look at practical techniques to account for the integrity of distributed, cooperative computations, and at some of the difficulties and open problems in accountability.
{"title":"Accountability and Trust in Cooperative Information Systems","authors":"P. Druschel","doi":"10.1109/ICDE.2012.152","DOIUrl":"https://doi.org/10.1109/ICDE.2012.152","url":null,"abstract":"Summary form only given. Cooperation and trust play an increasingly important role in today's information systems. For instance, peer-to-peer systems like BitTorrent, Sopcast and Skype are powered by resource contributions from participating users, federated systems like the Internet have to respect the interests, policies and laws of participating organizations and countries; in the Cloud, users entrust their data and computation to third-part infrastructure. In this talk, we consider accountability as a way to facilitate transparency and trust in cooperative systems. We look at practical techniques to account for the integrity of distributed, cooperative computations, and look at some of the difficulties and open problems in accountability.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124784481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
F. Afrati, A. Sarma, David Menestrina, Aditya G. Parameswaran, J. Ullman
Fuzzy/similarity joins have been widely studied in the research community and extensively used in real-world applications. This paper proposes and evaluates several algorithms for finding all pairs of elements from an input set that meet a similarity threshold. The computation model is a single MapReduce job. Because we allow only one MapReduce round, the Reduce function must be designed so that a given output pair is produced by only one task; for many algorithms, satisfying this condition is one of the biggest challenges. We break the cost of an algorithm into three components: the execution cost of the mappers, the execution cost of the reducers, and the communication cost from the mappers to the reducers. The algorithms are presented first in terms of Hamming distance, but extensions to edit distance and Jaccard distance are shown as well. We find that there are many different approaches to the similarity-join problem using MapReduce, and none dominates the others when both communication and reducer costs are considered. Our cost analyses enable applications to pick the optimal algorithm based on their communication, memory, and cluster requirements.
{"title":"Fuzzy Joins Using MapReduce","authors":"F. Afrati, A. Sarma, David Menestrina, Aditya G. Parameswaran, J. Ullman","doi":"10.1109/ICDE.2012.66","DOIUrl":"https://doi.org/10.1109/ICDE.2012.66","url":null,"abstract":"Fuzzy/similarity joins have been widely studied in the research community and extensively used in real-world applications. This paper proposes and evaluates several algorithms for finding all pairs of elements from an input set that meet a similarity threshold. The computation model is a single MapReduce job. Because we allow only one MapReduce round, the Reduce function must be designed so a given output pair is produced by only one task, for many algorithms, satisfying this condition is one of the biggest challenges. We break the cost of an algorithm into three components: the execution cost of the mappers, the execution cost of the reducers, and the communication cost from the mappers to reducers. The algorithms are presented first in terms of Hamming distance, but extensions to edit distance and Jaccard distance are shown as well. We find that there are many different approaches to the similarity-join problem using MapReduce, and none dominates the others when both communication and reducer costs are considered. Our cost analyses enable applications to pick the optimal algorithm based on their communication, memory, and cluster requirements.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114954002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Meiyu Lu, S. Bangalore, Graham Cormode, Marios Hadjieleftheriou, D. Srivastava
A key step in validating a proposed idea or system is to evaluate it over a suitable dataset. However, to date there have been no useful tools for researchers to understand which datasets have been used for what purpose, or in what prior work. Instead, they have to manually browse through papers to find suitable datasets and their corresponding URLs, which is laborious and inefficient. To better aid the dataset discovery process, and to provide a better understanding of how and where datasets have been used, we propose a framework to effectively identify datasets within the scientific corpus. The key technical challenges are the identification of datasets and the discovery of the association between a dataset and the URLs where it can be accessed. Based on this, we have built a user-friendly web-based search interface for users to conveniently explore dataset-paper relationships and find relevant datasets and their properties.
{"title":"A Dataset Search Engine for the Research Document Corpus","authors":"Meiyu Lu, S. Bangalore, Graham Cormode, Marios Hadjieleftheriou, D. Srivastava","doi":"10.1109/ICDE.2012.80","DOIUrl":"https://doi.org/10.1109/ICDE.2012.80","url":null,"abstract":"A key step in validating a proposed idea or system is to evaluate over a suitable dataset. However, to this date there have been no useful tools for researchers to understand which datasets have been used for what purpose, or in what prior work. Instead, they have to manually browse through papers to find the suitable datasets and their corresponding URLs, which is laborious and inefficient. To better aid the dataset discovery process, and provide a better understanding of how and where datasets have been used, we propose a framework to effectively identify datasets within the scientific corpus. The key technical challenges are identification of datasets, and discovery of the association between a dataset and the URLs where they can be accessed. Based on this, we have built a user friendly web-based search interface for users to conveniently explore the dataset-paper relationships, and find relevant datasets and their properties.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123966010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Steffen Hirte, Andreas Seifert, S. Baumann, Daniel Klan, K. Sattler
Motion sensing input devices like Microsoft's Kinect offer an alternative to traditional computer input devices like keyboards and mice. New applications using this interface appear daily, and most of them implement their own gesture detection. In our demonstration we show a new approach using the data stream engine AnduIN: gesture detection is done with AnduIN's complex event processing functionality. This way we build a system that allows new and complex gestures to be defined through a declarative programming interface. On this basis, our demonstration Data3 provides a basic natural-interaction OLAP interface for a sample star schema database using Microsoft's Kinect.
{"title":"Data3 -- A Kinect Interface for OLAP Using Complex Event Processing","authors":"Steffen Hirte, Andreas Seifert, S. Baumann, Daniel Klan, K. Sattler","doi":"10.1109/ICDE.2012.131","DOIUrl":"https://doi.org/10.1109/ICDE.2012.131","url":null,"abstract":"Motion sensing input devices like Microsoft's Kinect offer an alternative to traditional computer input devices like keyboards and mouses. Daily new applications using this interface appear. Most of them implement their own gesture detection. In our demonstration we show a new approach using the data stream engine Andu IN. The gesture detection is done based on Andu IN's complex event processing functionality. This way we build a system that allows to define new and complex gestures on the basis of a declarative programming interface. On this basis our demonstration data3 provides a basic natural interaction OLAP interface for a sample star schema database using Microsoft's Kinect.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125264986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shenoda Guirguis, M. Sharaf, Panos K. Chrysanthis, Alexandros Labrinidis
Aggregate Continuous Queries (ACQs) are a very popular class of Continuous Queries (CQs) with a potentially high execution cost. As such, optimizing the processing of ACQs is imperative for Data Stream Management Systems (DSMSs) to reach their full potential in supporting (critical) monitoring applications. For multiple ACQs that vary in window specifications and pre-aggregation filters, existing multiple-ACQ optimization schemes assume a processing model where each ACQ is computed as a final aggregation of a sub-aggregation. In this paper, we propose a novel processing model for ACQs, called TriOps, with the goal of minimizing the repetition of operator execution at the sub-aggregation level. We also propose TriWeave, a TriOps-aware multi-query optimizer. We analytically and experimentally demonstrate the performance gains of the proposed schemes, showing their superiority over alternative schemes. Finally, we generalize TriWeave to incorporate classical subsumption-based multi-query optimization techniques.
{"title":"Three-Level Processing of Multiple Aggregate Continuous Queries","authors":"Shenoda Guirguis, M. Sharaf, Panos K. Chrysanthis, Alexandros Labrinidis","doi":"10.1109/ICDE.2012.112","DOIUrl":"https://doi.org/10.1109/ICDE.2012.112","url":null,"abstract":"Aggregate Continuous Queries (ACQs) are both a very popular class of Continuous Queries (CQs) and also have a potentially high execution cost. As such, optimizing the processing of ACQs is imperative for Data Stream Management Systems (DSMSs) to reach their full potential in supporting (critical) monitoring applications. For multiple ACQs that vary in window specifications and pre-aggregation filters, existing multiple ACQs optimization schemes assume a processing model where each ACQ is computed as a final-aggregation of a sub-aggregation. In this paper, we propose a novel processing model for ACQs, called Tri Ops, with the goal of minimizing the repetition of operator execution at the sub-aggregation level. We also propose Tri Weave, a Tri Ops-aware multi-query optimizer. We analytically and experimentally demonstrate the performance gains of our proposed schemes which shows their superiority over alternative schemes. Finally, we generalize Tri Weave to incorporate the classical subsumption-based multi-query optimization techniques.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120959231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Bergamaschi, Matteo Interlandi, Mario Longo, Laura Po, M. Vincini
The adoption of business intelligence technology in industry is growing rapidly. Business managers are no longer satisfied with ad hoc and static reports, and they ask for more flexible and easy-to-use data analysis tools. Recently, application interfaces that expand the range of operations available to the user, while hiding the underlying complexity, have been developed. The paper presents eLog, a business intelligence solution designed and developed in collaboration between the database group of the University of Modena and Reggio Emilia and eBilling, an Italian SME supplier of solutions for the design, production, and automation of documentary processes for top Italian companies. eLog enables business managers to define OLAP reports by means of a web interface and to customize analysis indicators using a simple meta-language. The framework translates the user's reports into MDX queries and is able to automatically select the data cube suitable for each query. Over 140 medium and large companies have used the technological services of eBilling S.p.A. to manage their document flows. In particular, eLog services have been used by the major Italian media and telecommunications companies and their foreign subsidiaries, such as Sky, Mediaset, H3G, and TIM Brazil. The largest customer alone can generate up to 30 million mail pieces within 6 months (about 200 GB of data in the relational DBMS). Over a period of 18 months, eLog may have to handle up to 150 million mail pieces (1 TB of data).
{"title":"A Meta-language for MDX Queries in eLog Business Solution","authors":"S. Bergamaschi, Matteo Interlandi, Mario Longo, Laura Po, M. Vincini","doi":"10.1109/ICDE.2012.100","DOIUrl":"https://doi.org/10.1109/ICDE.2012.100","url":null,"abstract":"The adoption of business intelligence technology in industries is growing rapidly. Business managers are not satisfied with ad hoc and static reports and they ask for more flexible and easy to use data analysis tools. Recently, application interfaces that expand the range of operations available to the user, hiding the underlying complexity, have been developed. The paper presents eLog, a business intelligence solution designed and developed in collaboration between the database group of the University of Modena and Reggio Emilia and eBilling, an Italian SME supplier of solutions for the design, production and automation of documentary processes for top Italian companies. eLog enables business managers to define OLAP reports by means of a web interface and to customize analysis indicators adopting a simple meta-language. The framework translates the user's reports into MDX queries and is able to automatically select the data cube suitable for each query. Over 140 medium and large companies have exploited the technological services of eBilling S.p.A. to manage their documents flows. In particular, eLog services have been used by the major media and telecommunications Italian companies and their foreign annex, such as Sky, Media set, H3G, Tim Brazil etc. The largest customer can provide up to 30 millions mail pieces within 6 months (about 200 GB of data in the relational DBMS). In a period of 18 months, eLog could reach 150 millions mail pieces (1 TB of data) to handle.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128879788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hyo-Sang Lim, Gabriel Ghinita, E. Bertino, Murat Kantarcioglu
Sensor networks are being increasingly deployed in many application domains ranging from environment monitoring to supervising critical infrastructure systems (e.g., the power grid). Due to their ability to continuously collect large amounts of data, sensor networks represent a key component in decision-making, enabling timely situation assessment and response. However, sensors deployed in hostile environments may be subject to attacks by adversaries who intend to inject false data into the system. In this context, data trustworthiness is an important concern, as false readings may result in wrong decisions with serious consequences (e.g., large-scale power outages). To defend against this threat, it is important to establish trust levels for sensor nodes and adjust node trustworthiness scores to account for malicious interferences. In this paper, we develop a game-theoretic defense strategy to protect sensor nodes from attacks and to guarantee a high level of trustworthiness for sensed data. We use a discrete time model, and we consider that there is a limited attack budget that bounds the capability of the attacker in each round. The defense strategy objective is to ensure that sufficient sensor nodes are protected in each round such that the discrepancy between the accepted value and the truthful sensed value is below a certain threshold. We model the attack-defense interaction as a Stackelberg game, and we derive the Nash equilibrium condition that is sufficient to ensure that the sensed data are truthful within a nominal error bound. We implement a prototype of the proposed strategy and we show through extensive experiments that our solution provides an effective and efficient way of protecting sensor networks from attacks.
{"title":"A Game-Theoretic Approach for High-Assurance of Data Trustworthiness in Sensor Networks","authors":"Hyo-Sang Lim, Gabriel Ghinita, E. Bertino, Murat Kantarcioglu","doi":"10.1109/ICDE.2012.78","DOIUrl":"https://doi.org/10.1109/ICDE.2012.78","url":null,"abstract":"Sensor networks are being increasingly deployed in many application domains ranging from environment monitoring to supervising critical infrastructure systems (e.g., the power grid). Due to their ability to continuously collect large amounts of data, sensor networks represent a key component in decisionmaking, enabling timely situation assessment and response. However, sensors deployed in hostile environments may be subject to attacks by adversaries who intend to inject false data into the system. In this context, data trustworthiness is an important concern, as false readings may result in wrong decisions with serious consequences (e.g., large-scale power outages). To defend against this threat, it is important to establish trust levels for sensor nodes and adjust node trustworthiness scores to account for malicious interferences. In this paper, we develop a game-theoretic defense strategy to protect sensor nodes from attacks and to guarantee a high level of trustworthiness for sensed data. We use a discrete time model, and we consider that there is a limited attack budget that bounds the capability of the attacker in each round. The defense strategy objective is to ensure that sufficient sensor nodes are protected in each round such that the discrepancy between the value accepted and the truthful sensed value is below a certain threshold. We model the attack-defense interaction as a Stackelberg game, and we derive the Nash equilibrium condition that is sufficient to ensure that the sensed data are truthful within a nominal error bound. We implement a prototype of the proposed strategy and we show through extensive experiments that our solution provides an effective and efficient way of protecting sensor networks from attacks.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128515715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The problem of community detection in social media has been widely studied in the social networking community, in the context of the structure of the underlying graphs. Most community detection algorithms use the links between nodes to determine the dense regions in the graph; these dense regions are the communities of social media in the graph. Such methods are typically based purely on the linkage structure of the underlying social media network. However, in many recent applications, edge content is available and can provide better supervision to the community detection process. Many natural forms of social interaction, such as shared images and videos, user tags, and comments, naturally associate content with the edges. While some work has been done on utilizing node content for community detection, the presence of edge content presents unprecedented opportunities and flexibility for the community detection process. We show that such edge content can be leveraged to greatly improve the effectiveness of community detection in social media networks, and we present experimental results illustrating the effectiveness of our approach.
{"title":"Community Detection with Edge Content in Social Media Networks","authors":"Guo-Jun Qi, C. Aggarwal, Thomas S. Huang","doi":"10.1109/ICDE.2012.77","DOIUrl":"https://doi.org/10.1109/ICDE.2012.77","url":null,"abstract":"The problem of community detection in social media has been widely studied in the social networking community in the context of the structure of the underlying graphs. Most community detection algorithms use the links between the nodes in order to determine the dense regions in the graph. These dense regions are the communities of social media in the graph. Such methods are typically based purely on the linkage structure of the underlying social media network. However, in many recent applications, edge content is available in order to provide better supervision to the community detection process. Many natural representations of edges in social interactions such as shared images and videos, user tags and comments are naturally associated with content on the edges. While some work has been done on utilizing node content for community detection, the presence of edge content presents unprecedented opportunities and flexibility for the community detection process. We will show that such edge content can be leveraged in order to greatly improve the effectiveness of the community detection process in social media networks. We present experimental results illustrating the effectiveness of our approach.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"7 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116844635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Given the ubiquitous nature and sheer scale of data collection, data summarization is critical for effective data management. Classical matrix decomposition techniques have often been used for this purpose and have been the subject of much study. In recent years, several other forms of decomposition, including Boolean Matrix Decomposition, have become of significant practical interest. Since much of the data collected is categorical in nature, it can be viewed in terms of a Boolean matrix. Boolean matrix decomposition (BMD), wherein a Boolean matrix is expressed as the product of two Boolean matrices, can be used to provide concise and interpretable representations of Boolean data sets. The decomposed matrices give a set of meaningful concepts and their combinations, which can be used to reconstruct the original data. Such decompositions are useful in a number of application domains, including role engineering, text mining, and knowledge discovery from databases. In this seminar, we look at the theory underlying the BMD problem, study some of its variants and solutions, and examine different practical applications.
{"title":"Boolean Matrix Decomposition Problem: Theory, Variations and Applications to Data Engineering","authors":"Jaideep Vaidya","doi":"10.1109/ICDE.2012.144","DOIUrl":"https://doi.org/10.1109/ICDE.2012.144","url":null,"abstract":"With the ubiquitous nature and sheer scale of data collection, the problem of data summarization is most critical for effective data management. Classical matrix decomposition techniques have often been used for this purpose, and have been the subject of much study. In recent years, several other forms of decomposition, including Boolean Matrix Decomposition have become of significant practical interest. Since much of the data collected is categorical in nature, it can be viewed in terms of a Boolean matrix. Boolean matrix decomposition (BMD), wherein a boolean matrix is expressed as a product of two Boolean matrices, can be used to provide concise and interpretable representations of Boolean data sets. The decomposed matrices give the set of meaningful concepts and their combination which can be used to reconstruct the original data. Such decompositions are useful in a number of application domains including role engineering, text mining as well as knowledge discovery from databases. In this seminar, we look at the theory underlying the BMD problem, study some of its variants and solutions, and examine different practical applications.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115580236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cliques are topological structures that usually provide important information for understanding the structure of a graph or network. However, detecting and extracting cliques efficiently is known to be very hard. In this paper, we define and introduce the notion of a Triangle K-Core, a simpler and more tractable topological structure that can be used as a proxy for extracting clique-like structures from large graphs. Based on this definition, we first develop a localized algorithm for extracting Triangle K-Cores from large graphs. Subsequently, we extend the simple algorithm to accommodate dynamic graphs (where edges can be dynamically added and deleted). Finally, we extend the basic definition to support various template pattern cliques, with applications to network visualization and event detection on graphs and networks. Our empirical results reveal the efficiency and efficacy of the proposed methods on many real-world datasets.
{"title":"Extracting Analyzing and Visualizing Triangle K-Core Motifs within Networks","authors":"Yang Zhang, S. Parthasarathy","doi":"10.1109/ICDE.2012.35","DOIUrl":"https://doi.org/10.1109/ICDE.2012.35","url":null,"abstract":"Cliques are topological structures that usually provide important information for understanding the structure of a graph or network. However, detecting and extracting cliques efficiently is known to be very hard. In this paper, we define and introduce the notion of a Triangle K-Core, a simpler topological structure and one that is more tractable and can moreover be used as a proxy for extracting clique-like structure from large graphs. Based on this definition we first develop a localized algorithm for extracting Triangle K-Cores from large graphs. Subsequently we extend the simple algorithm to accommodate dynamic graphs (where edges can be dynamically added and deleted). Finally, we extend the basic definition to support various template pattern cliques with applications to network visualization and event detection on graphs and networks. Our empirical results reveal the efficiency and efficacy of the proposed methods on many real world datasets.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114278773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}