
2011 IEEE 27th International Conference on Data Engineering: Latest Papers

General secure sensor aggregation in the presence of malicious nodes
Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767849
Keith B. Frikken, Kyle Kauffman, Aaron Steele
Sensor networks have the potential to allow users to “query the physical world” [1] by querying the sensor nodes for an aggregate result. However, a concern with such aggregation is that a few corrupt nodes in the network may manipulate the results that the querier sees. There has been a substantial amount of work on providing integrity for sensor network computations. However, to the best of our knowledge, this prior work has at least one of the following two limitations: i) the methods only work for a specific aggregation function, or ii) the methods do not consider adversaries whose goal is to prevent the base station and the querier from receiving the result. In this paper we present the first scheme that provides general aggregation for sensor networks in the presence of malicious adversaries. The generality of the scheme results from the ability to securely evaluate any algorithm in the streaming model of computation. The main idea of this paper is to convert the commonly used aggregation tree into an aggregation list, and a process for doing this conversion is also presented. This result is an interesting and important first step toward general secure querying of a sensor network.
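The tree-to-list idea in this abstract can be illustrated with a small sketch. The abstract does not describe the paper's actual conversion process or its security mechanisms, so everything below is an assumption made for illustration: the `Node` type, the pre-order traversal used to flatten the tree, and the `stream_aggregate` helper are hypothetical, and no integrity protection is modeled. The sketch only shows why a list suits the streaming model: a single state is passed from node to node, and each node folds its own reading into it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, TypeVar

S = TypeVar("S")


@dataclass
class Node:
    """A sensor node in a hypothetical aggregation tree."""
    reading: float
    children: List["Node"] = field(default_factory=list)


def tree_to_list(root: Node) -> List[Node]:
    """Flatten the tree into a list using pre-order depth-first traversal.

    The ordering is an assumption; the paper's conversion may differ.
    """
    order: List[Node] = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return order


def stream_aggregate(nodes: List[Node], init: S, step: Callable[[S, float], S]) -> S:
    """Evaluate a streaming algorithm: one pass, one running state."""
    state = init
    for node in nodes:
        state = step(state, node.reading)
    return state


tree = Node(20.0, [Node(22.5, [Node(19.0)]), Node(21.5)])
chain = tree_to_list(tree)

# Two different aggregates computed with the same one-pass machinery.
count, total = stream_aggregate(chain, (0, 0.0), lambda s, x: (s[0] + 1, s[1] + x))
average = total / count
maximum = stream_aggregate(chain, float("-inf"), max)
```

Because `step` is an arbitrary function of the running state and the next reading, the same pass computes an average, a maximum, or any other one-pass aggregate, which is the sense of "general" suggested by the abstract.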
Citations: 3
Deriving probabilistic databases with inference ensembles
Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767854
Julia Stoyanovich, S. Davidson, T. Milo, V. Tannen
Many real-world applications deal with uncertain or missing data, prompting a surge of activity in the area of probabilistic databases. A shortcoming of prior work is the assumption that an appropriate probabilistic model, along with the necessary probability distributions, is given. We address this shortcoming by presenting a framework for learning a set of inference ensembles, termed meta-rule semi-lattices, or MRSL, from the complete portion of the data. We use the MRSL to infer probability distributions for missing data, and demonstrate experimentally that high accuracy is achieved when a single attribute value is missing per tuple. We next propose an inference algorithm based on Gibbs sampling that accurately predicts the probability distribution for multiple missing values. We also develop an optimization that greatly improves performance of multi-attribute inference for collections of tuples, while maintaining high accuracy. Finally, we develop an experimental framework to evaluate the efficiency and accuracy of our approach.
Citations: 18
Non-metric similarity search problems in very large collections
Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767955
B. Bustos, T. Skopal
This tutorial surveys domains employing non-metric functions for effective similarity search, and methods for efficient non-metric similarity search in very large collections.
Citations: 7
Continuous monitoring of distance-based outliers over data streams
Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767923
Maria Kontaki, A. Gounaris, A. Papadopoulos, K. Tsichlas, Y. Manolopoulos
Anomaly detection is considered an important data mining task, aiming at the discovery of elements (also known as outliers) that show significant deviation from the expected case. More specifically, given a set of objects, the problem is to return the suspicious objects that deviate significantly from the typical behavior. As in the case of clustering, the application of different criteria leads to different definitions of an outlier. In this work, we focus on distance-based outliers: an object x is an outlier if there are fewer than k objects lying at distance at most R from x. The problem offers significant challenges when a stream-based environment is considered, where data arrive continuously and outliers must be detected on-the-fly. There are a few research works studying the problem of continuous outlier detection. However, none of these proposals meets the requirements of modern stream-based applications for the following reasons: (i) they demand a significant storage overhead, (ii) their efficiency is limited and (iii) they lack flexibility. In this work, we propose new algorithms for continuous outlier monitoring in data streams, based on sliding windows. Our techniques are able to reduce the required storage overhead, run faster than previously proposed techniques and offer significant flexibility. Experiments performed on real-life as well as synthetic data sets verify our theoretical study.
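The distance-based definition in this abstract (an object is an outlier if fewer than k other objects lie within distance R of it) can be made concrete with a naive baseline over a count-based sliding window. This is not the paper's algorithm: it recomputes every neighbor count on each arrival, which is exactly the kind of cost the proposed techniques aim to avoid. The function names, the Euclidean metric, and the choice not to count an object as its own neighbor are assumptions made for this sketch.

```python
from collections import deque
from math import dist  # Euclidean distance, available in Python 3.8+


def window_outliers(window, k, radius):
    """Return points in the window with fewer than k neighbors within radius.

    A point is not counted as its own neighbor (an assumption).
    """
    points = list(window)
    outliers = []
    for i, p in enumerate(points):
        neighbors = sum(
            1 for j, q in enumerate(points) if j != i and dist(p, q) <= radius
        )
        if neighbors < k:
            outliers.append(p)
    return outliers


def monitor(stream, window_size, k, radius):
    """Yield (new_point, current_outliers) after each arrival.

    The deque drops the oldest point automatically once the window is full.
    """
    window = deque(maxlen=window_size)
    for point in stream:
        window.append(point)
        yield point, window_outliers(window, k, radius)


stream = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (0.1, 0.1)]
results = list(monitor(stream, window_size=4, k=2, radius=0.5))
final_outliers = results[-1][1]  # only the far-away point remains an outlier
```

Note that outlier status can change in both directions as the window slides: the first point is reported as an outlier while it is alone in the window and stops being one once enough neighbors arrive.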
Citations: 169
Playing games with databases
Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767963
J. Gehrke
Scalability is a fundamental problem in the development of computer games and massively multiplayer online games (MMOs). Players always demand more — more polygons, more physics particles, more interesting AI behavior, more monsters, more simultaneous players and interactions, and larger virtual worlds.
Citations: 0
Distributed data management in 2020?
Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767962
M. Tamer Özsu, P. Valduriez, S. Abiteboul, Bettina Kemme, R. Jiménez-Peris, B. Ooi
Work on distributed data management commenced shortly after the introduction of the relational model in the mid-1970s. The 1970s and 1980s were very active periods for the development of distributed relational database technology, and claims were made that in the following ten years centralized databases would be an “antique curiosity” and most organizations would move toward distributed database managers [1]. That prediction has certainly come true, and all commercial DBMSs today are distributed.
Citations: 3
ES2: A cloud data storage system for supporting both OLTP and OLAP
Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767881
Yu Cao, Chun Chen, Fei Guo, Dawei Jiang, Yuting Lin, B. Ooi, Hoang Tam Vo, Sai Wu, Quanqing Xu
Cloud computing represents a paradigm shift driven by the increasing demand of Web-based applications for elastic, scalable and efficient system architectures that can efficiently support their ever-growing data volume and large-scale data analysis. A typical data management system has to deal with real-time updates by individual users, as well as periodical large-scale analytical processing, indexing, and data extraction. While such operations may take place in the same domain, the design and development of the systems have somehow evolved independently for transactional and periodical analytical processing. Such a system-level separation has resulted in problems such as data freshness as well as serious data storage redundancy. Ideally, it would be more efficient to apply ad-hoc analytical processing on the same data directly. However, to the best of our knowledge, such an approach has not been adopted in real implementations. Intrigued by such an observation, we have designed and implemented epiC, an elastic power-aware data-intensive Cloud platform for supporting both data-intensive analytical operations (referred to as OLAP) and online transactions (referred to as OLTP). In this paper, we present ES2, the elastic data storage system of epiC, which is designed to support both functionalities within the same storage. We present the system architecture and the functions of each system component, and experimental results which demonstrate the efficiency of the system.
Citations: 99
Advanced search, visualization and tagging of sensor metadata
Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767943
Ioannis K. Paparrizos, Hoyoung Jeung, K. Aberer
As sensors continue to proliferate, the capability to effectively query not only sensor data but also its metadata becomes important in a wide range of applications. This paper demonstrates a search system that utilizes various techniques and tools for querying sensor metadata and visualizing the results. Our system provides an easy-to-use query interface, built upon semantic technologies where users can freely store and query their metadata. Going beyond basic keyword search, the system provides a variety of advanced functionalities tailored for sensor metadata search: ordering search results according to our ranking mechanism based on the PageRank algorithm, recommending pages that contain relevant metadata information to given search conditions, presenting search results using various visualization tools, and offering dynamic hypergraphs and tag clouds of metadata. The system has been running as a real application and its effectiveness has been proved by a number of users.
Citations: 3
Efficient core decomposition in massive networks
Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767911
James Cheng, Yiping Ke, Shumo Chu, M. Tamer Özsu
The k-core of a graph is the largest subgraph in which every vertex is connected to at least k other vertices within the subgraph. Core decomposition finds the k-core of the graph for every possible k. Past studies have shown important applications of core decomposition such as in the study of the properties of large networks (e.g., sustainability, connectivity, centrality, etc.), for solving NP-hard problems efficiently in real networks (e.g., maximum clique finding, densest subgraph approximation, etc.), and for large-scale network fingerprinting and visualization. The k-core is a well-accepted concept partly because there exists a simple and efficient algorithm for core decomposition, by recursively removing the lowest degree vertices and their incident edges. However, this algorithm requires random access to the graph and hence assumes the entire graph can be kept in main memory. Nevertheless, real-world networks such as online social networks have become exceedingly large in recent years and still keep growing at a steady rate. In this paper, we propose the first external-memory algorithm for core decomposition in massive graphs. When the memory is large enough to hold the graph, our algorithm achieves performance comparable to that of the in-memory algorithm. When the graph is too large to be kept in the memory, our algorithm requires only O(kmax) scans of the graph, where kmax is the largest core number of the graph. We demonstrate the efficiency of our algorithm on real networks with up to 52.9 million vertices and 1.65 billion edges.
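The simple in-memory procedure mentioned in this abstract (repeatedly removing a lowest-degree vertex together with its incident edges) can be sketched as follows. This is the classical peeling idea, not the external-memory algorithm the paper proposes. The function name and the dictionary-of-sets graph representation are assumptions, the graph is assumed to be undirected with symmetric adjacency and no self-loops, and the linear scan for the minimum degree is chosen for readability rather than speed.

```python
def core_numbers(adj):
    """Compute the core number of every vertex by peeling.

    adj maps each vertex to the set of its neighbors (undirected graph).
    The core number of v is the largest k such that v belongs to the k-core.
    """
    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Pick a vertex of minimum remaining degree (linear scan for clarity).
        v = min(remaining, key=degree.__getitem__)
        # The current level never decreases as peeling proceeds.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        # Removing v also removes its incident edges.
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


# A triangle a-b-c with a pendant vertex d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
core = core_numbers(graph)
k_max = max(core.values())
```

The lookups `degree[u]` and `adj[v]` touch arbitrary vertices at each step, which is the random-access pattern the abstract identifies as the obstacle when the graph does not fit in main memory.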
Citations: 229
Intelligent management of virtualized resources for database systems in cloud environment
Pub Date : 2011-04-11 DOI: 10.1109/ICDE.2011.5767928
Pengcheng Xiong, Yun Chi, Shenghuo Zhu, H. J. Moon, C. Pu, Hakan Hacıgümüş
In a cloud computing environment, resources are shared among different clients. Intelligently managing and allocating resources among various clients is important for system providers, whose business model relies on managing the infrastructure resources in a cost-effective manner while satisfying the client service level agreements (SLAs). In this paper, we address the issue of how to intelligently manage the resources in a shared cloud database system and present SmartSLA, a cost-aware resource management system. SmartSLA consists of two main components: the system modeling module and the resource allocation decision module. The system modeling module uses machine learning techniques to learn a model that describes the potential profit margins for each client under different resource allocations. Based on the learned model, the resource allocation decision module dynamically adjusts the resource allocations in order to achieve the optimum profits. We evaluate SmartSLA by using the TPC-W benchmark with workload characteristics derived from real-life systems. The performance results indicate that SmartSLA can successfully compute predictive models under different hardware resource allocations, such as CPU and memory, as well as database-specific resources, such as the number of replicas in the database systems. The experimental results also show that SmartSLA can provide intelligent service differentiation according to factors such as variable workloads, SLA levels, and resource costs, and can deliver improved profit margins.
Citations: 161
Journal
2011 IEEE 27th International Conference on Data Engineering