
Latest Publications: 2012 IEEE 28th International Conference on Data Engineering

A Foundation for Efficient Indoor Distance-Aware Query Processing
Pub Date : 2012-04-01 DOI: 10.1109/ICDE.2012.44
Hua Lu, Xin Cao, Christian S. Jensen
Indoor spaces accommodate large numbers of spatial objects, e.g., points of interest (POIs), and moving populations. A variety of services, e.g., location-based services and security control, are relevant to indoor spaces. Such services can be improved substantially if they are capable of utilizing indoor distances. However, existing indoor space models do not account well for indoor distances. To address this shortcoming, we propose a data management infrastructure that captures indoor distance and facilitates distance-aware query processing. In particular, we propose a distance-aware indoor space model that integrates indoor distance seamlessly. To enable the use of the model as a foundation for query processing, we develop accompanying, efficient algorithms that compute indoor distances for different indoor entities like doors as well as locations. We also propose an indexing framework that accommodates indoor distances that are pre-computed using the proposed algorithms. On top of this foundation, we develop efficient algorithms for typical indoor, distance-aware queries. The results of an extensive experimental evaluation demonstrate the efficacy of the proposals.
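The door-to-door distance idea can be sketched as a weighted graph over doors. The graph below and its edge weights are invented for illustration, and Dijkstra's algorithm stands in for the paper's (more elaborate) distance computation:

```python
import heapq

# Hypothetical sketch (not the paper's actual model): an indoor space as a
# graph whose nodes are doors and whose weighted edges are intra-room
# walking distances between doors that share a room.
door_graph = {
    "d1": [("d2", 4.0), ("d3", 7.0)],
    "d2": [("d1", 4.0), ("d3", 2.0)],
    "d3": [("d1", 7.0), ("d2", 2.0), ("d4", 5.0)],
    "d4": [("d3", 5.0)],
}

def indoor_distance(graph, source, target):
    """Shortest door-to-door indoor distance via Dijkstra's algorithm."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for nbr, w in graph[node]:
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return float("inf")  # target unreachable
```

Pre-computing such door-to-door distances is what would make an index over them worthwhile: for example, `indoor_distance(door_graph, "d1", "d4")` walks d1 → d2 → d3 → d4 for a total of 11.0.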
Citations: 72
Load Balancing in MapReduce Based on Scalable Cardinality Estimates
Pub Date : 2012-04-01 DOI: 10.1109/ICDE.2012.58
B. Gufler, Nikolaus Augsten, Angelika Reiser, A. Kemper
MapReduce has emerged as a popular tool for distributed and scalable processing of massive data sets and is being used increasingly in e-science applications. Unfortunately, the performance of MapReduce systems strongly depends on an even data distribution while scientific data sets are often highly skewed. The resulting load imbalance, which raises the processing time, is even amplified by high runtime complexity of the reducer tasks. An adaptive load balancing strategy is required for appropriate skew handling. In this paper, we address the problem of estimating the cost of the tasks that are distributed to the reducers based on a given cost model. An accurate cost estimation is the basis for adaptive load balancing algorithms and requires gathering statistics from the mappers. This is challenging: (a) Since the statistics from all mappers must be integrated, the mapper statistics must be small. (b) Although each mapper sees only a small fraction of the data, the integrated statistics must capture the global data distribution. (c) The mappers terminate after sending the statistics to the controller, and no second round is possible. Our solution to these challenges consists of two components. First, a monitoring component executed on every mapper captures the local data distribution and identifies its most relevant subset for cost estimation. Second, an integration component aggregates these subsets approximating the global data distribution.
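As an illustration of how per-partition cost estimates feed an adaptive strategy, the sketch below uses a simple greedy longest-processing-time assignment of partitions to reducers; the cost figures and the two-reducer setup are hypothetical, and the paper's actual estimation and balancing algorithms are more sophisticated:

```python
import heapq

def balance_partitions(costs, num_reducers):
    """Greedy LPT assignment: give each partition (largest estimated
    cost first) to the currently least-loaded reducer."""
    loads = [(0.0, r) for r in range(num_reducers)]  # (load, reducer id)
    heapq.heapify(loads)
    assignment = {r: [] for r in range(num_reducers)}
    for part, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        load, r = heapq.heappop(loads)       # least-loaded reducer
        assignment[r].append(part)
        heapq.heappush(loads, (load + cost, r))
    return assignment
```

With skewed hypothetical costs `{"a": 8, "b": 7, "c": 5, "d": 4, "e": 3}` and two reducers, the greedy pass balances the total load far better than hash partitioning of skewed keys would.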
Citations: 142
A Deep Embedding of Queries into Ruby
Pub Date : 2012-04-01 DOI: 10.1109/ICDE.2012.121
Torsten Grust, Manuel Mayr
We demonstrate SWITCH, a deep embedding of relational queries into Ruby and Ruby on Rails. With SWITCH, there is no syntactic or stylistic difference between Ruby programs that operate over in-memory array objects or database-resident tables, even if these programs rely on array order or nesting. SWITCH's built-in compiler and SQL code generator guarantee to emit few queries, addressing long-standing performance problems that trace back to Rails' Active Record database binding. "Looks like Ruby, but performs like handcrafted SQL," is the ideal that drives the research and development effort behind SWITCH.
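SWITCH itself is a Ruby embedding; as a rough, language-neutral illustration of the deep-embedding idea, the Python sketch below records operations as an expression tree and compiles the whole pipeline into a single SQL string instead of issuing one query per operation (all class and method names here are invented, not SWITCH's API):

```python
# Minimal deep-embedding sketch: method calls build a description of the
# query; nothing touches the database until to_sql() compiles it once.
class Table:
    def __init__(self, name, where=None, order=None):
        self.name, self.where, self.order = name, where, order

    def filter(self, cond):
        # Record the predicate; do not execute anything yet.
        return Table(self.name, cond, self.order)

    def sort(self, col):
        # Record the ordering; still no query issued.
        return Table(self.name, self.where, col)

    def to_sql(self):
        # Compile the accumulated pipeline into one SQL statement.
        sql = f"SELECT * FROM {self.name}"
        if self.where:
            sql += f" WHERE {self.where}"
        if self.order:
            sql += f" ORDER BY {self.order}"
        return sql
```

A chained call such as `Table("users").filter("age > 21").sort("name").to_sql()` yields a single `SELECT`, which is the effect (few emitted queries) that the abstract attributes to SWITCH's compiler.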
Citations: 13
Mining Knowledge from Data: An Information Network Analysis Approach
Pub Date : 2012-04-01 DOI: 10.1109/ICDE.2012.145
Jiawei Han, Yizhou Sun, Xifeng Yan, Philip S. Yu
Most objects and data in the real world are interconnected, forming complex, heterogeneous but often semi-structured information networks. However, many database researchers consider a database merely as a data repository that supports storage and retrieval rather than an information-rich, inter-related and multi-typed information network that supports comprehensive data analysis, whereas many network researchers focus on homogeneous networks. Departing from both, we view interconnected, semi-structured datasets as heterogeneous, information-rich networks and study how to uncover hidden knowledge in such networks. For example, a university database can be viewed as a heterogeneous information network, where objects of multiple types, such as students, professors, courses, departments, and multiple typed relationships, such as teach and advise are intertwined together, providing abundant information. In this tutorial, we present an organized picture of mining heterogeneous information networks and introduce a set of interesting, effective and scalable network mining methods. The topics to be covered include (i) database as an information network, (ii) mining information networks: clustering, classification, ranking, similarity search, and meta path-guided analysis, (iii) construction of quality, informative networks by data mining, (iv) trend and evolution analysis in heterogeneous information networks, and (v) research frontiers. We show that heterogeneous information networks are informative, and link analysis on such networks is powerful at uncovering critical knowledge hidden in large semi-structured datasets. Finally, we also present a few promising research directions.
Citations: 16
Detecting Clones, Copying and Reuse on the Web
Pub Date : 2012-04-01 DOI: 10.1109/ICDE.2012.146
X. Dong, D. Srivastava
The Web has enabled the availability of a vast amount of useful information in recent years. However, the web technologies that have enabled sources to share their information have also made it easy for sources to copy from each other and often publish without proper attribution. Understanding the copying relationships between sources has many benefits, including helping data providers protect their own rights, improving various aspects of data integration, and facilitating in-depth analysis of information flow. The importance of copy detection has led to a substantial amount of research in many disciplines of Computer Science, based on the type of information considered, such as text, images, videos, software code, and structured data. This seminar explores the similarities and differences between the techniques proposed for copy detection across the different types of information. We also examine the computational challenges associated with large-scale copy detection, indicating how they could be detected efficiently, and identify a range of open problems for the community.
Citations: 3
Querying XML Data: As You Shape It
Pub Date : 2012-04-01 DOI: 10.1109/ICDE.2012.65
C. Dyreson, S. Bhowmick
A limitation of XQuery is that a programmer has to be familiar with the shape of the data to query it effectively. And if that shape changes, or if the shape is other than what the programmer expects, the query may fail. One way to avoid this limitation is to transform the data into a desired shape. A data transformation is a rearrangement of data into a new shape. In this paper, we present the semantics and implementation of XMorph 2.0, a shape-polymorphic data transformation language for XML. An XMorph program can act as a query guard. The guard both transforms data to the shape needed by the query and determines whether and how the transformation potentially loses information; a transformation that loses information may lead to a query yielding an inaccurate result. This paper describes how to use XMorph as a query guard, gives a formal semantics for shape-to-shape transformations, documents how XMorph determines how a transformation potentially loses information, and describes the XMorph implementation.
Citations: 4
Vectorwise: A Vectorized Analytical DBMS
Pub Date : 2012-04-01 DOI: 10.1109/ICDE.2012.148
M. Zukowski, M. V. D. Wiel, P. Boncz
Vectorwise is a new entrant in the analytical database marketplace whose technology comes straight from innovations in the database research community in the past years. The product has since made waves due to its excellent performance in analytical customer workloads as well as benchmarks. We describe the history of Vectorwise, as well as its basic architecture and the experiences in turning a technology developed in an academic context into a commercial-grade product. Finally, we turn our attention to recent performance results, most notably on the TPC-H benchmark at various sizes.
Citations: 74
M3: Stream Processing on Main-Memory MapReduce
Pub Date : 2012-04-01 DOI: 10.1109/ICDE.2012.120
Ahmed M. Aly, Asmaa Sallam, B. Gnanasekaran, Long-Van Nguyen-Dinh, Walid G. Aref, M. Ouzzani, A. Ghafoor
The continuous growth of social web applications along with the development of sensor capabilities in electronic devices is creating countless opportunities to analyze the enormous amounts of data that are continuously streaming from these applications and devices. To process large scale data on large scale computing clusters, MapReduce has been introduced as a framework for parallel computing. However, most of the current implementations of the MapReduce framework support only the execution of fixed-input jobs. Such restriction makes these implementations inapplicable for most streaming applications, in which queries are continuous in nature, and input data streams are continuously received at high arrival rates. In this demonstration, we showcase M3, a prototype implementation of the MapReduce framework in which continuous queries over streams of data can be efficiently answered. M3 extends Hadoop, the open source implementation of MapReduce, bypassing the Hadoop Distributed File System (HDFS) to support main-memory-only processing. Moreover, M3 supports continuous execution of the Map and Reduce phases where individual Mappers and Reducers never terminate.
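A hypothetical sketch of the continuous, main-memory processing model: mapper and reducer logic run as one long-lived loop over an unbounded stream, keeping aggregation state in memory rather than materializing it on a distributed file system (the word-count example and all names are illustrative, not M3's API):

```python
from collections import defaultdict

def word_count_stream(lines):
    """Continuous word count over an unbounded stream of lines.

    The loop never terminates on its own (it runs as long as the stream
    produces lines); reducer state lives entirely in memory.
    """
    counts = defaultdict(int)        # in-memory reducer state
    for line in lines:               # "mapper" consumes the stream
        for word in line.split():    # map: emit (word, 1) pairs
            counts[word] += 1        # reduce: incremental aggregation
        yield dict(counts)           # current answer after each line
```

Feeding it the toy stream `["a b", "b c"]` yields a refined answer after every arriving line, which is the behavior a fixed-input Hadoop job cannot provide.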
Citations: 67
Efficient Similarity Search over Encrypted Data
Pub Date : 2012-04-01 DOI: 10.1109/ICDE.2012.23
Mehmet Kuzu, M. S. Islam, Murat Kantarcioglu
In recent years, due to the appealing features of cloud computing, large amounts of data have been stored in the cloud. Although cloud based services offer many advantages, privacy and security of the sensitive data is a big concern. To mitigate the concerns, it is desirable to outsource sensitive data in encrypted form. Encrypted storage protects the data against illegal access, but it complicates some basic, yet important functionality such as the search on the data. To achieve search over encrypted data without compromising the privacy, a considerable number of searchable encryption schemes have been proposed in the literature. However, almost all of them handle exact query matching but not similarity matching, a crucial requirement for real world applications. Although some sophisticated secure multi-party computation based cryptographic techniques are available for similarity tests, they are computationally intensive and do not scale for large data sources. In this paper, we propose an efficient scheme for similarity search over encrypted data. To do so, we utilize a state-of-the-art algorithm for fast near-neighbor search in high-dimensional spaces called locality sensitive hashing. To ensure the confidentiality of the sensitive data, we provide a rigorous security definition and prove the security of the proposed scheme under the provided definition. In addition, we provide a real world application of the proposed scheme and verify the theoretical results with empirical observations on a real dataset.
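The locality-sensitive-hashing building block can be illustrated with MinHash for set similarity: similar sets produce signatures that agree in many positions, so candidates can be found by comparing short signatures instead of full records. The sketch below omits the paper's encryption layer entirely, and the parameter values are invented:

```python
import random

def minhash_signature(items, num_hashes=16, seed=42):
    """MinHash signature of a set: one min-hash value per salted hash
    function. The probability that two signatures agree at a position
    equals the Jaccard similarity of the underlying sets."""
    rng = random.Random(seed)                 # fixed seed: same hash family
    salts = [rng.random() for _ in range(num_hashes)]
    return tuple(
        min(hash((salt, it)) for it in items) for salt in salts
    )

def signature_matches(sig_a, sig_b):
    """Number of agreeing positions -- a cheap similarity estimate."""
    return sum(a == b for a, b in zip(sig_a, sig_b))
```

Two sets sharing most of their elements agree in far more signature positions than two disjoint sets, which is what makes bucketing by signature fragments an effective near-neighbor filter.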
Citations: 301
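The near-neighbor primitive the abstract names, locality sensitive hashing, can be sketched in a few lines. This is a minimal random-hyperplane LSH for cosine similarity; the function names and parameters here are illustrative, not taken from the paper, which additionally applies LSH inside an encrypted index rather than on plaintext vectors:

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes; each contributes one signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def lsh_signature(vec, planes):
    """Bit signature for cosine-similarity LSH: bit i is 1 iff vec lies on
    the positive side of hyperplane i. Nearby vectors share most bits."""
    bits = 0
    for plane in planes:
        dot = sum(v * p for v, p in zip(vec, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits

# Vectors that are close in cosine similarity tend to land in the same
# hash bucket, so candidate neighbors are found without a linear scan.
planes = make_hyperplanes(dim=4, n_bits=8)
sig_a = lsh_signature([1.0, 0.9, 0.1, 0.0], planes)
sig_b = lsh_signature([0.9, 1.0, 0.0, 0.1], planes)
print(f"signatures: {sig_a:08b} vs {sig_b:08b}")
```

In practice one would use several such signatures (multiple hash tables) to trade precision against recall.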
SWST: A Disk Based Index for Sliding Window Spatio-Temporal Data 基于磁盘的滑动窗口时空数据索引
Pub Date : 2012-04-01 DOI: 10.1109/ICDE.2012.98
Manish Singh, Qiang Zhu, H. Jagadish
Numerous applications, such as wireless communication and telematics, need to keep track of the evolution of spatio-temporal data for a limited past; limited retention may even be required by regulations. In general, each data entry can have its own user-specified lifetime, and expired entries should be removed automatically by the system through some garbage-collection mechanism. This kind of limited retention can be achieved with a sliding window semantics similar to that used in stream data processing. However, due to the large volume and relatively long lifetime of the data in the aforementioned applications (in contrast to real-time transient streaming data), the sliding window here must be maintained for data on disk rather than in memory. It is a new challenge to provide fast access to information from the recent past and, at the same time, to delete expired entries efficiently. In this paper, we propose a disk-based, two-layered sliding window indexing scheme for discretely moving spatio-temporal data. Our index supports efficient processing of standard time-slice and interval queries and deletes expired entries with almost no overhead. In existing historical spatio-temporal indexing techniques, deletion is either infeasible or very inefficient. Our sliding-window-based processing model can support both current and past entries, whereas many existing historical spatio-temporal indexing techniques cannot keep these two types of data together in the same index. Our experimental comparison with the best-known historical index for discretely moving spatio-temporal data (the MV3R tree) shows that our index is about five times faster in terms of insertion time and comparable in terms of search performance. MV3R follows a partial persistency model, whereas our index supports very efficient deletion and update.
Citations: 10
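The retention semantics described in the abstract, per-entry lifetimes with automatic garbage collection of expired entries, can be illustrated with a small in-memory sketch. This is not the paper's disk-based, two-layered SWST index; the class and method names are hypothetical, and the sketch mirrors only the sliding-window expiry behavior:

```python
import heapq

class SlidingWindowStore:
    """Toy sliding-window store: each entry carries its own lifetime,
    and expired entries are garbage-collected lazily on access."""

    def __init__(self):
        self._entries = {}      # key -> (value, expiry_time)
        self._expiry_heap = []  # (expiry_time, key), ordered by expiry

    def insert(self, key, value, now, lifetime):
        expiry = now + lifetime
        self._entries[key] = (value, expiry)
        heapq.heappush(self._expiry_heap, (expiry, key))

    def _gc(self, now):
        # Pop heap records whose lifetime has elapsed; skip stale records
        # superseded by a later insert of the same key.
        while self._expiry_heap and self._expiry_heap[0][0] <= now:
            expiry, key = heapq.heappop(self._expiry_heap)
            current = self._entries.get(key)
            if current is not None and current[1] == expiry:
                del self._entries[key]

    def get(self, key, now):
        self._gc(now)
        entry = self._entries.get(key)
        return entry[0] if entry else None
```

Lazy collection keeps inserts cheap, matching the abstract's goal of expiring entries "with almost no overhead"; the actual index must additionally organize the live window on disk for time-slice and interval queries.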
Journal
2012 IEEE 28th International Conference on Data Engineering
Book学术
Literature mutual aid · Smart journal selection · Latest publications · Mutual-aid guidelines · Contact us: info@booksci.cn
Book学术 provides a free academic resource search service, helping researchers in China and abroad retrieve Chinese and English literature, and is committed to the most convenient and high-quality service experience.
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1