
Latest publications from the 2012 IEEE 28th International Conference on Data Engineering

Joint Entity Resolution
Pub Date : 2012-04-01 DOI: 10.1109/ICDE.2012.119
Steven Euijong Whang, H. Garcia-Molina
Entity resolution (ER) is the problem of identifying which records in a database represent the same entity. Often, records of different types are involved (e.g., authors, publications, institutions, venues), and resolving records of one type can impact the resolution of other types of records. In this paper we propose a flexible, modular resolution framework where existing ER algorithms developed for a given record type can be plugged in and used in concert with other ER algorithms. Our approach also makes it possible to run ER on subsets of similar records at a time, important when the full data is too large to resolve together. We study the scheduling and coordination of the individual ER algorithms in order to resolve the full data set. We then evaluate our joint ER techniques on synthetic and real data and show the scalability of our approach.
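The plug-in coordination the abstract describes can be pictured as a small fixpoint loop. The sketch below is ours, not the paper's API (names like `joint_er`, `resolvers`, and `depends_on` are illustrative): per-type ER algorithms run as black boxes, and whenever one type's partition changes, the types that depend on it are re-queued.

```python
def joint_er(partitions, resolvers, depends_on):
    """Run plugged-in per-type ER algorithms until a global fixpoint.

    partitions: {rtype: clustering as a list of frozensets of record ids}
    resolvers:  {rtype: fn(clusters, all_partitions) -> new clusters}
    depends_on: {rtype: set of types whose results influence rtype}
    Assumes resolvers only merge (monotone), so the loop terminates.
    """
    dirty = set(partitions)                  # types needing (re-)resolution
    while dirty:
        rtype = dirty.pop()
        new = resolvers[rtype](partitions[rtype], partitions)
        if new != partitions[rtype]:         # a change may affect dependants
            partitions[rtype] = new
            dirty |= {t for t, deps in depends_on.items() if rtype in deps}
    return partitions
```

Scheduling here is a plain work-list; the paper studies smarter orderings, including running ER on subsets of similar records at a time.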
Citations: 44
On Discovery of Traveling Companions from Streaming Trajectories
Pub Date : 2012-04-01 DOI: 10.1109/ICDE.2012.33
L. Tang, Yu Zheng, Jing Yuan, Jiawei Han, Alice Leung, Chih-Chieh Hung, Wen-Chih Peng
The advance of object tracking technologies leads to huge volumes of spatio-temporal data collected in the form of trajectory data stream. In this study, we investigate the problem of discovering object groups that travel together (i.e., traveling companions) from trajectory stream. This technique has broad applications in the areas of scientific study, transportation management and military surveillance. To discover traveling companions, the monitoring system should cluster the objects of each snapshot and intersect the clustering results to retrieve moving-together objects. Since both clustering and intersection steps involve high computational overhead, the key issue of companion discovery is to improve the algorithm's efficiency. We propose the models of closed companion candidates and smart intersection to accelerate data processing. A new data structure termed traveling buddy is designed to facilitate scalable and flexible companion discovery on trajectory stream. The traveling buddies are micro-groups of objects that are tightly bound together. By only storing the object relationships rather than their spatial coordinates, the buddies can be dynamically maintained along trajectory stream with low cost. Based on traveling buddies, the system can discover companions without accessing the object details. The proposed methods are evaluated with extensive experiments on both real and synthetic datasets. The buddy-based method is an order of magnitude faster than existing methods. It also outperforms other competitors with higher precision and recall in companion discovery.
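The cluster-and-intersect baseline the abstract refers to can be sketched roughly as follows. The toy 1-D grid clusterer and all names here are ours, standing in for a real spatial clusterer and for the paper's buddy structures:

```python
from collections import defaultdict

def snapshot_clusters(points, eps=1.0):
    # Toy 1-D density grouping: objects whose positions fall in the same
    # eps-wide cell form a cluster. (A real system would use a spatial
    # clusterer such as DBSCAN over 2-D coordinates.)
    cells = defaultdict(set)
    for obj, x in points.items():
        cells[int(x // eps)].add(obj)
    return list(cells.values())

def companions(stream, min_size=2):
    # stream: iterable of {obj: position} snapshots. Candidate groups are
    # intersected with each new snapshot's clusters; groups that shrink
    # below min_size are pruned.
    groups = None
    for snap in stream:
        clusters = snapshot_clusters(snap)
        if groups is None:
            groups = clusters
        else:
            groups = [g & c for g in groups for c in clusters
                      if len(g & c) >= min_size]
    return groups or []
```

The paper's traveling-buddy structure avoids redoing this full clustering and intersection at every snapshot by maintaining tightly bound micro-groups incrementally.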
Citations: 163
Integrating Frequent Pattern Mining from Multiple Data Domains for Classification
Pub Date : 2012-04-01 DOI: 10.1109/ICDE.2012.63
D. Patel, W. Hsu, M. Lee
Many frequent pattern mining algorithms have been developed for categorical, numerical, time series, or interval data. However, little attention has been given to integrating these algorithms so as to mine frequent patterns from multiple domain datasets for classification. In this paper, we introduce the notion of a heterogeneous pattern to capture the associations among different kinds of data. We propose a unified framework for mining multiple domain datasets and design an iterative algorithm called HTMiner. HTMiner discovers essential heterogeneous patterns for classification and performs instance elimination. This instance elimination step reduces the problem size progressively by removing training instances which are correctly covered by the discovered essential heterogeneous pattern. Experiments on two real world datasets show that HTMiner is efficient and can significantly improve the classification accuracy.
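The instance-elimination loop can be sketched as below. The coverage-count scoring is a stand-in for HTMiner's actual heterogeneous-pattern measure, and all names are illustrative:

```python
def mine_with_elimination(instances, patterns, covers):
    """Greedy sketch: repeatedly pick the pattern that correctly covers the
    most remaining training instances, then drop those instances.

    instances: list of (record, label)
    patterns:  list of (pattern, predicted_label) candidates
    covers:    fn(pattern, record) -> bool
    """
    chosen, remaining = [], list(instances)
    while remaining:
        def gain(pat):
            p, lab = pat
            return sum(1 for r, y in remaining if covers(p, r) and y == lab)
        best = max(patterns, key=gain)
        if gain(best) == 0:
            break                     # nothing useful left to cover
        chosen.append(best)
        p, lab = best
        remaining = [(r, y) for r, y in remaining
                     if not (covers(p, r) and y == lab)]
    return chosen
```

Each iteration shrinks the problem, which is the progressive size reduction the abstract describes.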
Citations: 8
Towards Preference-aware Relational Databases
Pub Date : 2012-04-01 DOI: 10.1109/ICDE.2012.31
Anastasios Arvanitis, G. Koutrika
In implementing preference-aware query processing, a straightforward option is to build a plug-in on top of the database engine. However, treating the DBMS as a black box affects both the expressivity and performance of queries with preferences. In this paper, we argue that preference-aware query processing needs to be pushed closer to the DBMS. We present a preference-aware relational data model that extends database tuples with preferences and an extended algebra that captures the essence of processing queries with preferences. A key novelty of our preference model itself is that it defines a preference in three dimensions showing the tuples affected, their preference scores and the credibility of the preference. Our query processing strategies push preference evaluation inside the query plan and leverage its algebraic properties for finer-grained query optimization. We experimentally evaluate the proposed strategies. Finally, we compare our framework to a pure plug-in implementation and we show its feasibility and advantages.
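One naive way to picture the three-dimensional preference (affected tuples, score, credibility) is to annotate tuples and rank by a credibility-weighted score. The combination rule below is our own simplification for illustration, not the paper's extended algebra:

```python
def apply_preference(tuples, affects, score, credibility):
    # A preference names the tuples it affects (via the `affects` predicate),
    # a score, and a credibility; matching tuples accumulate annotations.
    out = []
    for t in tuples:
        annots = t.get("_prefs", [])
        if affects(t):
            annots = annots + [(score, credibility)]
        out.append({**t, "_prefs": annots})
    return out

def rank(tuples):
    # Placeholder combination: credibility-weighted sum of scores.
    def weight(t):
        return sum(s * c for s, c in t["_prefs"])
    return sorted(tuples, key=weight, reverse=True)
```

The point of pushing this inside the engine, per the abstract, is that such annotation and ranking can then be reordered and optimized like any other algebraic operator.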
Citations: 23
AutoDict: Automated Dictionary Discovery
Pub Date : 2012-04-01 DOI: 10.1109/ICDE.2012.126
Fei Chiang, Periklis Andritsos, Erkang Zhu, Renée J. Miller
An attribute dictionary is a set of attributes together with a set of common values of each attribute. Such dictionaries are valuable in understanding unstructured or loosely structured textual descriptions of entity collections, such as product catalogs. Dictionaries provide the supervised data for learning product or entity descriptions. In this demonstration, we will present AutoDict, a system that analyzes input data records, and discovers high quality dictionaries using information theoretic techniques. To the best of our knowledge, AutoDict is the first end-to-end system for building attribute dictionaries. Our demonstration will showcase the different information analysis and extraction features within AutoDict, and highlight the process of generating high quality attribute dictionaries.
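What an attribute dictionary is can be illustrated with a naive frequency-based builder; AutoDict's actual discovery uses information-theoretic techniques, so this support cut-off is only a stand-in:

```python
from collections import Counter, defaultdict

def build_dictionary(records, min_support=2):
    # For each attribute seen in the input records, keep the values that
    # occur at least min_support times as that attribute's "common values".
    counts = defaultdict(Counter)
    for rec in records:
        for attr, value in rec.items():
            counts[attr][value] += 1
    return {attr: {v for v, n in c.items() if n >= min_support}
            for attr, c in counts.items()}
```

The resulting {attribute: common values} map is the supervised data a downstream extractor would use to tag loosely structured descriptions such as product catalog entries.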
Citations: 9
Extending Map-Reduce for Efficient Predicate-Based Sampling
Pub Date : 2012-04-01 DOI: 10.1109/ICDE.2012.104
Raman Grover, M. Carey
In this paper we address the problem of using MapReduce to sample a massive data set in order to produce a fixed-size sample whose contents satisfy a given predicate. While it is simple to express this computation using MapReduce, its default Hadoop execution is dependent on the input size and is wasteful of cluster resources. This is unfortunate, as sampling queries are fairly common (e.g., for exploratory data analysis at Facebook), and the resulting waste can significantly impact the performance of a shared cluster. To address such use cases, we present the design, implementation and evaluation of a Hadoop execution model extension that supports incremental job expansion. Under this model, a job consumes input as required and can dynamically govern its resource consumption while producing the required results. The proposed mechanism is able to support a variety of policies regarding job growth rates as they relate to cluster capacity and current load. We have implemented the mechanism in Hadoop, and we present results from an experimental performance study of different job growth policies under both single- and multi-user workloads.
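The incremental job expansion idea can be sketched as a driver that consumes input splits in growing batches and stops as soon as the fixed-size, predicate-satisfying sample is full. The doubling growth policy and all names are illustrative, not the paper's Hadoop extension:

```python
def sample_with_expansion(splits, predicate, k, initial_batch=1):
    # splits: list of input splits (each an iterable of records).
    # Rather than scanning everything, process splits in waves ("job
    # expansion"), growing the wave size until k matches are found.
    sample, i, batch = [], 0, initial_batch
    while i < len(splits) and len(sample) < k:
        for split in splits[i:i + batch]:
            for record in split:
                if predicate(record):
                    sample.append(record)
                    if len(sample) == k:
                        return sample      # stop early: sample is full
        i += batch
        batch *= 2                         # one possible growth policy
    return sample
```

A selective predicate forces more waves, a permissive one stops after the first; the paper's policies additionally weigh cluster capacity and current load when choosing how fast to grow.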
Citations: 56
Parameter-Free Determination of Distance Thresholds for Metric Distance Constraints
Pub Date : 2012-04-01 DOI: 10.1109/ICDE.2012.46
Shaoxu Song, Lei Chen, Hong Cheng
The importance of introducing distance constraints to data dependencies, such as differential dependencies (DDs) [28], has recently been recognized. Metric distance constraints are tolerant to small variations, which enables them to be applied to a wide range of data quality checking applications, such as detecting data violations. However, the determination of distance thresholds for the metric distance constraints is non-trivial. It often relies on a truth data instance which embeds the distance constraints. To find useful distance threshold patterns from data, several statistical measures serve as guidelines to specify, e.g., support, confidence and dependent quality. Unfortunately, given a data instance, users might not have any knowledge about the data distribution, thus it is very challenging to set the right parameters. In this paper, we study the determination of distance thresholds for metric distance constraints, in a parameter-free style. Specifically, we compute an expected utility based on the statistical measures from the data. According to our analysis as well as experimental verification, distance threshold patterns with higher expected utility could offer better usage in real applications, such as violation detection. We then develop efficient algorithms to determine the distance thresholds having the maximum expected utility.
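A toy version of selecting a threshold by expected utility might look like the following. The utility here (support times squared confidence) is a placeholder for the paper's measure; `pairs` encodes, for each record pair, the left-hand-side distance and whether the right-hand side agrees:

```python
def best_threshold(pairs, candidates):
    # pairs:      list of (lhs_distance, rhs_agrees: bool) over an instance
    # candidates: distance thresholds to score
    def utility(theta):
        within = [agrees for d, agrees in pairs if d <= theta]
        if not within:
            return 0.0
        support = len(within) / len(pairs)       # fraction of pairs covered
        confidence = sum(within) / len(within)   # fraction that agree
        return support * confidence ** 2         # stand-in expected utility
    return max(candidates, key=utility)
```

Too small a threshold sacrifices support, too large a one sacrifices confidence; the selected threshold balances the two without the user supplying either cut-off.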
Citations: 14
Multi-query Stream Processing on FPGAs
Pub Date : 2012-04-01 DOI: 10.1109/ICDE.2012.39
Mohammad Sadoghi, Rija Javed, Naif Tarafdar, Harsh V. P. Singh, R. Palaniappan, H. Jacobsen
We present an efficient multi-query event stream platform to support query processing over high-frequency event streams. Our platform is built over reconfigurable hardware -- FPGAs -- to achieve line-rate multi-query processing by exploiting unprecedented degrees of parallelism and potential for pipelining, only available through custom-built, application-specific and low-level logic design. Moreover, a multi-query event stream processing engine is at the core of a wide range of applications including real-time data analytics, algorithmic trading, targeted advertisement, and (complex) event processing.
Citations: 47
An Efficient Graph Indexing Method
Pub Date : 2012-04-01 DOI: 10.1109/ICDE.2012.28
Xiaoli Wang, Xiaofeng Ding, A. Tung, Shanshan Ying, Hai Jin
Graphs are popular models for representing complex structure data and similarity search for graphs has become a fundamental research problem. Many techniques have been proposed to support similarity search based on the graph edit distance. However, they all suffer from certain drawbacks: high computational complexity, poor scalability in terms of database size, or not taking full advantage of indexes. To address these problems, in this paper, we propose SEGOS, an indexing and query processing framework for graph similarity search. First, an effective two-level index is constructed off-line based on sub-unit decomposition of graphs. Then, a novel search strategy based on the index is proposed. Two algorithms adapted from TA and CA methods are seamlessly integrated into the proposed strategy to enhance graph search. More specifically, the proposed framework is easy to be pipelined to support continuous graph pruning. Extensive experiments are conducted on two real datasets to evaluate the effectiveness and scalability of our approaches.
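A rough sketch of sub-unit-based filtering in this spirit: graphs are decomposed into star codes (a node label plus its sorted neighbor labels), an inverted index maps codes to graphs, and candidates are ranked by shared codes. All details below are our illustration, not SEGOS's index:

```python
from collections import Counter, defaultdict

def star_codes(graph):
    # graph: {node: (label, [neighbor nodes])} -> list of sub-unit codes
    codes = []
    for node, (label, nbrs) in graph.items():
        nbr_labels = tuple(sorted(graph[n][0] for n in nbrs))
        codes.append((label, nbr_labels))
    return codes

def build_index(graphs):
    # First level: inverted index from sub-unit code to graph ids.
    index = defaultdict(set)
    for gid, g in graphs.items():
        for code in star_codes(g):
            index[code].add(gid)
    return index

def candidates(query, index):
    # Rank database graphs by the number of sub-units shared with the
    # query; a TA/CA-style strategy would consume these lists sorted.
    hits = Counter()
    for code in star_codes(query):
        for gid in index[code]:
            hits[gid] += 1
    return [gid for gid, _ in hits.most_common()]
```

Shared sub-unit counts give an inexpensive lower-bound-style filter; only the surviving candidates need the expensive edit-distance verification.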
Citations: 95
Analyzing Query Optimization Process: Portraits of Join Enumeration Algorithms
Pub Date : 2012-04-01 DOI: 10.1109/ICDE.2012.132
A. Nica, I. Charlesworth, Maysum Panju
Search spaces generated by query optimizers during the optimization process encapsulate characteristics of the join enumeration algorithms, the cost models, as well as critical decisions made for pruning and choosing the best plan. We demonstrate the Join Enumeration Viewer which is a tool designed for visualizing, mining, and comparing plan search spaces generated by different join enumeration algorithms when optimizing same SQL statement. We have enhanced Sybase SQL Anywhere relational database management system to log, in a very compact format, its search space during an optimization process. Such optimization log can then be analyzed by the Join Enumeration Viewer which internally builds the logical and physical plan graphs representing complete and partial plans considered during the optimization process. The optimization logs also contain statistics of the resource consumption during the query optimization such as optimization time breakdown, for example, for logical join enumeration versus costing physical plans, and memory allocation for different optimization structures. The SQL Anywhere Optimizer implements a highly adaptable, self-managing, search space generation algorithm by having several join enumeration algorithms to choose from, each enhanced with different ordering and pruning techniques. The emphasis of the demonstration will be on comparing and contrasting these join enumeration algorithms by analyzing their optimization logs. The demonstration scenarios will include optimizing SQL statements under various conditions which will exercise different algorithms, pruning and ordering techniques. These search spaces will then be visualized and compared using the Join Enumeration Viewer.
{"title":"Analyzing Query Optimization Process: Portraits of Join Enumeration Algorithms","authors":"A. Nica, I. Charlesworth, Maysum Panju","doi":"10.1109/ICDE.2012.132","DOIUrl":"https://doi.org/10.1109/ICDE.2012.132","url":null,"abstract":"Search spaces generated by query optimizers during the optimization process encapsulate characteristics of the join enumeration algorithms, the cost models, as well as critical decisions made for pruning and choosing the best plan. We demonstrate the Join Enumeration Viewer which is a tool designed for visualizing, mining, and comparing plan search spaces generated by different join enumeration algorithms when optimizing same SQL statement. We have enhanced Sybase SQL Anywhere relational database management system to log, in a very compact format, its search space during an optimization process. Such optimization log can then be analyzed by the Join Enumeration Viewer which internally builds the logical and physical plan graphs representing complete and partial plans considered during the optimization process. The optimization logs also contain statistics of the resource consumption during the query optimization such as optimization time breakdown, for example, for logical join enumeration versus costing physical plans, and memory allocation for different optimization structures. The SQL Anywhere Optimizer implements a highly adaptable, self-managing, search space generation algorithm by having several join enumeration algorithms to choose from, each enhanced with different ordering and pruning techniques. The emphasis of the demonstration will be on comparing and contrasting these join enumeration algorithms by analyzing their optimization logs. The demonstration scenarios will include optimizing SQL statements under various conditions which will exercise different algorithms, pruning and ordering techniques. These search spaces will then be visualized and compared using the Join Enumeration Viewer.","PeriodicalId":321608,"journal":{"name":"2012 IEEE 28th International Conference on Data Engineering","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128604366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
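The abstract above is about visualizing the plan search spaces that join enumeration algorithms generate. To make that search space concrete, the sketch below is a toy System-R style dynamic program over relation subsets (not SQL Anywhere's actual enumerators): the cost model (sum of intermediate result sizes) and the single constant join selectivity `sel` are illustrative assumptions.

```python
from itertools import combinations

def best_join_order(cards, sel=0.1):
    """Exhaustive dynamic programming over relation subsets
    (bushy join enumeration, toy cost model: cost = sum of
    intermediate result sizes).  `cards` maps relation name to
    base cardinality; `sel` is a single assumed join
    selectivity.  Returns (cost, plan) for the full join."""
    # Base case: a single relation costs nothing to "join".
    best = {frozenset([r]): (0.0, c, r) for r, c in cards.items()}
    rels = list(cards)
    for size in range(2, len(rels) + 1):
        for combo in combinations(rels, size):
            s = frozenset(combo)
            # Try every split of the subset into two subplans.
            for k in range(1, size):
                for left in combinations(combo, k):
                    l, r = frozenset(left), s - frozenset(left)
                    lc, lcard, lplan = best[l]
                    rc, rcard, rplan = best[r]
                    out = lcard * rcard * sel      # estimated result size
                    cost = lc + rc + out           # accumulate intermediates
                    if s not in best or cost < best[s][0]:
                        best[s] = (cost, out, (lplan, rplan))
    cost, _, plan = best[frozenset(rels)]
    return cost, plan

print(best_join_order({'A': 10, 'B': 100, 'C': 1000}))
```

Even this tiny enumerator considers every subset split, which is exactly the exponential search space a viewer like the one demonstrated would log; production optimizers tame it with the ordering and pruning techniques the abstract mentions.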
Journal
2012 IEEE 28th International Conference on Data Engineering