
Proceedings. International Database Engineering and Applications Symposium: Latest Publications

Black-box determination of cost models' parameters for federated stream-processing systems
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076654
Michael Daum, F. Lauterwald, P. Baumgärtel, Niko Pollner, K. Meyer-Wegener
For distribution and deployment of queries in distributed stream-processing environments, it is vital to estimate the expected costs in advance. With heterogeneous Stream-Processing Systems (SPSs) running on various hosts, the parameters of an operator's cost model must be determined by measurements for each relevant combination of SPS and hardware. This paper presents a black-box method that determines the parameters of appropriate cost models that account for system-specific behavior. For some SPSs, no appropriate cost model may be available due to the lack of internal knowledge. If no cost model is available for any reason, we provide and apply a non-parametric model.
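To make the black-box idea concrete, here is a minimal sketch (not the authors' implementation): a parametric cost model is fitted to per-operator measurements taken on one SPS/hardware combination, and a non-parametric interpolation is used as a fallback when no parametric form is assumed. All function names, rates, and costs are illustrative.

```python
# Illustrative sketch only: black-box calibration of a per-operator cost model
# from measurements, with a non-parametric fallback. Names are hypothetical.
import numpy as np

def fit_linear_cost_model(input_rates, measured_costs):
    """Least-squares fit of cost ~ a * rate + b for one (SPS, hardware) combination."""
    a, b = np.polyfit(input_rates, measured_costs, deg=1)
    return lambda rate: a * rate + b

def nonparametric_cost_model(input_rates, measured_costs):
    """Fallback when no parametric form is assumed: interpolate between measurements."""
    xs, ys = zip(*sorted(zip(input_rates, measured_costs)))
    return lambda rate: float(np.interp(rate, xs, ys))

# Example: measurements taken by running the operator as a black box.
rates = [100, 200, 400, 800]      # tuples per second fed to the operator
costs = [2.1, 4.0, 8.3, 16.1]     # measured CPU ms per second of stream time

linear = fit_linear_cost_model(rates, costs)
fallback = nonparametric_cost_model(rates, costs)
print(round(linear(600), 2), round(fallback(600), 2))
```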
Pages: 226-232
Citations: 6
FB-tree: a B+-tree for flash-based SSDs
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076629
Martin V. Jørgensen, René Bech Rasmussen, Simonas Šaltenis, Carsten Schjønning
Due to their many advantages, flash-based SSDs (Solid-State Drives) have become a mainstream alternative to magnetic disks for database servers. Nevertheless, database systems designed and optimized for magnetic disks still do not fully exploit all the benefits of the new technology. We propose the FB-tree: a combination of an adapted B+-tree, a storage manager, and a buffer manager, all optimized for modern SSDs. Together these techniques enable writing to SSDs in relatively large blocks, thus achieving greater overall throughput. This is achieved by out-of-place writing, whereby every time a modified index node is written, it is written to a new address, clustered with other nodes that are written together. While this constantly frees index nodes, the FB-tree does not introduce any garbage-collection overhead, instead relying on naturally occurring free-space segments of sufficient size. As a consequence, the FB-tree outperforms a regular B+-tree in all scenarios tested. For instance, the throughput of a random workload of 75% updates increases by a factor of three while using only twice the space of the B+-tree.
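The central mechanism, out-of-place writes batched into large blocks, can be illustrated with a toy buffer-manager sketch. This is a simplified illustration under assumed names, not the FB-tree code itself.

```python
# Toy illustration of out-of-place, batched node writes (not the actual FB-tree).
class OutOfPlaceWriter:
    def __init__(self, block_size=4):
        self.block_size = block_size   # nodes clustered per large SSD write
        self.dirty = {}                # node_id -> serialized node contents
        self.next_addr = 0             # monotonically growing write position
        self.node_addr = {}            # logical node_id -> current physical address

    def mark_dirty(self, node_id, payload):
        self.dirty[node_id] = payload
        if len(self.dirty) >= self.block_size:
            self.flush()

    def flush(self):
        """Write all dirty nodes together to fresh addresses (never in place)."""
        for node_id in self.dirty:
            self.node_addr[node_id] = self.next_addr  # old address becomes free space
            self.next_addr += 1
        self.dirty.clear()

w = OutOfPlaceWriter()
for i in range(8):
    w.mark_dirty(node_id=i % 5, payload=b"node")
print(w.node_addr)
```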
Pages: 34-42
Citations: 21
Using an inference mechanism for helping the data integration
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076660
V. Pequeno, J. Pires
Sharing and integrating information across multiple autonomous and heterogeneous data sources has emerged as a strategic requirement in modern business. We deal with this problem by proposing a declarative approach based on the creation of a reference model and perspective schemata. The former serves as a common semantic meta-model, while the latter defines correspondence between schemata. Furthermore, using the proposed architecture, we developed an inference mechanism which allows the (semi-) automatic derivation of new mappings between schemata from previous ones. The aim of this paper is to present the proposed inference mechanism.
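One simple instance of the kind of derivation such an inference mechanism can perform is the composition of known correspondences into new ones. The sketch below is illustrative only, with hypothetical attribute names: it computes the transitive closure of a set of attribute-level mappings.

```python
# Illustrative composition of schema correspondences: if attribute a of schema S1
# maps to b of a reference model, and b maps to c of S2, infer a mapping a -> c.
def derive_mappings(mappings):
    """mappings: set of (source, target) attribute pairs; returns the transitive closure."""
    derived = set(mappings)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(derived):
            for (c, d) in list(derived):
                if b == c and (a, d) not in derived:
                    derived.add((a, d))
                    changed = True
    return derived

known = {("S1.customer_name", "Ref.name"), ("Ref.name", "S2.client")}
print(derive_mappings(known) - known)   # {('S1.customer_name', 'S2.client')}
```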
Pages: 251-253
Citations: 0
A predictable storage model for scalable parallel DW
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076628
J. Costa, J. Cecílio, P. Martins, P. Furtado
The star schema model has been widely used as the de facto DW storage organization on RDBMSs. Business measures are stored in a central fact table along with a set of foreign keys referencing dimension tables. While this storage organization offers a good trade-off between storage size and performance for a single node, it doesn't scale in a predictable manner in shared-nothing parallel architectures. Although fact tables can be linearly partitioned among nodes, the same doesn't apply to dimensions, which unbalances (increases) the dimensions/fact_table size ratio and consequently limits the number of parallel nodes. In this paper we propose and evaluate a parallel DW storage model that overcomes these limitations and delivers optimal speed-up and scale-up capabilities with top efficiency. We use the TPC-H benchmark to evaluate the scalability and efficiency of the proposed model.
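The scalability problem described above can be made concrete with a small back-of-the-envelope sketch: hash-partitioning the fact table while replicating every dimension shows how the dimension share per node grows with the node count. This is a generic illustration of the issue, not the storage model proposed in the paper, and the row counts are made up.

```python
# Illustrative: why replicated dimensions limit shared-nothing scale-out.
def per_node_storage(fact_rows, dim_rows, nodes):
    """Fact rows are hash-partitioned; dimension rows are replicated on every node."""
    fact_per_node = fact_rows / nodes
    return fact_per_node + dim_rows, dim_rows / fact_per_node  # (rows per node, dim/fact ratio)

for n in (1, 4, 16, 64):
    rows, ratio = per_node_storage(fact_rows=1_000_000_000, dim_rows=50_000_000, nodes=n)
    print(f"{n:>3} nodes: {rows:,.0f} rows per node, dimension/fact ratio = {ratio:.2f}")
```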
Pages: 26-33
Citations: 5
A family of graph-theory-driven algorithms for managing complex probabilistic graph data efficiently
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076657
A. Cuzzocrea, Paolo Serafino
Traditionally, a great deal of attention has been devoted to the problem of effectively modeling and querying probabilistic graph data. State-of-the-art proposals are ill-suited to dealing with complex probabilistic data, as they essentially introduce simple data models (e.g., based on confidence intervals) and straightforward query methodologies (e.g., based on the reachability property). In our vision, these proposals need to be extended towards defining innovative models and algorithms capable of dealing with the hardness of the novel requirements posed by managing complex probabilistic graph data efficiently. Inspired by this motivation, in this paper we propose and experimentally assess an innovative family of graph-theory-driven algorithms for managing complex probabilistic graph data, whose twofold goal is to enhance the expressive power of the underlying probabilistic graph data model and the expressive power of graph queries.
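As a point of reference for the reachability-style queries mentioned above, a generic Monte Carlo estimate of reachability probability over independent edge probabilities is sketched below. This is a standard textbook technique, not one of the algorithms proposed in the paper, and the graph is hypothetical.

```python
# Generic Monte Carlo reachability over a probabilistic graph with independent edges.
import random

def reachability_probability(edges, source, target, samples=10_000):
    """edges: dict (u, v) -> existence probability; returns P(target reachable from source)."""
    hits = 0
    for _ in range(samples):
        adj = {}
        for (u, v), p in edges.items():   # sample one possible world
            if random.random() < p:
                adj.setdefault(u, []).append(v)
        stack, seen = [source], {source}
        while stack:                      # DFS in the sampled world
            node = stack.pop()
            if node == target:
                hits += 1
                break
            for nxt in adj.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return hits / samples

g = {("a", "b"): 0.9, ("b", "c"): 0.5, ("a", "c"): 0.2}
print(reachability_probability(g, "a", "c"))   # roughly 0.56
```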
Pages: 240-242
Citations: 1
Aggregates and priorities in P2P data management systems
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076625
Luciano Caroprese, E. Zumpano
This paper investigates the data exchange problem among distributed independent sources. It builds on previous work of the authors [11, 12, 14], in which a declarative semantics for P2P systems was presented and a mechanism to set different degrees of reliability for neighbor peers was provided. The basic semantics for P2P systems defines the concept of Maximal Weak Models (in [11, 12, 14] these models were called Preferred Weak Models; in this paper we rename them and use the term Preferred for the subclass of Weak Models defined here), which represent scenarios in which maximal sets of facts not violating integrity constraints are imported into the peers [11, 12]. The priority mechanism previously defined in [14] is rigid in the sense that the preference between conflicting sets of atoms that a peer can import depends only on the priorities associated with the source peers at design time. In this paper we present a different framework that allows selecting among different scenarios by looking at the properties of the data provided by the peers. The framework presented here makes it possible to model concepts like "in the case of conflicting information, it is preferable to import data from the neighbor peer that can provide the maximum number of tuples" or "in the case of conflicting information, it is preferable to import data from the neighbor peer such that the sum of the values of an attribute is minimum" without selecting preferred peers a priori. To enforce this preference mechanism we enrich the previous P2P framework with aggregate functions and present significant examples showing the flexibility of the new framework.
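The aggregate-based preferences quoted above can be sketched as a simple selection among alternative consistent import scenarios. The code below is illustrative only and does not implement the preferred-weak-model semantics itself; relation and attribute names are made up.

```python
# Illustrative: choose among alternative consistent import sets using aggregates.
def prefer_max_tuples(scenarios):
    """Prefer the scenario importing the largest number of tuples."""
    return max(scenarios, key=len)

def prefer_min_sum(scenarios, attribute):
    """Prefer the scenario minimizing the sum of a given attribute."""
    return min(scenarios, key=lambda s: sum(t[attribute] for t in s))

# Two conflicting sets of facts a peer could import from its neighbours.
s1 = [{"item": "a", "price": 10}, {"item": "b", "price": 12}]
s2 = [{"item": "c", "price": 5}]
print(prefer_max_tuples([s1, s2]))        # picks s1 (more tuples)
print(prefer_min_sum([s1, s2], "price"))  # picks s2 (smaller total price)
```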
Pages: 1-7
Citations: 17
Online outlier detection for data streams
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076635
Md. Shiblee Sadik, L. Gruenwald
Outlier detection is a well-established area of statistics, but most existing outlier detection techniques are designed for applications where the entire dataset is available for random access. A typical outlier detection technique constructs a standard data distribution or model and identifies the data points that deviate from the model as outliers. Evidently, these techniques are not suitable for online data streams, where the entire dataset, due to its unbounded volume, is not available for random access. Moreover, the data distribution in data streams changes over time, which challenges the existing outlier detection techniques that assume a constant standard data distribution for the entire dataset. In addition, data streams are characterized by uncertainty, which imposes further complexity. In this paper we propose an adaptive, online outlier detection technique addressing the aforementioned characteristics of data streams, called Adaptive Outlier Detection for Data Streams (A-ODDS), which identifies outliers with respect to all the received data points as well as temporally close data points. The temporally close data points are selected based on time and on changes of the data distribution. We also present an efficient online implementation of the technique and a performance study showing the superiority of A-ODDS over existing techniques in terms of accuracy and execution time on a real-life dataset collected from meteorological applications.
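A minimal sketch of the general idea, scoring each arriving point both against all data seen so far and against a window of temporally close points, is shown below. This is a simplified z-score variant for illustration, not the actual A-ODDS algorithm, and the window size and threshold are arbitrary.

```python
# Simplified online outlier check: combine a global view with a recent window.
from collections import deque
import math

class StreamOutlierDetector:
    def __init__(self, window=50, threshold=3.0):
        self.window = deque(maxlen=window)        # temporally close points
        self.n, self.mean, self.m2 = 0, 0.0, 0.0  # running stats over all points (Welford)
        self.threshold = threshold

    @staticmethod
    def _zscore(x, mean, var):
        return abs(x - mean) / math.sqrt(var) if var > 0 else 0.0

    def observe(self, x):
        # Score x against statistics that do not yet include it.
        is_outlier = False
        if self.n >= 5 and len(self.window) >= 5:
            global_var = self.m2 / self.n
            w = list(self.window)
            w_mean = sum(w) / len(w)
            w_var = sum((v - w_mean) ** 2 for v in w) / len(w)
            # Flag only points that deviate both globally and from temporally close data.
            is_outlier = (self._zscore(x, self.mean, global_var) > self.threshold
                          and self._zscore(x, w_mean, w_var) > self.threshold)
        # Incrementally update the global statistics and the recent window.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        self.window.append(x)
        return is_outlier

d = StreamOutlierDetector(window=20)
print([d.observe(v) for v in [10, 11, 9, 10, 12, 11, 10, 9, 55, 10]])  # the spike at 55 is flagged
```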
Pages: 88-96
Citations: 21
Boosting tuple propagation in multi-relational classification
Pub Date : 2011-09-21 DOI: 10.1145/2076623.2076637
Lucantonio Ghionna, G. Greco
Multi-relational classification is a mining method that aims to build classifiers for the tuples in a target relation based on its own data as well as on data possibly dispersed over other, non-target relations, by exploiting the relationships among them formalized via foreign key constraints. While improving the efficacy of the resulting classifiers, propagating data via the foreign key constraints deteriorates the scalability of the underlying algorithm. In this paper, various techniques are discussed to implement this propagation task efficiently, and hence to boost the performance of current multi-relational classification algorithms. These techniques are based on suitable adaptations of state-of-the-art query optimization methods and are conceived to be coupled with database management systems. A system prototype integrating all the techniques is illustrated, and results of the experimental activity conducted on top of it are discussed.
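The propagation step itself amounts to pushing identifiers of target tuples along foreign keys so that features from non-target relations can be aggregated per target tuple. The sketch below is a generic, unoptimized illustration with made-up relation names, not the query-optimization-based implementation discussed in the paper.

```python
# Illustrative tuple-ID propagation along a foreign key (target -> non-target relation).
def propagate(target_rows, other_rows, fk):
    """Attach to every row of the non-target relation the IDs of the target tuples
    that reference it, so features can later be aggregated per target tuple."""
    ids_by_key = {}
    for row in target_rows:
        ids_by_key.setdefault(row[fk], []).append(row["id"])
    return [dict(row, target_ids=ids_by_key.get(row["id"], [])) for row in other_rows]

loans = [{"id": 1, "account_id": 10, "label": "ok"},
         {"id": 2, "account_id": 10, "label": "bad"},
         {"id": 3, "account_id": 11, "label": "ok"}]
accounts = [{"id": 10, "district": "A"}, {"id": 11, "district": "B"}]
print(propagate(loans, accounts, fk="account_id"))
```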
Pages: 106-114
Citations: 1
On the expressiveness of generalization rules for XPath query relaxation
Pub Date : 2010-08-16 DOI: 10.1145/1866480.1866504
Bettina Fazzinga, S. Flesca, F. Furfaro
The problem of defining suitable rewriting mechanisms for XML query languages to support approximate query answering has received a great deal of attention in the last few years, owing to its practical impact in several scenarios. For instance, in the typical scenario of distributed XML data without a shared data scheme, accomplishing the extraction of the information of interest often requires queries to be rewritten into relaxed ones, in order to adapt them to the schemes adopted in the different sources. In this paper, rewriting systems for a wide fragment of XPath (which is the core of several languages for manipulating XML data) are investigated, and a general form of rewriting rules (namely, generalization rules) is considered, which subsumes the forms adopted in the most well-known rewriting systems. Specifically, the expressiveness of rewriting systems based on this form of rules is characterized: on the one hand, it is shown that rewriting systems based on generalization rules are incomplete w.r.t. containment (thus, traditional rewriting mechanisms do not suffice to rewrite a query into any more general one). On the other hand, it is also shown that the expressiveness of state-of-the-art rewriting systems can be improved by employing rewriting primitives as simple as those traditionally used, which enable any query to be relaxed into every more general one related to it via homomorphism.
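Typical generalization rules studied in this line of work include replacing a child axis with a descendant axis, dropping a predicate, and relaxing a node test to a wildcard. The sketch below applies such rules to a tiny query representation for illustration only; it is not the formal rewriting system analyzed in the paper.

```python
# Illustrative: one-step relaxations of a tiny XPath-like query representation.
# Each step is (axis, tag, predicate) with axis in {"child", "descendant"}.
def relaxations(steps):
    """Yield queries obtained by applying one generalization rule to one step."""
    for i, (axis, tag, pred) in enumerate(steps):
        if axis == "child":                  # rule 1: child axis -> descendant axis
            yield steps[:i] + [("descendant", tag, pred)] + steps[i + 1:]
        if pred is not None:                 # rule 2: drop a predicate
            yield steps[:i] + [(axis, tag, None)] + steps[i + 1:]
        if tag != "*":                       # rule 3: node test -> wildcard
            yield steps[:i] + [(axis, "*", pred)] + steps[i + 1:]

def to_xpath(steps):
    return "".join(("/" if a == "child" else "//") + t + (f"[{p}]" if p else "")
                   for a, t, p in steps)

query = [("child", "library", None), ("child", "book", "@year=2010"), ("child", "title", None)]
for q in relaxations(query):
    print(to_xpath(q))
```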
Pages: 157-168
Citations: 14
An integrative approach to query optimization in native XML database management systems
Pub Date : 2010-08-16 DOI: 10.1145/1866480.1866491
A. Weiner, T. Härder
Even though an effective cost-based query optimizer is of utmost importance for the efficient evaluation of XQuery expressions in native XML database systems, such a component is still missing, because former approaches either do not pay attention to the latest advances in the area of physical operators (e.g., Holistic Twig Joins and advanced indexes) or focus on only some of them. To support the development of native XML query optimizers, we introduce an extensible cost-based optimization framework that integrates the cutting-edge XML query evaluation operators into a single system. Using well-known plan generation techniques from the relational world and a novel set of plan equivalences---which allow for the generation of alternative query plans consisting of Structural Joins, Holistic Twig Joins, and numerous indexes (especially path indexes and content-and-structure indexes)---our optimizer can now benefit from knowledge on native XML query evaluation to speed up query execution significantly.
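A highly simplified sketch of cost-based plan selection among alternative physical operators (structural joins, a holistic twig join, and a path-index scan) is shown below. The cost formulas and statistics are invented for illustration and do not reflect the framework's actual plan equivalences or cost model.

```python
# Illustrative cost-based choice among alternative physical plans for a twig query.
def enumerate_plans(twig, stats):
    """Yield (plan name, estimated cost) alternatives for a small twig pattern."""
    n = sum(stats[tag] for tag in twig)           # total candidate elements
    yield ("structural-join pipeline", 1.2 * n)   # pairwise joins, intermediate results
    yield ("holistic twig join", 1.0 * n)         # single pass over all input streams
    if "path-index" in stats:
        yield ("path-index scan", 0.2 * stats["path-index"])

def best_plan(twig, stats):
    return min(enumerate_plans(twig, stats), key=lambda p: p[1])

stats = {"dept": 1_000, "emp": 50_000, "name": 50_000, "path-index": 4_000}
print(best_plan(["dept", "emp", "name"], stats))
```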
Pages: 64-74
Citations: 8