
Proceedings 18th International Conference on Data Engineering: latest publications

Indexing spatio-temporal data warehouses
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994706
D. Papadias, Yufei Tao, Panos Kalnis, Jun Zhang
Spatio-temporal databases store information about the positions of individual objects over time. In many applications, however, such as traffic supervision or mobile communication systems, only summarized data, like the average number of cars in an area for a specific period, or the number of phones serviced by a cell each day, is required. Although this information can be obtained from operational databases, its computation is expensive, rendering online processing inapplicable. A vital solution is the construction of a spatio-temporal data warehouse. In this paper, we describe a framework for supporting OLAP operations over spatio-temporal data. We argue that the spatial and temporal dimensions should be modeled as a combined dimension on the data cube and we present data structures which integrate spatio-temporal indexing with pre-aggregation. While the well-known materialization techniques require a-priori knowledge of the grouping hierarchy, we develop methods that utilize the proposed structures for efficient execution of ad-hoc group-bys. Our techniques can be used for both static and dynamic dimensions.
Citations: 171
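The abstract describes data structures that integrate spatio-temporal indexing with pre-aggregation. As a rough illustration of that general idea (not the paper's actual structures), the Python sketch below stores, in every node of a small hierarchy of regions, the per-timestamp aggregate of its whole subtree plus prefix sums over time. A window query adds up pre-aggregated totals for nodes lying entirely inside the window and descends only into partially covered ones. The names `Node`, `parent`, and `window_aggregate`, the prefix-sum stand-in for a temporal index, and the rule that a base region counts only if it lies fully inside the window are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import accumulate

Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def contains(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def disjoint(a: Rect, b: Rect) -> bool:
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]


@dataclass
class Node:
    mbr: Rect                                  # bounding rectangle of the subtree
    series: list[int]                          # aggregate per timestamp for the whole subtree
    children: list[Node] = field(default_factory=list)
    prefix: list[int] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        # Prefix sums stand in for a temporal index: any time range costs O(1).
        self.prefix = [0, *accumulate(self.series)]

    def total(self, t1: int, t2: int) -> int:
        """Aggregate over timestamps t1..t2 (inclusive)."""
        return self.prefix[t2 + 1] - self.prefix[t1]


def parent(children: list[Node]) -> Node:
    """Build an internal node whose series is the element-wise sum of its children."""
    mbr = (min(c.mbr[0] for c in children), min(c.mbr[1] for c in children),
           max(c.mbr[2] for c in children), max(c.mbr[3] for c in children))
    series = [sum(col) for col in zip(*(c.series for c in children))]
    return Node(mbr, series, children)


def window_aggregate(node: Node, window: Rect, t1: int, t2: int) -> int:
    """Sum the measure over base regions fully inside `window` during t1..t2."""
    if disjoint(node.mbr, window):
        return 0
    if contains(window, node.mbr):
        return node.total(t1, t2)              # pre-aggregated: no need to descend
    return sum(window_aggregate(c, window, t1, t2) for c in node.children)
```

For instance, with four unit cells in a 2x2 grid grouped into two parents, a query over the bottom row for timestamps 0..2 is answered from a single pre-aggregated parent node without visiting its children.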
P2P information systems
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994708
K. Aberer, M. Hauswirth
Citations: 5
XGrind: a query-friendly XML compressor
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994712
Pankaj M. Tolani, J. Haritsa
XML documents are extremely verbose since the "schema" is repeated for every "record" in the document. While a variety of compressors are available to address this problem, they are not designed to support direct querying of the compressed document, a useful feature from a database perspective. In this paper, we propose a new compression tool, called XGrind, that directly supports queries in the compressed domain. A special feature of XGrind is that the compressed document retains the structure of the original document, permitting reuse of the standard XML techniques for processing the compressed document. Performance evaluations over a variety of XML documents and user queries indicate that XGrind simultaneously delivers improved query processing times and reasonable compression ratios.
Citations: 274
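The key property in the XGrind abstract is that queries run directly in the compressed domain while the document keeps its structure. One way to get that property, sketched below in Python under our own assumptions, is to give each tag a static (non-adaptive) Huffman codebook built in a first pass over the data: because the code never changes, a query literal can be compressed with the same codebook and compared bit for bit with stored values. XGrind's real encoding of element and attribute identifiers and values is defined in the paper; `CompressedDoc`, `huffman_code`, and `select_eq` are hypothetical names, and records are flattened to tag/value pairs for brevity.

```python
from __future__ import annotations

import heapq
from collections import Counter
from itertools import count


def huffman_code(freq: Counter) -> dict[str, str]:
    """Static (non-adaptive) Huffman code built from character frequencies."""
    tiebreak = count()                        # keeps heap entries comparable on equal frequency
    heap = [(f, next(tiebreak), {ch: ""}) for ch, f in freq.items()]
    if not heap:
        return {}
    heapq.heapify(heap)
    if len(heap) == 1:                        # a single symbol still needs a 1-bit code
        return {ch: "0" for ch in heap[0][2]}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


class CompressedDoc:
    """Records keep their tag structure; each value is a bit string from its tag's codebook."""

    def __init__(self, records: list[dict[str, str]]) -> None:
        tags = sorted({t for r in records for t in r})
        self.tag_id = {t: i for i, t in enumerate(tags)}   # tags replaced by small ids
        # Pass 1: one static codebook per tag, from all values under that tag.
        self.codebooks = {
            t: huffman_code(Counter("".join(r[t] for r in records if t in r)))
            for t in tags
        }
        # Pass 2: encode every value; record structure is preserved.
        self.records = [
            [(self.tag_id[t], self._encode(t, v)) for t, v in r.items()]
            for r in records
        ]

    def _encode(self, tag: str, value: str) -> str | None:
        book = self.codebooks.get(tag, {})
        if any(ch not in book for ch in value):
            return None                       # unseen character: cannot equal any stored value
        return "".join(book[ch] for ch in value)

    def select_eq(self, tag: str, literal: str) -> list[int]:
        """Indices of records where tag == literal, checked without decompression."""
        bits = self._encode(tag, literal) if tag in self.tag_id else None
        if bits is None:
            return []
        key = (self.tag_id[tag], bits)
        return [i for i, rec in enumerate(self.records) if key in rec]
```

Because Huffman codes are prefix-free, two strings encoded with the same codebook yield equal bit strings only if the strings themselves are equal, so `select_eq` never decompresses anything. Context-dependent compressors such as gzip lack this property, since the same value can be encoded differently depending on where it appears.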
Integrating workflow management systems with business-to-business interaction standards
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994737
Mehmet Sayal, F. Casati, U. Dayal, M. Shan
Business-to-business (B2B) e-commerce is emerging as a new market with tremendous potential. Organizations are trying to link services across organizational boundaries in order to trade goods and services electronically. Standards such as RosettaNet, CBL, EDI, OBI, and cXML describe how electronic B2B interactions should be carried out so that dynamic trade partnerships can be established and transactions can be executed across organizations. While the development of standards is a fundamental step towards enabling e-business, linking B2B interactions with internal business processes remains a challenge. In addition, because industry standards evolve continuously with changing needs, organizations have to adopt new standards quickly. In this paper we describe how workflow technology can be extended to support B2B interactions and to link them with internal workflows. The proposed framework can speed up both the development of new business processes that support B2B interaction standards and the enhancement of existing business processes with B2B interaction capability.
Citations: 52
Providing database as a service
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994695
Hakan Hacıgümüş, S. Mehrotra, B. Iyer
We explore a novel paradigm for data management in which a third-party service provider hosts "database as a service", providing its customers with seamless mechanisms to create, store, and access their databases at the host site. Such a model relieves organizations of the need to purchase expensive hardware and software, deal with software upgrades, and hire professionals for administrative and maintenance tasks, all of which are taken over by the service provider. We have developed and deployed a database service on the Internet, called NetDB2, which is in constant use. In a sense, the data management model supported by NetDB2 provides an effective mechanism for organizations to purchase data management as a service, freeing them to concentrate on their core businesses. Among the primary challenges introduced by "database as a service" are the additional overhead of remote access to data, an infrastructure to guarantee data privacy, and user interface design for such a service. We investigate these issues. We identify data privacy as a particularly vital problem and propose alternative solutions based on data encryption. The paper is meant as a challenge to the database community to explore the rich set of research issues that arise in developing such a service.
Citations: 749
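The abstract proposes privacy solutions based on data encryption without detailing them. For illustration only, the Python sketch below shows one technique from the broader database-as-a-service literature, bucketization: the client encrypts each row, tags it with a keyed, opaque label of the value range (bucket) it falls into, and later turns a range predicate into a set of bucket labels. The server filters on labels it cannot interpret and may return false positives, which the client removes after decryption. This is not necessarily NetDB2's design. The key, the bucket edges, and all function names are hypothetical, and `toy_encrypt` is a deliberately simple HMAC keystream used only to keep the example dependency-free; a real deployment would use a vetted authenticated cipher such as AES-GCM from a cryptographic library.

```python
import hashlib
import hmac
import json
from bisect import bisect_right

KEY = b"client-only-secret"                 # hypothetical key; never sent to the server
EDGES = [0, 1_000, 5_000, 20_000, 10**9]    # hypothetical bucket boundaries for `salary`


def bucket_of(value: int) -> int:
    return bisect_right(EDGES, value) - 1


def bucket_token(bucket: int) -> str:
    """Keyed, opaque bucket label: the server cannot map it back to a value range."""
    return hmac.new(KEY, f"salary:{bucket}".encode(), hashlib.sha256).hexdigest()[:16]


def toy_encrypt(data: bytes, nonce: bytes) -> bytes:
    """ILLUSTRATION ONLY: XOR with an HMAC-SHA256 keystream. Use a vetted AEAD (e.g. AES-GCM)."""
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += hmac.new(KEY, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(x ^ y for x, y in zip(data, stream))


toy_decrypt = toy_encrypt                    # XOR with the same keystream undoes itself


def client_store(rows: list[dict]) -> list[tuple[str, bytes, bytes]]:
    """What the client uploads: (bucket token, nonce, ciphertext) per row."""
    stored = []
    for i, row in enumerate(rows):
        nonce = i.to_bytes(8, "big")         # unique per row in this toy example
        ciphertext = toy_encrypt(json.dumps(row).encode(), nonce)
        stored.append((bucket_token(bucket_of(row["salary"])), nonce, ciphertext))
    return stored


def server_select(stored, tokens: set[str]):
    """Server side: matches opaque tokens only, so it may return false positives."""
    return [(nonce, ct) for tok, nonce, ct in stored if tok in tokens]


def client_range_query(stored, lo: int, hi: int) -> list[dict]:
    tokens = {bucket_token(b) for b in range(bucket_of(lo), bucket_of(hi) + 1)}
    candidates = server_select(stored, tokens)
    rows = [json.loads(toy_decrypt(ct, nonce)) for nonce, ct in candidates]
    return [r for r in rows if lo <= r["salary"] <= hi]   # drop bucket false positives
```

Coarser buckets reveal less about value distributions to the server but return more false positives for the client to decrypt and discard.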
A fast regular expression indexing engine
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994755
Junghoo Cho, S. Rajagopalan
In this paper, we describe the design, architecture, and lessons learned from the implementation of FREE, a fast regular-expression indexing engine. FREE uses a prebuilt index to identify the text data units that may contain a matching string, and examines only those units further. In this way, FREE shows orders-of-magnitude performance improvements in certain cases over standard regular expression matching systems such as lex, awk, and grep.
Citations: 74
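FREE's central idea, per the abstract, is to consult a prebuilt index to find the text units that may contain a match and to run the expensive regular-expression scan only on those. The Python sketch below illustrates that filter-then-verify pattern with a plain 3-gram inverted index. It assumes the caller supplies literal substrings that every match must contain; FREE itself works from the regular expression and uses its own index design, which this sketch does not reproduce. `GramIndex` and its methods are hypothetical names.

```python
import re
from collections import defaultdict


def grams(s: str, n: int) -> set[str]:
    return {s[i:i + n] for i in range(len(s) - n + 1)}


class GramIndex:
    """Inverted index from every n-gram to the ids of the text units containing it."""

    def __init__(self, units: list[str], n: int = 3) -> None:
        self.units, self.n = units, n
        self.postings: dict[str, set[int]] = defaultdict(set)
        for uid, text in enumerate(units):
            for g in grams(text, n):
                self.postings[g].add(uid)

    def candidates(self, literals: list[str]) -> set[int]:
        """Units containing every n-gram of every required literal (a superset of the matches)."""
        cand = set(range(len(self.units)))
        for lit in literals:
            for g in grams(lit, self.n):       # literals shorter than n add no constraint
                cand &= self.postings.get(g, set())
        return cand

    def search(self, pattern: str, literals: list[str]) -> list[int]:
        rx = re.compile(pattern)
        # Only the candidates are scanned with the real regex engine.
        return sorted(uid for uid in self.candidates(literals) if rx.search(self.units[uid]))
```

The filter never discards a true match, because a unit that contains a required literal necessarily contains all of that literal's 3-grams. It can keep false candidates, which is why each candidate is still checked with `re.search`.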
A sampling-based estimator for top-k selection query
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994779
Chung-Min Chen, Y. Ling
Top-k queries arise naturally in many database applications that require searching for records whose attribute values are close to those specified in a query. We study the problem of processing a top-k query by translating it into an approximate range query that can be efficiently processed by traditional relational DBMSs. We propose a sampling-based approach, along with various query mapping strategies, to determine a range query that yields high recall with low access cost. Our experiments on real-world datasets show that, given the same memory budgets, our sampling-based estimator outperforms a previous histogram-based method in terms of access cost, while achieving the same level of recall. Furthermore, unlike the histogram-based approach, our sampling-based query mapping scheme scales well for high-dimensional data and is easy to implement with low maintenance cost.
Citations: 26
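The abstract translates a top-k query into a range query whose extent is estimated from a sample. A minimal Python sketch of that idea, under assumptions of our own: distance is L-infinity (so the range is an axis-aligned box that a relational DBMS can evaluate with `BETWEEN` predicates), the radius is the distance of the ceil(k * s / N)-th closest sample point (s = sample size, N = table size) scaled by an inflation factor, and the range is doubled and re-run if it returns fewer than k rows. The paper studies several query mapping strategies; this is only one plausible one, and all function names are hypothetical.

```python
import math
import random

Point = tuple[float, ...]


def linf(a: Point, b: Point) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


def draw_sample(table: list[Point], size: int, seed: int = 0) -> list[Point]:
    """Uniform random sample kept in memory (the estimator's only statistic)."""
    return random.Random(seed).sample(table, size)


def estimate_radius(sample: list[Point], q: Point, k: int, n_total: int,
                    inflate: float = 1.0) -> float:
    """Radius whose L-inf box around q should hold about k of the n_total rows."""
    m = min(len(sample), max(1, math.ceil(k * len(sample) / n_total)))
    dists = sorted(linf(p, q) for p in sample)
    return dists[m - 1] * inflate


def range_query(table: list[Point], q: Point, r: float) -> list[Point]:
    """Stands in for SELECT * FROM t WHERE a1 BETWEEN q1 - r AND q1 + r AND ..."""
    return [p for p in table if all(qi - r <= pi <= qi + r for pi, qi in zip(p, q))]


def topk_via_range(table: list[Point], sample: list[Point], q: Point, k: int,
                   inflate: float = 1.5, max_rounds: int = 10) -> list[Point]:
    r = estimate_radius(sample, q, k, len(table), inflate)
    for _ in range(max_rounds):
        hits = range_query(table, q, r)
        if len(hits) >= k:
            # Rows outside the box are farther than r, so these are the exact top k.
            return sorted(hits, key=lambda p: linf(p, q))[:k]
        r = 2 * r if r > 0 else 1.0          # too few rows: restart with a larger range
    return sorted(table, key=lambda p: linf(p, q))[:k]   # fallback: full scan
```

If the box returns at least k rows, every row outside it is farther than r from the query point, so the k closest rows inside the box are the exact answer; a larger inflation factor trades a bigger range scan for fewer restarts.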
Decoupled query optimization for federated database systems
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994788
A. Deshpande, J. Hellerstein
We study the problem of query optimization in federated relational database systems. The nature of federated databases explicitly decouples many aspects of the optimization process, often making it imperative for the optimizer to consult underlying data sources while doing cost-based optimization. This not only increases the cost of optimization, but also changes the trade-offs involved in the optimization process significantly. The dominant cost in the decoupled optimization process is the "cost of costing" that traditionally has been considered insignificant. The optimizer can only afford a few rounds of messages to the underlying data sources and hence the optimization techniques in this environment must be geared toward gathering all the required cost information with minimal communication. In this paper, we explore the design space for a query optimizer in this environment and demonstrate the need for decoupling various aspects of the optimization process. We present minimum-communication decoupled variants of various query optimization techniques, and discuss tradeoffs in their performance in this scenario. We have implemented these techniques in the Cohera federated database system and our experimental results, somewhat surprisingly, indicate that a simple two-phase optimization scheme performs fairly well as long as the physical database design is known to the optimizer, though more aggressive algorithms are required otherwise.
Citations: 59
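The abstract reports that a simple two-phase optimization scheme works well when the optimizer knows the physical database design. The Python sketch below is our own hypothetical rendering of that shape, not Cohera's optimizer: phase 1 chooses a left-deep join order greedily from locally cached cardinalities and selectivities without contacting any source, and phase 2 sends one batched cost request per site for the fixed plan and places each join at the cheapest site. The statistics, the quote callback signature (`QuoteFn`), and the function names are assumptions; the point illustrated is that the "cost of costing" is bounded by one message per site.

```python
from __future__ import annotations

from typing import Callable

Step = tuple[tuple[str, ...], str]              # (relations joined so far, next relation)
QuoteFn = Callable[[list[Step]], list[float]]   # hypothetical site API: cost per step


def _out_size(size: float, joined: list[str], r: str,
              card: dict[str, float], sel: dict[frozenset, float]) -> float:
    s = size * card[r]
    for o in joined:
        s *= sel.get(frozenset((o, r)), 1.0)    # no known predicate: cross product
    return s


def greedy_join_order(card: dict[str, float], sel: dict[frozenset, float]) -> list[str]:
    """Phase 1, no messages: greedy left-deep order from locally cached statistics."""
    remaining = sorted(card)                    # sorted for deterministic tie-breaking
    first = min(remaining, key=lambda r: card[r])
    order, size = [first], card[first]
    remaining.remove(first)
    while remaining:
        nxt = min(remaining, key=lambda r: _out_size(size, order, r, card, sel))
        size = _out_size(size, order, nxt, card, sel)
        order.append(nxt)
        remaining.remove(nxt)
    return order


def place_joins(order: list[str], sites: dict[str, QuoteFn]) -> tuple[list[str], int]:
    """Phase 2: one batched cost request per site, then the cheapest site per join."""
    steps: list[Step] = [(tuple(order[:i]), order[i]) for i in range(1, len(order))]
    quotes = {name: ask(steps) for name, ask in sites.items()}   # one message per site
    placement = [min(sorted(quotes), key=lambda s: quotes[s][i]) for i in range(len(steps))]
    return placement, len(quotes)
```

Batching every step of the fixed plan into a single request per site keeps communication independent of the number of joins, at the cost of never revisiting the phase-1 join order in light of the quotes.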
Improving range query estimation on histograms
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994780
F. Buccafurri, D. Rosaci, L. Pontieri, D. Saccà
Histograms summarize the contents of relations into a number of buckets so that query result sizes can be estimated. Several techniques (e.g., MaxDiff and V-Optimal) have been proposed in the past for choosing bucket boundaries that yield better estimates. This paper proposes using 32 bits of information per bucket (a 4-level tree index) to store approximate cumulative frequencies at 7 internal intervals of the bucket. Both theoretical analysis and experimental results show that the 4-level tree index provides the best frequency estimation inside a bucket. The index is then added to two well-known histogram construction techniques, MaxDiff and V-Optimal, yielding substantial improvements over the original methods in frequency estimation for inter-bucket ranges.
Citations: 29
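The 4-level tree index can be pictured as a small binary tree over one bucket: the bucket is split into 8 equal sub-ranges, and each of the 7 internal nodes stores, in a few bits, what fraction of its parent's count falls into its left half. The Python sketch below encodes and decodes such a tree in one 32-bit word and interpolates inside the bucket. The bit allocation (6 bits for the top node, 5 for each of the two middle nodes, 4 for each of the four bottom nodes), the left-fraction encoding, and the linear interpolation within an eighth are our assumptions for illustration; the paper defines the exact 4-level tree layout.

```python
# Hypothetical 32-bit layout: 6 + 5 + 5 + 4 + 4 + 4 + 4 = 32 bits.
SPANS = [(0, 8), (0, 4), (4, 8), (0, 2), (2, 4), (4, 6), (6, 8)]  # parents before children
BITS = [6, 5, 5, 4, 4, 4, 4]


def encode_4lt(sub: list[float]) -> int:
    """Pack the 8 sub-range counts of one bucket into a single 32-bit word."""
    word, shift = 0, 0
    for (lo, hi), b in zip(SPANS, BITS):
        mid = (lo + hi) // 2
        parent, left = sum(sub[lo:hi]), sum(sub[lo:mid])
        code = round(left / parent * ((1 << b) - 1)) if parent else 0
        word |= code << shift
        shift += b
    return word


def decode_4lt(word: int, total: float) -> list[float]:
    """Rebuild approximate sub-range counts from the word and the exact bucket total."""
    sums = {(0, 8): total}
    shift = 0
    for (lo, hi), b in zip(SPANS, BITS):
        code = (word >> shift) & ((1 << b) - 1)
        shift += b
        mid = (lo + hi) // 2
        left = sums[(lo, hi)] * code / ((1 << b) - 1)
        sums[(lo, mid)], sums[(mid, hi)] = left, sums[(lo, hi)] - left
    return [sums[(i, i + 1)] for i in range(8)]


def estimate_prefix(word: int, total: float, frac: float) -> float:
    """Estimated count in the first `frac` (0..1) of the bucket's value range."""
    sub = decode_4lt(word, total)
    pos = min(max(frac, 0.0), 1.0) * 8
    j = int(pos)                               # whole eighths covered
    est = sum(sub[:j])
    if j < 8:
        est += sub[j] * (pos - j)              # uniform spread inside one eighth
    return est
```

For a bucket holding 10 values at its low end and 30 at its high end, `estimate_prefix(encode_4lt([10, 0, 0, 0, 0, 0, 0, 30]), 40, 0.5)` returns about 10.2, whereas the usual uniform-spread assumption inside the bucket would estimate 20.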
OSSM: a segmentation approach to optimize frequency counting
Pub Date : 2002-08-07 DOI: 10.1109/ICDE.2002.994776
C. Leung, R. Ng, H. Mannila
Computing the frequency of a pattern is one of the key operations in data mining algorithms. We describe a simple yet powerful way of speeding up any form of frequency counting that satisfies the monotonicity condition. Our method, the optimized segment support map (OSSM), is a lightweight structure that partitions the collection of transactions into m segments, so as to reduce the number of candidate patterns that require frequency counting. We study two problems: (1) what is the optimal number of segments to use, and (2) given a user-determined m, what is the best segmentation (composition) of the m segments? For Problem 1, we provide a thorough analysis and a theorem establishing the minimum value of m for which using the OSSM loses no accuracy. For Problem 2, we develop various algorithms and heuristics that efficiently generate compact and effective OSSMs to facilitate segmentation.
Citations: 22
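The pruning idea behind a segment support map fits in a few lines: store, for each of the m segments, the support count of every single item. Within one segment an itemset cannot occur more often than its rarest item, so summing those per-segment minima gives an upper bound on the itemset's support, and any candidate whose bound falls below the minimum support can be discarded without a counting pass. The Python sketch below uses a naive contiguous split into m segments; choosing a good segmentation is exactly what the OSSM work optimizes, and the function names are hypothetical.

```python
from collections import Counter


def build_segment_support_map(transactions: list[set[str]], m: int) -> list[Counter]:
    """Split transactions into m contiguous segments; count every single item per segment."""
    n = len(transactions)
    cuts = [round(i * n / m) for i in range(m + 1)]
    return [Counter(item for t in transactions[lo:hi] for item in set(t))
            for lo, hi in zip(cuts, cuts[1:])]


def support_upper_bound(ssm: list[Counter], itemset: frozenset[str]) -> int:
    """Within a segment, an itemset occurs at most as often as its rarest item."""
    return sum(min(seg[item] for item in itemset) for seg in ssm)


def prune_candidates(candidates: list[frozenset[str]], ssm: list[Counter],
                     minsup: int) -> list[frozenset[str]]:
    """Keep candidates whose bound reaches minsup; the rest skip the counting pass."""
    return [c for c in candidates if support_upper_bound(ssm, c) >= minsup]
```

With m = 1 the bound reduces to the smallest single-item support and can be loose; at the other extreme, with one transaction per segment, it equals the exact support. For example, if the first two transactions contain {a, b} and the last two contain {c, d}, the bound for {a, c} is 2 with one segment but 0 with two segments.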