Proceedings of the 2006 ACM SIGMOD international conference on Management of data: latest articles

Refreshing the sky: the compressed skycube with efficient support for frequent updates
Tian Xia, Donghui Zhang
The skyline query is important in many applications such as multi-criteria decision making, data mining, and user-preference queries. Given a set of d-dimensional objects, the skyline query finds the objects that are not dominated by others. In practice, different users may be interested in different dimensions of the data, and issue queries on any subset of the d dimensions. This paper focuses on supporting concurrent and unpredictable subspace skyline queries in frequently updated databases. Simply computing and storing the skyline objects of every subspace in a skycube incurs an expensive update cost. In this paper, we investigate the important issue of updating the skycube in a dynamic environment. To balance the query cost and update cost, we propose a new structure, the compressed skycube, which concisely represents the complete skycube. We thoroughly explore the properties of the compressed skycube and provide an efficient object-aware update scheme. Experimental results show that the compressed skycube is both query and update efficient.
Citations: 117
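The dominance test and subspace skyline described in the abstract above can be illustrated with a minimal sketch. It assumes smaller values are better in every dimension, and the names `dominates`, `skyline`, `skycube`, and the `hotels` data are hypothetical. It builds the naive full skycube that the abstract describes as expensive to update; it is not the compressed skycube or the authors' update scheme.

```python
from itertools import combinations

def dominates(a, b, dims):
    """a dominates b on dims: no worse in every dimension, strictly better in one.
    Assumption: smaller values are preferred."""
    return all(a[d] <= b[d] for d in dims) and any(a[d] < b[d] for d in dims)

def skyline(objects, dims):
    """Objects not dominated by any other object in the subspace dims (quadratic scan)."""
    return [p for p in objects if not any(dominates(q, p, dims) for q in objects)]

def skycube(objects, d):
    """Naive full skycube: one skyline per non-empty subset of the d dimensions."""
    cube = {}
    for k in range(1, d + 1):
        for dims in combinations(range(d), k):
            cube[dims] = skyline(objects, dims)
    return cube

# Hypothetical data: (price, distance) pairs.
hotels = [(50, 8), (60, 3), (70, 9), (55, 8)]
full_space = skyline(hotels, (0, 1))
cube = skycube(hotels, 2)
```

A skycube over d dimensions holds 2^d - 1 subspace skylines, so a single insertion or deletion may touch many of them, which is the update cost the abstract sets out to reduce.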
Quality-aware distributed data delivery for continuous query services
B. Gedik, Ling Liu
We consider the problem of distributed continuous data delivery services in an overlay network of heterogeneous nodes. Each node in the system can be a source for any number of data streams and at the same time be a consumer node that is receiving streams sourced at other nodes. A consumer node may define a filter on a source stream such that only the desired portion of the stream is delivered, minimizing the amount of unnecessary bandwidth consumption. By heterogeneous, we mean that nodes not only may have varying network bandwidths and computing resources but also different interests in terms of the filters and the rates of the data streams they are interested in. Our objective is to construct an efficient stream delivery network in which nodes cooperate in forwarding data streams in the presence of constrained resources. We formalize this distributed stream delivery problem as an optimization one by starting with a simple setup where the network topology is fixed and node bandwidth characteristics are known. The goal of the optimization is to find valid delivery graphs with minimum bandwidth consumption. We extend this problem formulation to QoS-aware stream delivery, in order to handle the bandwidth constrained cases in which unwanted drops and delays are inevitable. We provide a classification of delivery graph construction schemes, and in light of this classification we develop pragmatic quality-aware stream delivery (QASD) algorithms. These algorithms aim at constructing efficient stream delivery graphs in a distributed setting, where global knowledge is not available and network characteristics are not known in advance. We introduce a set of evaluation metrics and provide experimental results to illustrate the effectiveness of our proposed algorithms under these metrics.
Citations: 15
Extensible optimization in overlay dissemination trees
Olga Papaemmanouil, Yanif Ahmad, U. Çetintemel, John Jannotti, Y. Yildirim
We introduce XPORT, a profile-driven distributed data dissemination system that supports an extensible set of data types, profile types, and optimization metrics. XPORT efficiently implements a generic tree-based overlay network, which can be customized per application using a small number of methods that encapsulate application-specific data filtering, profile aggregation, and optimization logic. The clean separation between the "plumbing" and "application" enables the system to uniformly support disparate dissemination-based applications. We first provide an overview of the basic XPORT model and architecture. We then describe in detail an extensible optimization framework, based on a two-level aggregation model, that facilitates easy specification of a wide range of commonly used performance goals. We discuss distributed tree transformation protocols that allow XPORT to iteratively optimize its operation to achieve these goals under changing network and application conditions. Finally, we demonstrate the flexibility and the effectiveness of XPORT using real-world data and experimental results obtained from both prototype-based LAN emulation and deployment on PlanetLab.
Citations: 34
Locking-aware structural join operators for XML query processing
Christian Mathis, T. Härder, M. Haustein
As observed in many publications so far, the matching of twig pattern queries (i.e., queries that contain only the child and the descendant axis) is a core operation in XML database management systems (XDBMSs) for which the structural join and the holistic twig join algorithms were proposed. In a single-user environment, the latter algorithm in particular provides a good evaluation strategy. However, when it comes to multi-user access to a single XML document, it may lead to extensive blocking situations: the XDBMS has to ensure data consistency and, therefore, has to prevent concurrent modification operations from changing elements in the input sequences that a holistic twig algorithm accesses while operating. To circumvent this problem, we propose a set of new locking-aware operators for twig pattern query evaluation that rely on stable path labels (SPLIDs) as well as document and element set indexes. Furthermore, by running extensive tests on our own XDBMS, we show that their performance is comparable to existing approaches in a single-user environment, and that they lead to higher throughput rates in the case of multi-user access.
Citations: 25
Meta-data indexing for XPath location steps
SungRan Cho, Nick Koudas, D. Srivastava
XML is the de facto standard for data representation and exchange over the Web. Given the diversity of the information available in XML, it is very useful to annotate XML data with a wide variety of meta-data, such as quality and sensitivity. When querying such XML data, say using XPath, it is important to efficiently identify the data that meet specified constraints on the meta-data. For example, different users may be satisfied with different levels of quality guarantees, or may only have access to different parts of the XML data based on specified security policies. In this paper, we address the problem of efficiently identifying the XML elements along a location step in an XPath query, that satisfy meta-data range constraints, when the meta-data levels are specifically drawn from an ordered domain (e.g., accuracy in [0,1], recency using timestamps, multi-level security, etc.). More specifically, we develop a family of index structures, which we refer to as meta-data indexes, to address this problem. A meta-data index is easily instantiated using a multi-dimensional index structure, such as an R-tree, incorporating novel query and update algorithms. We show that the full meta-data index (FMI), based on associating each XML element with its meta-data level, has a very high update cost for modifying an element's meta-data level. We resolve this problem by designing the inheritance meta-data index (IMI), in which (i) actual meta-data levels are associated only with elements for which this value is explicitly specified, and (ii) inherited meta-data levels and inheritance source nodes are associated with non-leaf nodes of the index structure. We design efficient query (for all XPath axes) and update (of meta-data levels) algorithms for the IMI, and experimentally demonstrate the superiority of the IMI over the FMI using benchmark data sets.
Citations: 14
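The idea of filtering an XPath location step by a meta-data range constraint, with levels stored only where they are explicitly specified, can be sketched as follows. This is a naive tree walk, not the FMI or IMI index structures from the abstract above. It assumes that an element without an explicit level inherits the level of its nearest ancestor that has one; the names `Element`, `effective_level`, and `child_step` are hypothetical.

```python
class Element:
    """XML element with an optional explicitly specified meta-data level."""
    def __init__(self, name, level=None):
        self.name = name
        self.level = level      # None means "not explicitly specified"
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child

def effective_level(element):
    """Explicit level if present, else the nearest ancestor's (assumed inheritance rule)."""
    while element is not None:
        if element.level is not None:
            return element.level
        element = element.parent
    return None

def child_step(context, name, lo, hi):
    """Child-axis step: children called name whose effective level lies in [lo, hi]."""
    result = []
    for child in context.children:
        level = effective_level(child)
        if child.name == name and level is not None and lo <= level <= hi:
            result.append(child)
    return result

# Hypothetical document: accuracy levels drawn from [0, 1].
catalog = Element("catalog", level=0.9)
a = catalog.add(Element("book"))             # inherits 0.9 from catalog
b = catalog.add(Element("book", level=0.4))  # explicit level
matches = child_step(catalog, "book", 0.5, 1.0)
```

Storing levels only where they are explicit means that changing the level on `catalog` updates one value rather than one per descendant, which is the motivation the abstract gives for preferring the inheritance-based index.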
Avatar semantic search: a database approach to information retrieval
Eser Kandogan, R. Krishnamurthy, S. Raghavan, Shivakumar Vaithyanathan, Huaiyu Zhu
We present Avatar Semantic Search, a prototype search engine that exploits annotations in the context of classical keyword search. The process of annotations is accomplished offline by using high-precision information extraction techniques to extract facts, concepts, and relationships from text. These facts and concepts are represented and indexed in a structured data store. At runtime, keyword queries are interpreted in the context of these extracted facts and converted into one or more precise queries over the structured store. In this demonstration we describe the overall architecture of the Avatar Semantic Search engine. We also demonstrate the superiority of the AVATAR approach over traditional keyword search engines using the Enron email data set and a blog corpus.
Citations: 97
Derby/S: a DBMS for sample-based query answering
Anja Klein, Rainer Gemulla, Philipp J. Rösch, Wolfgang Lehner
Although approximate query processing is a prominent way to cope with the requirements of data analysis applications, current database systems do not provide integrated and comprehensive support for these techniques. To improve this situation, we propose an SQL extension, called SQL/S, for approximate query answering using random samples, and present a prototypical implementation, called Derby/S, within the engine of the open-source database system Derby. Our approach significantly reduces the required expert knowledge by enabling the definition of samples in a declarative way; the choice of the specific sampling scheme and its parametrization is left to the system. SQL/S introduces new DDL commands to easily define and administrate random samples subject to a given set of optimization criteria. Derby/S automatically takes care of sample maintenance if the underlying dataset changes. Finally, samples are transparently used during query processing, and error bounds are provided. Our extensions do not affect traditional queries and provide the means to integrate sampling as a first-class citizen into a DBMS.
Citations: 13
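Answering an aggregate query from a random sample and reporting an error bound, as the abstract above describes, can be illustrated with a small sketch. It is written in Python rather than SQL/S, whose syntax the abstract does not show. It assumes a uniform random sample and a normal approximation for the confidence interval; the function name `approx_sum` and the sample values are hypothetical and do not reflect how Derby/S computes its bounds.

```python
import math
import statistics

def approx_sum(sample, population_size, z=1.96):
    """Estimate SUM over a table of population_size rows from a uniform sample.

    Returns (estimate, half_width): the estimate scales the sample mean up to
    the population, and half_width is a z-based bound (z=1.96 for roughly 95%)
    under the assumed normal approximation.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return population_size * mean, z * population_size * std_err

# Hypothetical sample of 4 values drawn from a 100-row column.
estimate, half_width = approx_sum([10, 20, 30, 40], population_size=100)
```

The bound shrinks in proportion to the square root of the sample size, so a system that chooses the sampling scheme itself has to weigh sample size against the accuracy it can promise.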
InMAF: indexing music databases via multiple acoustic features
Jialie Shen, J. Shepherd, A. Ngu
Music information processing has become very important due to the ever-growing amount of music data from emerging applications. In this demonstration, we present a novel approach for generating small but comprehensive music descriptors to facilitate efficient content-based music management (accessing and retrieval, in particular). Unlike previous approaches that rely on low-level spectral features adapted from speech analysis technology, our approach integrates human music perception to enhance the accuracy of the retrieval and classification process via PCA and neural networks. The superiority of our method is demonstrated by comparing it with state-of-the-art approaches in the areas of music classification, query effectiveness, and robustness against various audio distortion/alternatives.
Citations: 5
Simultaneous scalability and security for data-intensive web applications
A. Manjhi, A. Ailamaki, B. Maggs, T. Mowry, Christopher Olston, A. Tomasic
For Web applications in which the database component is the bottleneck, scalability can be provided by a third-party Database Scalability Service Provider (DSSP) that caches application data and supplies query answers on behalf of the application. Cost-effective DSSPs will need to cache data from many applications, inevitably raising concerns about security. However, if all data passing through a DSSP is encrypted to enhance security, then data updates trigger invalidation of large regions of cache. Consequently, achieving good scalability becomes virtually impossible. There is a tradeoff between security and scalability, which requires careful consideration. In this paper we study the security-scalability tradeoff, both formally and empirically. We begin by providing a method for statically identifying segments of the database that can be encrypted without impacting scalability. Experiments over a prototype DSSP system show the effectiveness of our static analysis method: for all three realistic benchmark applications that we study, our method enables a significant fraction of the database to be encrypted without impacting scalability. Moreover, most of the data that can be encrypted without impacting scalability is of the type that application designers will want to encrypt, all other things being equal. Based on our static analysis method, we propose a new scalability-conscious security design methodology that features: (a) compulsory encryption of highly sensitive data like credit card information, and (b) encryption of data for which encryption does not impair scalability. As a result, the security-scalability tradeoff needs to be considered only over data for which encryption impacts scalability, thus greatly simplifying the task of managing the tradeoff.
DOI: 10.1145/1142473.1142501 (https://doi.org/10.1145/1142473.1142501); published 2006-06-27
Citations: 20
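The cache-invalidation effect described in the abstract above (Manjhi et al.) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the paper's system: `CachedQuery`, `entries_to_invalidate`, and the idea that each cached result depends on a single primary key are hypothetical names and simplifications introduced here. An update the cache can inspect invalidates only the matching entries, while an opaque (encrypted) update forces it to drop every entry for the table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CachedQuery:
    """Hypothetical cache entry: a result that depends on one row of one table."""
    table: str
    key: int


def entries_to_invalidate(cache: List[CachedQuery], table: str,
                          updated_key: Optional[int]) -> List[CachedQuery]:
    """Return the cache entries that must be dropped after an update.

    updated_key=None models an encrypted update: the cache knows which
    table changed but cannot see which row, so it must be conservative.
    """
    if updated_key is None:
        # Opaque update: every cached result on this table may be stale.
        return [q for q in cache if q.table == table]
    # Visible update: only results that depend on the updated row are stale.
    return [q for q in cache if q.table == table and q.key == updated_key]


cache = [CachedQuery("orders", 1), CachedQuery("orders", 2), CachedQuery("users", 1)]
visible = entries_to_invalidate(cache, "orders", 2)
opaque = entries_to_invalidate(cache, "orders", None)
```

In this toy setting the visible update drops one entry and the opaque update drops both `orders` entries, which is the kind of gap that the abstract's static analysis aims to avoid by encrypting only data whose encryption does not widen invalidation.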
A geometric approach to monitoring threshold functions over distributed data streams
I. Sharfman, A. Schuster, D. Keren
Monitoring data streams in a distributed system has been the focus of much research in recent years. Most of the proposed schemes, however, deal with monitoring simple aggregated values, such as the frequency of appearance of items in the streams. More involved challenges, such as the important task of feature selection (e.g., by monitoring the information gain of various features), still require very high communication overhead using naive, centralized algorithms. We present a novel geometric approach by which an arbitrary global monitoring task can be split into a set of constraints applied locally on each of the streams. The constraints are used to locally filter out data increments that do not affect the monitoring outcome, thus avoiding unnecessary communication. As a result, our approach enables monitoring of arbitrary threshold functions over distributed data streams in an efficient manner. We present experimental results on real-world data that demonstrate that our algorithms are highly scalable and considerably reduce communication load in comparison to centralized algorithms.
DOI: 10.1145/1142473.1142508 (https://doi.org/10.1145/1142473.1142508); published 2006-06-27
Citations: 178
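The abstract above (Sharfman et al.) does not spell out what a local constraint looks like, so the following is a hedged sketch of one possible form, not the authors' algorithm. It assumes each node keeps a drift vector (the last agreed global estimate plus the node's local change since then) and stays silent as long as the ball whose diameter runs from the estimate to its drift vector lies entirely on one side of the threshold. The monitored function is fixed to the Euclidean norm so that its range over a ball has a closed form; `drift_vector` and `local_constraint_holds` are names introduced for illustration.

```python
import math
from typing import Sequence, List


def drift_vector(estimate: Sequence[float], at_sync: Sequence[float],
                 current: Sequence[float]) -> List[float]:
    """Estimate shifted by the node's local change since the last synchronization."""
    return [e + (c - s) for e, s, c in zip(estimate, at_sync, current)]


def local_constraint_holds(estimate: Sequence[float], drift: Sequence[float],
                           threshold: float) -> bool:
    """Check that norm(x) stays on one side of the threshold over the whole ball.

    The ball is centered at the midpoint of estimate and drift, with radius
    equal to half their distance. Over such a ball the Euclidean norm ranges
    from max(0, |center| - radius) to |center| + radius.
    """
    center = [(e + u) / 2 for e, u in zip(estimate, drift)]
    radius = math.dist(estimate, drift) / 2
    center_norm = math.hypot(*center)
    highest = center_norm + radius
    lowest = max(0.0, center_norm - radius)
    return highest <= threshold or lowest > threshold


estimate = [1.0, 0.0]
small_move = drift_vector(estimate, [0.0, 0.0], [0.5, 0.0])  # local change of 0.5
large_move = drift_vector(estimate, [0.0, 0.0], [2.0, 0.0])  # local change of 2.0
quiet = local_constraint_holds(estimate, small_move, threshold=2.0)
must_report = not local_constraint_holds(estimate, large_move, threshold=2.0)
```

Under these assumptions the node with the small change needs no communication, while the node with the large change must report because its ball straddles the threshold. Supporting arbitrary functions, as the abstract claims, would require a general way to bound the function over a ball, which this sketch does not attempt.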