A compact representation for efficient uncertain-information integration
Amir Dayyan Borhanian, F. Sadri
doi:10.1145/2513591.2513638, pp. 122-131

The probabilistic relation model has been used for the compact representation of uncertain data in relational databases. In this paper we present the extended probabilistic relation model, a compact representation for uncertain information that admits efficient information integration. We present an algorithm for data integration using this model and prove its correctness. We also explore the complexity of query evaluation under the probabilistic and extended probabilistic models. Finally, we study the problem of obtaining a (pure) probabilistic relation that is equivalent to a given extended probabilistic relation, and present approaches and algorithms for this task. This work is a first and critical step towards practical and efficient uncertain-information integration.
Querying data across different legal domains
Marco Taddeo, Alberto Trombetta, D. Montesi, S. Pierantozzi
doi:10.1145/2513591.2513642, pp. 192-197

The management of legal domains is gaining importance in data management. The geographical distribution of data implied, for example, by cloud-based services requires that legal restrictions and obligations be taken into account whenever data circulates across different legal domains. In this paper, we begin to investigate an approach for coping with the complex issues that arise when dealing with data spanning different legal domains. Our approach consists of a conceptual model that takes into account the notion of legal domain (to be paired with the corresponding data) and a reference architecture for implementing our approach in an actual relational DBMS.
LDBC: benchmarks for graph and RDF data management
P. Boncz
doi:10.1145/2513591.2527070, pp. 1-2

The Linked Data Benchmark Council (LDBC) is an EU project that aims to develop industry-strength benchmarks for graph and RDF data management systems. LDBC introduces so-called "choke-point"-based benchmark development, through which experts identify key technical challenges and introduce them into the benchmark workload; we describe this process in some detail. We also present the status of two LDBC benchmarks currently in development, one targeting graph data management systems using a social-network data case, and the other targeting RDF systems using a data-publishing case.
Big data: a research agenda
A. Cuzzocrea, D. Saccá, J. Ullman
doi:10.1145/2513591.2527071, pp. 198-203

Recently, a great deal of interest in Big Data has arisen, driven mainly by a wide range of research problems strongly related to real-life applications and systems, such as representing, modeling, processing, querying, and mining massive, distributed, large-scale repositories (mostly of an unstructured nature). Inspired by this trend, in this paper we discuss three important aspects of Big Data research, namely OLAP over Big Data, Big Data Posting, and Privacy of Big Data. We also outline future research directions, thereby implicitly defining a research agenda aimed at the leading challenges in this field.
A hybrid page layout integrating PAX and NSM
G. Graefe, Ilia Petrov, Todor Ivanov, Veselin Marinov
doi:10.1145/2513591.2513643, pp. 86-95

The paper explores a hybrid page layout (HPL) that combines the advantages of NSM (the N-ary Storage Model, which stores whole records contiguously) and PAX (Partition Attributes Across, which stores each attribute in its own minipage). The design defines a continuum between NSM and PAX, supporting both efficient scans that minimize cache faults and efficient insertions and updates. Our evaluation shows that HPL fills the PAX-NSM performance gap.
Approximate high-dimensional nearest neighbor queries using R-forests
Michael Nolen, King-Ip Lin
doi:10.1145/2513591.2513652, pp. 48-57

Highly efficient query processing on high-dimensional data, while important, remains a challenge: the curse of dimensionality makes exact solutions very expensive. On the other hand, it has been suggested that quickly returning an answer that is close enough can be preferable. In this paper we introduce the R-Forest, a set of disjoint R-trees built over the domain of the search space. Each R-tree stores a subset of the points in a non-overlapping region of space, a property maintained throughout the life of the forest. The structure adds several new features: a median point used for ordering and searching, a pruning parameter, and restricted access. Combined, these features answer approximate nearest-neighbor queries with better results than alternative methods, such as the locality-sensitive-hashing B-tree (LSB-tree), for the same amount of I/O. Our approach handles different data distributions (even exploiting the distribution, without additional parameter tuning), scales with increasing dimensionality, and, most importantly, gives the user feedback in the form of a lower bound on the quality of the results.
Efficiency and precision trade-offs in graph summary algorithms
S. Campinas, Renaud Delbru, G. Tummarello
doi:10.1145/2513591.2513654, pp. 38-47

In many applications, it is convenient to substitute a large data graph with a smaller homomorphic graph. This paper investigates approaches for summarising massive data graphs. In general, massive data graphs are processed using a shared-nothing infrastructure such as MapReduce. However, accurate graph summarisation algorithms are suboptimal for this kind of environment, as they require multiple iterations over the data graph. We investigate approximate graph summarisation algorithms that are efficient to compute in a shared-nothing infrastructure. We define a model for assessing the quality of a summary with respect to a gold-standard summary. We evaluate the trade-offs between efficiency and precision of the algorithms over several datasets. In an application setting, experiments highlight the need to trade off the precision and volume of a graph summary against the complexity of the summarisation technique.
Personalized progressive filtering of skyline queries in high dimensional spaces
Yann Loyer, Isma Sadoun, K. Zeitouni
doi:10.1145/2513591.2513646, pp. 186-191

Skyline queries were introduced to formulate multi-criteria searches. Such a query selects from a relation the tuples that optimize all the criteria, called dominant tuples. There rarely exists a single dominant tuple; usually there is a set of incomparable ones, the skyline set. Unfortunately, the query deteriorates as the number of criteria increases: the size of its answer grows proportionally. To address this limitation, we propose a flexible approach to categorize and refine the skyline set by applying successive relaxations of the dominance conditions with respect to the user's preferences. Our approach, called θ-skyline, is based on decision theory, which deals with decision-making in the presence of conflicting choices. We also define a global ranking method over the skyline set.
Sequential pattern mining from trajectory data
E. Masciari, Barzan Mozafari
doi:10.1145/2513591.2513653, pp. 162-167

In this paper, we study the problem of mining frequent trajectories, which is crucial in many application scenarios, such as vehicle traffic management, hand-off in cellular networks, and supply chain management. We approach this problem as one of mining frequent sequential patterns. Our approach applies a partitioning strategy to incoming streams of trajectories in order to reduce the trajectory size and represent trajectories as strings. We mine frequent trajectories using a sliding-window approach combined with a counting algorithm that allows us to promptly update the frequency of patterns. To make counting truly efficient, we represent frequent trajectories by prime numbers, whereby the Chinese remainder theorem can be used to expedite the computation.
Top-k join queries: overcoming the curse of anti-correlation
Manish Patil, R. Shah, Sharma V. Thankachan
doi:10.1145/2513591.2513645, pp. 76-85

Existing heuristics for top-k join queries, which aim to minimize scan depth, rely heavily on scores and on the correlation between scores. It is known that for uniformly random scores between two relations of length n, a scan depth of √(kn) is required. Moreover, optimizing multiple selection criteria that are anti-correlated may require a scan depth of up to (n + k)/2. We build a linear-space index which, in anticipation of worst-case queries, maintains a subset of answers. Based on this, we achieve Õ(√(kn)) join trials, i.e., average-case performance even for worst-case queries. The experimental evaluation shows superior performance against the well-known Rank-Join algorithm.