Load balance for semantic cluster-based data integration systems
Edemberg Rocha da Silva, G. H. B. Souza, A. Salgado
DOI: 10.1145/2513591.2513648 · pp. 174-179

Data integration systems based on Peer-to-Peer environments have been developed to integrate dynamic, autonomous and heterogeneous data sources on the Web. Some of these systems adopt semantic approaches for clustering their data sources, reducing the search space. However, the clusters may become overloaded, and traditional load-balancing strategies are not suitable for semantic clusters. In this paper, we discuss the limitations of load-balancing strategies in semantic clusters, propose a load-balancing solution, and present experimental results.
On-the-fly generation of multidimensional data cubes for web of things
Muntazir Mehdi, Ratnesh Sahay, Wassim Derguech, E. Curry
DOI: 10.1145/2513591.2513655 · pp. 28-37

The dynamicity of sensor data sources, and the publishing of real-time sensor data over a generalised infrastructure like the Web, pose a new set of integration challenges. Semantic Sensor Networks demand high expressivity for efficient formal analysis of sensor data. This article specifically addresses the problem of adapting data-model-specific or context-specific properties in the automatic generation of multidimensional data cubes. The idea is to generate data cubes on the fly from syntactic sensor data to sustain decision making and event processing, and to publish this data as Linked Open Data.
Matching bounds for the all-pairs MapReduce problem
F. Afrati, J. Ullman
DOI: 10.1145/2513591.2513663 · pp. 3-4

The all-pairs problem is an input-output relationship in which each output corresponds to a pair of inputs and each pair of inputs has a corresponding output. It models similarity joins where no simplification of the search for similar pairs (e.g., locality-sensitive hashing) is possible, and each input must be compared with every other input to determine the pairs that are "similar." For MapReduce algorithms, there was a factor-of-2 gap between the lower bound on the necessary communication and the communication required by the best known algorithm. In this brief paper we show that the lower bound can essentially be met.
Near real-time with traditional data warehouse architectures: factors and how-to
Nickerson Ferreira, P. Martins, P. Furtado
DOI: 10.1145/2513591.2513650 · pp. 68-75

Traditional data warehouses integrate new data during lengthy offline periods, with indexes being dropped and rebuilt for efficiency reasons. These and other factors are commonly believed to make them unfit for real-time warehousing. We analyze how a set of factors influences near-real-time and frequent loading capabilities, and what can be done to improve near-real-time capacity using a traditional architecture. We analyze how the query workload affects and is affected by the ETL process, and the influence of factors such as the type of load strategy, the size of the load data, indexing, integrity constraints, refresh activity over summary data, and fact-table partitioning. We evaluate these factors experimentally and show that partitioning is an important factor in delivering near-real-time capacity.
Content-based annotation and classification framework: a general multi-purpose approach
Michal Batko, Jan Botorek, Petra Budíková, P. Zezula
DOI: 10.1145/2513591.2513651 · pp. 58-67

Unprecedented amounts of digital data are becoming available nowadays, but the data frequently lack the semantic information necessary to organize these resources effectively. For images in particular, textual annotations that represent the semantics are highly desirable. Only a small percentage of images is created with reliable annotations; therefore, a lot of effort is being invested in automatic image annotation. In this paper, we address the annotation problem from a general perspective and introduce a new annotation model that is applicable to many text-assignment problems. We also provide experimental results from several implemented instances of our model.
Evaluating skyline queries over vertically partitioned tables
J. Subero, Marlene Goncalves
DOI: 10.1145/2513591.2513647 · pp. 180-185

In recent years, many researchers have been interested in the problem of Skyline query evaluation, because this kind of query can filter high volumes of data. Skyline queries return the objects that are best according to multiple user-specified criteria. In this work, we propose two algorithms to evaluate Skyline queries over Vertically Partitioned Tables (VPTs). Additionally, we have performed an experimental study showing that our algorithms outperform existing state-of-the-art algorithms.
Self-managing online partitioner for databases (SMOPD): a vertical database partitioning system with a fully automatic online approach
Liangzhe Li, L. Gruenwald
DOI: 10.1145/2513591.2513649 · pp. 168-173

A key factor in measuring database performance is query response time, which is dominated by I/O time. Database partitioning is among the techniques that can reduce I/O time significantly. However, partitioning the tables in a database efficiently is not an easy problem, especially when the partitioning task should be done automatically by the system itself. This paper introduces an algorithm called Self-Managing Online Partitioner for Databases (SMOPD), which performs vertical partitioning based on closed item sets mined from a query set and on statistics mined from system statistic views. The algorithm dynamically monitors database performance using user-configured parameters and automatically detects the performance trend, so that it can decide when to perform a re-partitioning action without feedback from DBAs. It thus frees DBAs from the heavy tasks of continuously monitoring the system and struggling with large statistics tables. The paper also presents experimental results evaluating the performance of the algorithm using the TPC-H benchmark.
Read optimisations for append storage on flash
R. Gottstein, Ilia Petrov, A. Buchmann
DOI: 10.1145/2513591.2513640 · pp. 106-113

Append-/Log-based Storage Managers (LbSM) for database systems are a good match for the characteristics and behaviour of Flash technology: they alleviate random writes, reducing the impact of the Flash read/write asymmetry and increasing endurance and performance. A recently proposed combination of multi-versioning database approaches and LbSM called SIAS [9] offers further benefits: it substantially lowers the write rate by appending in tuple-version granularity, and therefore improves performance. In SIAS, a page contains versions of tuples of the same table; once appended, such a page is immutable. The only allowable operations are reads (lookups, scans, version-visibility checks) in tuple-version granularity, so optimising for them offers an essential performance increase. In this work-in-progress paper we propose two types of read optimisations: Multi-Version Index and Ordered Log Storage.

Benefits of Ordered Log Storage: (i) read efficiency due to the use of parallel read streams; (ii) write efficiency, since larger amounts of data are appended sequentially; (iii) fast garbage collection: read multiple sorted runs, filter dead tuples, and write one single, large (combined) sorted run; (iv) possible cache-efficiency optimisations (for large scans).

Benefits of Multi-Version Indexing: (i) index-only visibility checks; (ii) postponing of index reorganisations; (iii) no invalid-tuple bits in the index (no in-place updates); (iv) pre-filtering of invisible tuple versions; (v) easy identification of tuple versions to be garbage collected.

Benefits of combining both approaches: (i) indexed and ordered access; (ii) range searches in sorted runs; (iii) on-the-fly garbage collection (checking of one bit).
Breaking skyline computation down to the metal: the skyline breaker algorithm
D. Köppl
DOI: 10.1145/2513591.2513637 · pp. 132-141

Given a sequential input connection, we tackle parallel skyline computation over the incoming data by means of a spatial tree structure that indexes fine-grained feature vectors. For this purpose, multiple local split decision trees are filled simultaneously before the actual computation starts. We exploit the special tree structure to clip parts of the tree without a depth-first search. The split of the data allows us to perform this step in a divide-and-conquer manner. With this schedule we seek to provide an algorithm that is robust against the "curse of dimensionality" and different data distributions.
Verification of k-coverage on query line segments
Kun-Han Juang, En Tzu Wang, Chieh-Feng Chiang, Arbee L. P. Chen
DOI: 10.1145/2513591.2513639 · pp. 114-121

The coverage problem is one of the fundamental problems in sensor networks; it reflects the degree to which a region is monitored by sensors. In this paper, we make the first attempt to address the k-coverage verification problem for a given query line segment, which returns all sub-segments of the line segment that are covered by at least k sensors. To deal with the problem, we propose three methods based on the R-tree index. The first method is the most primitive one: it identifies all intersection points of the query line segment with the circumferences of the sensors' covering regions and then checks each sub-segment to see whether it is k-covered. Improving on the first method, the second method calculates a lower bound on the number of sensors covering a specific sub-segment to reduce the computation costs. The third method partitions the query line segment into sub-segments of equal length and then verifies each of them. A series of experiments on a real dataset and two synthetic datasets evaluates these methods; the results demonstrate that the third method has the best performance of the three.