Leveraging linked entities to estimate focus time of short texts
C. Morbidoni, A. Cucchiarelli, D. Ursino. DOI: 10.1145/3216122.3216158

Time is a useful dimension to explore in text databases, especially where historical and factual information is concerned. As documents generally refer to different events and time periods, understanding the focus time of key sentences, defined as the time period the content refers to, is a crucial task in temporally annotating a document. In this paper, we leverage a bag-of-linked-entities representation of sentences, together with temporal information from Wikipedia and DBpedia, to implement a novel approach to focus time estimation. We evaluate our approach on sample datasets and compare it with a state-of-the-art method, measuring improvements in MRR.
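The abstract gives no code, but the core idea can be made concrete. Below is a minimal sketch, assuming a hypothetical `ENTITY_YEARS` lookup distilled from Wikipedia/DBpedia temporal facts; the entity names and the simple year-voting scheme are illustrative assumptions, not the authors' actual method.

```python
from collections import Counter

# Hypothetical entity -> associated-years mapping, as might be distilled
# from Wikipedia/DBpedia temporal facts (names are illustrative only).
ENTITY_YEARS = {
    "Battle_of_Waterloo": [1815],
    "Napoleon": [1799, 1804, 1815, 1821],
    "Duke_of_Wellington": [1815, 1828],
}

def estimate_focus_time(linked_entities, top_k=3):
    """Score candidate years by how many linked entities vote for them."""
    votes = Counter()
    for entity in linked_entities:
        for year in ENTITY_YEARS.get(entity, []):
            votes[year] += 1
    return votes.most_common(top_k)

# A sentence represented as a bag of linked entities:
print(estimate_focus_time(["Battle_of_Waterloo", "Napoleon", "Duke_of_Wellington"]))
# -> [(1815, 3), ...]: 1815 is the most plausible focus time
```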
On Improving Data Skew Resilience In Main-memory Hash Joins
Puya Memarzia, S. Ray, V. Bhavsar. DOI: 10.1145/3216122.3216156

Main-memory hash joins are an important category of in-memory joins. However, the performance of these joins can be hindered by dataset skew, shuffling, and load balancing. We conducted a comprehensive study of the effects of dataset skew on four hash join algorithms. We show that hash joins are acutely affected by dataset skew, and that performance gets worse with shuffled data. To address these issues, we propose non-partitioning hash joins using two different hash tables. First, we use a separate-chaining hash table based on an existing implementation that we have modified. This version outperforms the original implementation on skewed datasets by up to three orders of magnitude. Second, we propose a novel hash table for hash joins, called the Maple hash table. We demonstrate that this hash table is better suited to skewed and/or shuffled datasets. Moreover, this approach further improves performance by up to 17.3×.
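Neither the authors' modified implementation nor the Maple hash table is described in the abstract, so the sketch below only illustrates the baseline structure the first variant builds on: a non-partitioning hash join over a separate-chaining table, where heavy (skewed) keys simply grow their chain.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Non-partitioning hash join using a separate-chaining hash table:
    all build-side rows with the same key hash into one bucket (chain),
    so a skewed key grows its chain instead of overflowing a partition."""
    table = defaultdict(list)          # key -> chain of matching rows
    for row in build_rows:             # build phase
        table[row[build_key]].append(row)
    for row in probe_rows:             # probe phase
        for match in table.get(row[probe_key], ()):
            yield {**match, **row}

# Toy relations (illustrative data):
custs  = [{"cust": 1, "name": "Ann"}, {"cust": 2, "name": "Bo"}]
orders = [{"cust": 1, "item": "a"}, {"cust": 1, "item": "b"}, {"cust": 2, "item": "c"}]
print(list(hash_join(custs, orders, "cust", "cust")))
```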
Data Volume Based Data Gathering in WSNs using Mobile Data Collector
Syed Muhammad Abrar Akber, I. Khan, S. S. Muhammad, Syed Muhammad Mohsin, I. Khan, Shahaboddin Shamshirband, Anthony T. Chronopoulos. DOI: 10.1145/3216122.3216166

Data collection and transmission are the fundamental operations of WSNs. The performance of WSNs relies on these essential tasks, because data gathering directly affects the efficiency and lifetime of the network. This paper presents a data-volume-based data collection technique using a Mobile Data Collector (MDC). In this technique, the MDC uses data volume information to plan its visits to the nodes: it visits only those nodes that have generated data, while the rest of the nodes are ignored. The scheme is validated with the help of simulations, and the results are compared with well-known existing techniques. The results show that the proposed scheme is energy efficient.
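The abstract does not specify how the MDC plans its route, so the following sketch makes an assumption: a greedy nearest-neighbour tour over only the nodes that reported a positive data volume. The node table and field names are hypothetical.

```python
import math

# Hypothetical node table: position and volume of buffered data (bytes).
nodes = {
    "n1": {"pos": (0, 4), "data": 512},
    "n2": {"pos": (3, 0), "data": 0},     # nothing buffered -> skipped
    "n3": {"pos": (5, 5), "data": 2048},
}

def plan_tour(nodes, base=(0, 0)):
    """Greedy nearest-neighbour tour over nodes that reported data > 0."""
    pending = {n: v["pos"] for n, v in nodes.items() if v["data"] > 0}
    tour, here = [], base
    while pending:
        nxt = min(pending, key=lambda n: math.dist(here, pending[n]))
        tour.append(nxt)
        here = pending.pop(nxt)
    return tour

print(plan_tour(nodes))   # visits n1 and n3 only; n2 is ignored
```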
Algorithms for Computing Approximate Certain Answers over Incomplete Databases
S. Greco, Cristian Molinaro, I. Trubitsyna. DOI: 10.1145/3216122.3220542

Incomplete information arises in many database applications, such as data integration, data exchange, inconsistency management, data cleaning, ontological reasoning, and many others. A principled way of answering queries over incomplete databases is to compute certain answers, which are query answers that can be obtained from every complete database represented by the incomplete one. For databases containing (labeled) nulls, certain answers to positive queries can be easily computed in polynomial time, but for more general queries with negation the problem becomes coNP-hard. To make query answering feasible in practice, one might resort to SQL's evaluation, but unfortunately, the way SQL behaves in the presence of nulls may result in wrong answers. Thus, on the one hand, SQL's evaluation is efficient but flawed; on the other hand, certain answers are a principled semantics but carry high complexity. To deal with this issue, recent research has focused on developing polynomial-time approximation algorithms for computing (approximate) certain answers. This paper surveys recent advances in this area.
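To make the semantics concrete, here is a toy example (not one of the surveyed algorithms): certain answers computed by brute-force enumeration of the valuations of a labeled null over an assumed finite domain. The exponential enumeration is exactly the cost that the polynomial-time approximation algorithms surveyed in the paper are designed to avoid; note that naively treating the null as an ordinary value would return {2}, which is not certain.

```python
# Toy incomplete relation R(x) with a labeled null "N"; S(x) is complete.
R = [1, "N"]
S = [1, 2]
DOMAIN = [1, 2, 3]   # assumed finite active domain, for illustration only

# Query with negation: Q = { x in S | x not in R }
def q(r, s):
    return {x for x in s if x not in r}

# Certain answers: tuples returned by Q over *every* completion of R.
certain = None
for v in DOMAIN:                       # every valuation of the null N
    completion = [v if t == "N" else t for t in R]
    ans = q(completion, S)
    certain = ans if certain is None else certain & ans
print(certain)   # set() -- nothing is certain, since N might equal 2

# SQL's three-valued evaluation of NOT IN happens to agree here, but in
# general its answers can be both unsound and incomplete, which is the
# "efficient but flawed" behaviour the survey discusses.
```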
A Tensor Based Data Model for Polystore: An Application to Social Networks Data
É. Leclercq, M. Savonnet. DOI: 10.1145/3216122.3216152

In this article, we show how the mathematical notion of a tensor can be used to build a multi-paradigm model for the storage of social data in data warehouses. From an architectural point of view, our approach makes it possible to link different storage systems (a polystore) and limits the impact of ETL tools performing the model transformations required to feed different analysis algorithms. Systems can therefore take advantage of multiple data models, both in terms of query execution performance and in terms of the semantic expressiveness of the data representation. The proposed model achieves logical independence between the data and the programs implementing analysis algorithms. With a concrete case study on message virality on Twitter during the 2017 French presidential election, we highlight some of the contributions of our model.
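A minimal sketch of the underlying idea, under assumptions of my own: a 3-way tensor (users × hashtags × hours) holding tweet counts, from which different paradigm-specific views are derived by slicing and summing rather than by ETL round-trips. The axes and data are illustrative, not the paper's model.

```python
import numpy as np

# Hypothetical 3-way tensor: users x hashtags x hours, holding tweet counts.
users, tags = ["u1", "u2"], ["#electionFR", "#debate"]
T = np.zeros((len(users), len(tags), 3))
T[0, 0, 1] = 5   # u1 used #electionFR five times in hour 1
T[1, 0, 1] = 2
T[1, 1, 2] = 7

# One stored object feeds several data models without transformation tools:
adjacency = T.sum(axis=2)          # graph-style view: user-hashtag matrix
timeline  = T.sum(axis=(0, 1))     # time-series view: activity per hour
per_tag   = {t: T[:, i, :] for i, t in enumerate(tags)}  # per-topic slices
print(adjacency, timeline, sep="\n")
```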
Continuous Time-Dependent kNN Join by Binary Sketches
Filip Nálepa, Michal Batko, P. Zezula. DOI: 10.1145/3216122.3216159

An important functionality of current social applications is real-time recommendation, which is responsible for suggesting relevant published data to users based on their preferences. By representing the users and the published data in a metric space, each user can be recommended their k nearest neighbors among the published data. We consider the scenario in which the relevance of a published data item to a user decreases as the item gets older, i.e., a time-dependent distance function is applied. We define this problem as the continuous time-dependent kNN join and provide a solution for a broad range of time-dependent functions. In addition, we propose a binary sketch-based approximation technique that speeds up the join evaluation by replacing expensive metric distance computations with cheap Hamming distances.
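The filter-and-refine idea behind binary sketches can be shown in a few lines: rank candidates by cheap Hamming distance on bit sketches, and reserve the expensive metric distance for the survivors. The `slack` over-retrieval factor is a hypothetical parameter of this sketch, not taken from the paper.

```python
# Minimal sketch of the filtering idea, not the paper's algorithm.

def hamming(a: int, b: int) -> int:
    """Hamming distance between two sketches packed into ints."""
    return bin(a ^ b).count("1")

def knn_candidates(query_sketch, items, k, slack=2):
    """items: list of (sketch, payload). Keep the k*slack best by Hamming
    distance; only these are refined with the true metric distance."""
    ranked = sorted(items, key=lambda it: hamming(query_sketch, it[0]))
    return ranked[: k * slack]

items = [(0b1010, "a"), (0b1011, "b"), (0b0101, "c"), (0b1110, "d")]
print(knn_candidates(0b1010, items, k=1))
```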
Distributed Learning of Process Models for Next Activity Prediction
Michelangelo Ceci, Michele Spagnoletta, Pasqua Fabiana Lanotte, D. Malerba. DOI: 10.1145/3216122.3216125

Process mining is a research discipline that aims to discover, monitor, and improve real processes using event logs. In this paper we tackle the problem of next-activity prediction/recommendation via "nested prediction model" learning: we first identify recurrent and frequent sequences of activities, and we then learn a prediction model for each frequent sequence. The key principle underlying the design of the proposed solution is the ability to process massive logs by means of a parallel and distributed solution (exploiting the Spark parallel computation framework) that can make reasonable decisions in the absence of perfect models. Indeed, given the classical minimum-support threshold and a user-specified error bound, our approach exploits the Chernoff bound to mine "approximate" frequent sequences with statistical error guarantees on their actual supports. Experiments on real-world log data prove the effectiveness of the proposed approach.
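The abstract does not give the exact bound used, so the sketch below shows the standard Hoeffding-style slack often labeled a "Chernoff bound" in frequent-pattern mining: lower the support threshold by an error term so that, with high probability, no truly frequent sequence is missed. Function names and the example counts are my own.

```python
import math

def support_slack(n, delta):
    """Hoeffding/Chernoff-style slack: with probability >= 1 - delta, an
    empirical support computed over n traces is within eps of the true one."""
    return math.sqrt(math.log(2 / delta) / (2 * n))

def approx_frequent(seq_counts, n, min_sup, delta=0.01):
    """Keep sequences whose empirical support clears min_sup - eps, so that
    truly frequent sequences are retained (up to failure probability delta)."""
    eps = support_slack(n, delta)
    return {s: c / n for s, c in seq_counts.items() if c / n >= min_sup - eps}

counts = {("login", "search"): 480, ("search", "buy"): 260, ("buy", "logout"): 90}
print(approx_frequent(counts, n=1000, min_sup=0.3))
# ("search", "buy") survives at support 0.26 thanks to the slack
```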
Practical Study of Deterministic Regular Expressions from Large-scale XML and Schema Data
Yeting Li, Xinyu Chu, Xiaoying Mou, Chunmei Dong, H. Chen. DOI: 10.1145/3216122.3216126

Regular expressions are a fundamental concept in computer science and widely used in various applications. In this paper we focus on deterministic regular expressions (DREs). Since researchers previously lacked large datasets as evidence, we first harvested a large corpus of real data from the Web and then conducted a practical study of the usage of DREs. One feature of our work is that the dataset, obtained using several data collection strategies we propose, is substantially larger than those of previous work. The results show that more than 98% of the expressions in Relax NG are DREs and more than 56% of the expressions from RegExLib are DREs, even though neither Relax NG nor RegExLib imposes the determinism constraint. These observations indicate that DREs are commonly used in practice. The results also show that further study of the subclasses of DREs is necessary; we find that current research on new subclasses of DREs is insufficient. To the best of our knowledge, we are the first to analyze determinism and the subclasses of DREs for Relax NG and RegExLib and to report these results. Furthermore, we discuss applications of the dataset. We also analyze the referencing relationships among XSDs and define SchemaRank, which can be used in XML Schema design.
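For readers unfamiliar with determinism (one-unambiguity), here is a minimal checker based on the classical Brüggemann-Klein/Wood characterization, not the authors' tool: mark each symbol occurrence with a position, compute first/last/follow sets, and reject if two distinct positions with the same symbol ever compete. The tiny AST supports only union, concatenation, and star.

```python
# A regex is deterministic (one-unambiguous) iff no two distinct positions
# carrying the same symbol compete, neither in first(E) nor in follow(E, p).

class Sym:
    nullable = False
    def __init__(self, ch, pos): self.ch, self.pos = ch, pos
    def first(self): return {self.pos}
    def last(self):  return {self.pos}
    def follow(self, f): pass

class Star:
    nullable = True
    def __init__(self, e): self.e = e
    def first(self): return self.e.first()
    def last(self):  return self.e.last()
    def follow(self, f):
        self.e.follow(f)
        for p in self.e.last():            # loop back: last -> first
            f.setdefault(p, set()).update(self.e.first())

class Alt:
    def __init__(self, l, r): self.l, self.r = l, r
    @property
    def nullable(self): return self.l.nullable or self.r.nullable
    def first(self): return self.l.first() | self.r.first()
    def last(self):  return self.l.last() | self.r.last()
    def follow(self, f): self.l.follow(f); self.r.follow(f)

class Cat:
    def __init__(self, l, r): self.l, self.r = l, r
    @property
    def nullable(self): return self.l.nullable and self.r.nullable
    def first(self):
        return self.l.first() | (self.r.first() if self.l.nullable else set())
    def last(self):
        return self.r.last() | (self.l.last() if self.r.nullable else set())
    def follow(self, f):
        self.l.follow(f); self.r.follow(f)
        for p in self.l.last():            # left's last -> right's first
            f.setdefault(p, set()).update(self.r.first())

def is_dre(e, labels):
    """labels: position -> symbol. Deterministic iff no symbol clash."""
    def clash(ps): return len(ps) != len({labels[p] for p in ps})
    f = {}
    e.follow(f)
    return not clash(e.first()) and not any(clash(ps) for ps in f.values())

# (a|b)*a -- the classic non-deterministic expression: two competing 'a's.
labels = {1: "a", 2: "b", 3: "a"}
expr = Cat(Star(Alt(Sym("a", 1), Sym("b", 2))), Sym("a", 3))
print(is_dre(expr, labels))   # False; a(a|b)* would instead be deterministic
```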
The Deployment of an Enhanced Model-Driven Architecture for Business Process Management
R. McClatchey. DOI: 10.1145/3216122.3216155

Business systems these days need to be agile to address the needs of a changing world. Business modelling requires process management to be highly adaptable, with the ability to support dynamic workflows, inter-application integration (potentially between businesses), and process reconfiguration. Designing in the ability to cater for evolution is critical to success: to handle change, systems need the capability to adapt as and when necessary to changes in users' requirements. Using our implementation of a self-describing system, a so-called description-driven approach, new versions of data structures or processes can be created alongside older versions, providing a log of changes to the underlying data schema and enabling the gathering of traceable ("provenance") data. The CRISTAL software, which originated at CERN for handling physics data, uses versions of stored descriptions to define data and workflows that can be evolved over time, thereby handling evolving system needs. It has been customised for use in business as the Agilium-NG product. This paper reports on how the Agilium-NG software has enabled the deployment of a unique business process management solution that can be dynamically evolved to cater for changing user requirements.
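As a rough illustration of the description-driven idea (a toy of my own devising, not CRISTAL's actual data model): descriptions are never overwritten, each change appends a new version plus a log entry, and older instances keep referring to the version they were built from.

```python
import datetime

class DescriptionStore:
    """Toy description-driven store: item descriptions are append-only;
    every change creates a new version and a log entry, so old and new
    versions coexist and changes remain traceable (provenance)."""
    def __init__(self):
        self.versions, self.log = {}, []    # name -> [desc0, desc1, ...]

    def define(self, name, description):
        vs = self.versions.setdefault(name, [])
        vs.append(description)
        self.log.append((datetime.datetime.now(), name, len(vs) - 1))
        return len(vs) - 1                  # version number for provenance

    def get(self, name, version=-1):
        return self.versions[name][version]

store = DescriptionStore()
v0 = store.define("Order", {"fields": ["id", "total"]})
v1 = store.define("Order", {"fields": ["id", "total", "currency"]})
print(store.get("Order", v0), store.get("Order"))  # old and new coexist
```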
Proceedings of the 22nd International Database Engineering & Applications Symposium. DOI: 10.1145/3216122