Modeling Data Lake Metadata with a Data Vault
I. D. Nogueira, Maram Romdhane, J. Darmont. DOI: 10.1145/3216122.3216130

With the rise of big data, business intelligence had to find solutions for managing even greater data volumes and variety than in data warehouses, which proved ill-adapted to the task. Data lakes answer these needs from a storage point of view, but require adequate metadata management to guarantee efficient access to the data. Starting from a multidimensional metadata model designed for an industrial heritage data lake, whose schema lacks the ability to evolve, we propose in this paper to use ensemble modeling, and more precisely a data vault, to address this issue. To illustrate the feasibility of this approach, we instantiate our metadata conceptual model into relational and document-oriented logical and physical models. We also compare the physical models in terms of metadata storage and query response time.

3D Visualization of data using SuperSQL and Unity
Tatsuki Fujimoto, Kento Goto, Motomichi Toyama. DOI: 10.1145/3216122.3216145

When exploring data or communicating it to other people, data is currently visualized through flat diagrams, tables, graphs, etc. Visualizing data in three dimensions (3D) offers more immersive and intuitive representations and, through the added dimension, allows for more compact representations. Still, when representing large amounts of data in 3D, fine control of the layout becomes a must, and current tools for 3D visualization do not allow easy, fine-tuned control of this layout. SuperSQL is an extension of SQL that allows users to declaratively and concisely specify the layout of structured documents, such as web pages, and to generate them. In this work we extend SuperSQL to generate 3D data representations in the Unity game engine. With this system, users can represent their data through basic shapes, colors, and animations, or even their own custom 3D assets, by writing simple SQL-like queries.

An Approach for Testing the Extract-Transform-Load Process in Data Warehouse Systems
Hajar Homayouni, Sudipto Ghosh, I. Ray. DOI: 10.1145/3216122.3216149

The Extract-Transform-Load (ETL) process in data warehousing involves extracting data from source databases, transforming it into a form suitable for research and analysis, and loading it into a data warehouse. ETL processes can use complex transformations involving sources and targets that use different schemas, databases, and technologies, which makes ETL implementations fault-prone. In this paper, we present an approach for validating ETL processes using automated balancing tests that check for various types of discrepancies between the source and target data. We formalize three categories of properties, namely completeness, consistency, and syntactic validity, that must be checked during testing. Our approach uses the rules provided in the ETL specifications to generate source-to-target mappings, from which balancing test assertions are generated for each property. We evaluated the approach on a real-world health data warehouse project, where it revealed 11 previously undetected faults. Using mutation analysis, we demonstrated that our auto-generated assertions can detect faults in the data inside the target data warehouse.

A Predictive Learning Framework for Monitoring Aggregated Performance Indicators over Business Process Events
A. Cuzzocrea, Francesco Folino, M. Guarascio, L. Pontieri. DOI: 10.1145/3216122.3216143

In many application contexts, the executions of a business process are subject to performance constraints expressed in aggregated form, usually over predefined time windows, and detecting a likely violation of such a constraint in advance can help undertake corrective measures to prevent it. This paper illustrates a prediction-aware event processing framework that estimates whether the process instances of a given (unfinished) window w will violate an aggregate performance constraint, based on the continuous learning and application of an ensemble of models, each capable of making and integrating two kinds of predictions: single-instance predictions concerning the ongoing process instances of w, and time-series predictions concerning the "future" process instances of w (i.e., those that have not started yet but will start by the end of w). Notably, the framework can continuously update the ensemble, fully exploiting the raw event data produced by the process under monitoring, suitably lifted to an adequate level of abstraction. The framework has been validated against historical event data from real-life business processes, showing promising results in terms of both accuracy and efficiency.

A paradigm for the cooperation of objects belonging to different IoTs
Giorgio Baldassarre, Paolo Lo Giudice, Lorenzo Musarella, D. Ursino. DOI: 10.1145/3216122.3216171

The Internet of Things (IoT) is currently considered the new frontier of the Internet. One of the most effective ways to investigate and implement the IoT is based on the social network paradigm. In recent years, social network researchers have introduced new models capable of capturing the growing complexity of this scenario. One of the best known is the Social Internetworking System, which models a scenario comprising several related social networks. In this paper, we investigate the possibility of applying the ideas characterizing the Social Internetworking System to the IoT, and we propose a new paradigm capable of modeling this scenario and of favoring the cooperation of objects belonging to different IoTs. Furthermore, to give an idea of both the potential and the complexity of this new paradigm, we illustrate in more detail one of its most interesting issues, namely the redefinition of the betweenness centrality measure.

Top-k Query Processing over Distributed Sensitive Data
S. Mahboubi, Reza Akbarinia, P. Valduriez. DOI: 10.1145/3216122.3216153

Distributed systems provide users with powerful capabilities to store and process their data in third-party machines. However, the privacy of the outsourced data is not guaranteed. One solution for protecting the user data against privacy attacks is to encrypt the sensitive data before sending it to the nodes of the distributed system. The main problem is then to evaluate user queries over the encrypted data. The problem of distributed top-k query processing has been well addressed over plaintext (non-encrypted) data, but the proposed approaches cannot be used on encrypted data. In this paper, we propose a complete solution for processing top-k queries over encrypted databases stored across the nodes of a distributed system.

CELPB: A Cache Invalidation Policy for Location Dependent Data in Mobile Environment
Ajay K. Gupta, Udai Shanker. DOI: 10.1145/3216122.3216147

Location dependent information services (LDIS) are applications that combine a mobile client's location or position with other data to deliver enhanced services to the client at the right place and time, from anywhere. In this paper, an algorithm, Caching Efficiency with Next Location Prediction Based (CELPB), is developed. It uses a newly developed metric, caching efficiency with next location prediction (CELP), for the computation of valid scopes over the prediction interval. The metric takes into account the client's future movement behavior with the help of sequential pattern mining and clustering. Mobility rules are also framed for accurately predicting the next location, which can be used to estimate the client's future movement path (edges) once the client enters the valid scope area of a data item. Simulation results show that the proposed policy achieves up to 10 percent performance improvement over the earlier cache invalidation policy CEBAB for LDIS.

Feature Reduction Improves Classification Accuracy in Healthcare
Maha Asiri, Hamid R. Nemati, F. Sadri. DOI: 10.1145/3216122.3216165

Our work focuses on inductive transfer learning, a setting in which the source and target tasks are assumed to share the same feature and label spaces. We demonstrate that transfer learning can be successfully used for feature reduction, and hence for more efficient classification. Our experiments further show that this approach also increases the precision of the classification task.

A useful four-valued database logic
G. Grahne, A. Moallemi. DOI: 10.1145/3216122.3216157

Recently there has been an effort to solve the problems caused by the infamous NULL in relational databases by systematically applying Kleene's three-valued logic, whose third truth-value is unknown, to SQL. In this paper we show that by adding a fourth truth-value, inconsistent, all the advantages of the three-valued approach can be retained, and negation can be given a constructive, intuitionistic meaning that allows negative knowledge to be specified explicitly in the logic, without resorting to extra-logical notions of stratification or to non-monotonic reasoning. The four-valued approach also allows for a computationally efficient treatment of query answering in the presence of inconsistencies, in contrast to the computationally intractable repair approach to inconsistency management. From a practical viewpoint, we show that the Cylindric Star Algebra, developed by the authors, is particularly well suited for evaluating first-order queries on four-valued databases, and that the framework of data exchange can be smoothly adapted to the four truth-values.

Twitter-based Influenza Surveillance: An Analysis of the 2016-2017 and 2017-2018 Seasons in Italy
C. Comito, Agostino Forestiero, C. Pizzuti. DOI: 10.1145/3216122.3216128

Influenza surveillance through social media data is becoming an important research topic because it could enhance the capabilities of official surveillance systems in monitoring seasonal flu outbreaks, providing healthcare organizations with improved situational awareness. In this paper, the 2016-2017 and 2017-2018 influenza seasons in Italy are investigated by analyzing tweets posted by users about influenza-like illness. Two types of analysis are performed. The first studies the correlation between tweets containing the most frequent flu-related words and the data provided by the Italian InfluNet surveillance system. The second examines people's sentiment towards the medicines used to treat flu. We show that there is a strong correlation between the reports published in the InfluNet system and the content posted by Twitter users about their symptoms and health state. Moreover, we found that the sentiment expressed about flu medication is rather negative.