
Latest publications in Advances in database technology : proceedings. International Conference on Extending Database Technology

Topio Marketplace: Search and Discovery of Geospatial Data
Andra Ionescu, A. Alexandridou, Leonidas Ikonomou, Kyriakos Psarakis, Kostas Patroumpas, Georgios Chatzigeorgakidis, Dimitrios Skoutas, Spiros Athanasiou, Rihan Hai, Asterios Katsifodimos
The increasing need for data trading has created a high demand for data marketplaces. These marketplaces require a set of value-added services, such as advanced search and discovery, that have been proposed in the database research community for years but are yet to be put into practice. In this paper we propose to demonstrate the Topio Marketplace, an open-source data market platform that facilitates the search, exploration, discovery and augmentation of data assets. To support filtering, searching and discovery of data assets, we developed methods to extract and visualise a variety of metadata, methods to discover related assets, and mechanisms to augment them. This paper aims at presenting these methods with a real deployment of the Topio Marketplace, comprising hundreds of open and proprietary datasets.
DOI: 10.48786/edbt.2023.73 · pp. 819–822 · 2023-01-01
Citations: 0
FLIRT: A Fast Learned Index for Rolling Time frames
Guang Yang, Liang Liang, A. Hadian, T. Heinis
Efficiently managing and querying sliding windows is a key component in stream processing systems. Conventional index structures such as the B+Tree are not efficient for handling a stream of time-series data, where the data is very dynamic and the indexes must be updated continuously. Stream processing structures such as queues can accommodate large volumes of updates (enqueue and dequeue); however, they are not efficient for fast retrieval. This paper proposes FLIRT, a parameter-free index structure that manages a sliding window over a high-velocity stream of data and simultaneously supports efficient range queries on the sliding window. FLIRT uses learned indexing to reduce the lookup time. This is enabled by organising the incoming stream of time-series data into linearly predictable segments, allowing fast queue operations such as enqueue, dequeue, and search. We further boost search performance by introducing two multithreaded versions of FLIRT for different query workloads. Experimental results show up to 7× speedup over conventional indexes, 8× speedup over queues, and up to 109× speedup over learned indexes.
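The core mechanism in the abstract above, sealing a near-linear stream into segments whose positions can be predicted rather than searched, can be sketched as a toy. This is an illustrative sketch in Python, not the authors' implementation; all class names, the two-point linear fit, and the fix-up walk are invented for exposition:

```python
class LinearSegment:
    """One sealed segment of near-linear keys; a key's position is
    predicted with a cheap two-point linear model, then corrected
    by a short local walk."""
    def __init__(self, keys):
        self.keys = keys
        span = keys[-1] - keys[0]
        self.slope = (len(keys) - 1) / span if span else 0.0
        self.intercept = keys[0]

    def search(self, key):
        # predicted slot, clamped to the segment, then fixed up
        pos = round((key - self.intercept) * self.slope)
        pos = max(0, min(len(self.keys) - 1, pos))
        while pos > 0 and self.keys[pos] > key:
            pos -= 1
        while pos < len(self.keys) - 1 and self.keys[pos + 1] <= key:
            pos += 1
        return pos  # index of the key, or of its predecessor


class SlidingWindowIndex:
    """A FIFO of segments: enqueue appends, dequeue drops the oldest."""
    def __init__(self, segment_size=4):
        self.segment_size = segment_size
        self.segments = []  # oldest sealed segment first
        self.buffer = []    # keys not yet sealed into a segment

    def enqueue(self, key):
        self.buffer.append(key)
        if len(self.buffer) == self.segment_size:
            self.segments.append(LinearSegment(self.buffer))
            self.buffer = []

    def dequeue(self):
        if self.segments:
            self.segments.pop(0)

    def lookup(self, key):
        """Return the stored key equal to `key` or its in-segment
        predecessor, or None if no sealed segment covers it."""
        for seg in self.segments:
            if seg.keys[0] <= key <= seg.keys[-1]:
                return seg.keys[seg.search(key)]
        return None
```

The sketch keeps only the prediction-plus-fix-up lookup and the queue semantics; the real system additionally bounds the fix-up by the segments' construction and adds the multithreaded variants.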
DOI: 10.48786/edbt.2023.19 · pp. 234–246 · 2023-01-01
Citations: 3
Smart Derivative Contracts in DatalogMTL
Andrea Colombo, Luigi Bellomarini, S. Ceri, Eleonora Laurenza
DOI: 10.48786/edbt.2023.65 · pp. 773–781 · 2023-01-01
Citations: 0
GAM Forest Explanation
C. Lucchese, S. Orlando, R. Perego, Alberto Veneri
The most accurate machine learning models unfortunately produce black-box predictions, for which it is impossible to grasp the internal logic that leads to a specific decision. Unfolding the logic of such black-box models is of increasing importance, especially when they are used in sensitive decision-making processes. In this work we focus on forests of decision trees, which may combine hundreds to thousands of decision trees to produce accurate predictions. Such complexity raises the need for explanations of the predictions generated by large forests. We propose a post hoc explanation method for large forests, named GAM-based Explanation of Forests (GEF), which builds a Generalized Additive Model (GAM) able to explain, both locally and globally, the impact on the predictions of a limited set of features and feature interactions. We evaluate GEF over both synthetic and real-world datasets and show that GEF can create a GAM model with high fidelity by analyzing the given forest alone, without using any further information, not even the initial training dataset.
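The fidelity idea, approximating a black-box predictor with an additive model of per-feature shape functions, can be illustrated with a deliberately simplified sketch. This is not the GEF algorithm: it fits piecewise-constant shape functions by one greedy pass over quantile bins, and all names are invented:

```python
import numpy as np

def fit_additive_surrogate(predict, X, n_bins=16):
    """Greedy one-pass fit: for each feature, bin its values into
    quantile bins and use the mean residual per bin as that
    feature's shape function."""
    y = predict(X)
    intercept = float(y.mean())
    resid = y - intercept
    shapes = []  # per feature: (bin_edges, per-bin shape values)
    for j in range(X.shape[1]):
        edges = np.quantile(X[:, j], np.linspace(0.0, 1.0, n_bins + 1))
        bins = np.clip(np.searchsorted(edges, X[:, j], side="right") - 1,
                       0, n_bins - 1)
        values = np.array([resid[bins == b].mean() if np.any(bins == b) else 0.0
                           for b in range(n_bins)])
        shapes.append((edges, values))
        resid = resid - values[bins]  # remove the part this feature explains
    return intercept, shapes

def surrogate_predict(intercept, shapes, X):
    """Additive prediction: intercept plus one shape value per feature."""
    out = np.full(len(X), intercept)
    for j, (edges, values) in enumerate(shapes):
        bins = np.clip(np.searchsorted(edges, X[:, j], side="right") - 1,
                       0, len(values) - 1)
        out += values[bins]
    return out
```

On an additively structured black box the surrogate reproduces the predictions closely; the shape value per bin is exactly the kind of local/global attribution a GAM-based explanation exposes.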
DOI: 10.48786/edbt.2023.14 · pp. 171–182 · 2023-01-01
Citations: 0
Fast and Efficient Update Handling for Graph H2TAP
M. Jibril, Hani Al-Sayeh, Alexander Baumstark, K. Sattler
Offloading graph analytics to the GPU yields significant performance speedups. In heterogeneous hybrid transactional/analytical graph processing (graph H2TAP), where each graph workload type is executed on the most suitable processor, transactions are executed on a CPU-based main graph and analytics are executed on a GPU-optimized graph replica. The resulting problem is that updates by transactions on the main graph must be specially handled with respect to the graph replica. In this paper, we present a fast and efficient approach to this update-handling problem, based on a delta store optimized for graphs. The delta store is a differential graph store that captures the transactional updates, which are later propagated to the graph replica so that analytical queries are executed on the most recently committed version of the graph in accordance with freshness requirements. Our approach ensures consistency between the main graph and the replica. Our evaluation shows the performance advantage of our approach over existing HTAP approaches.
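The delta-store pattern described above can be sketched in a few lines: transactional updates hit the main graph and are logged as deltas, and a propagation step replays the log against the analytics replica. This is a minimal illustration of the pattern, not the paper's system; the adjacency-set representation and all names are invented:

```python
class GraphWithDeltaStore:
    """Main graph for transactions, replica for analytics, and a
    delta log that bridges the two on demand."""
    def __init__(self):
        self.main = {}      # adjacency: vertex -> set of neighbours
        self.replica = {}   # analytics copy (GPU-resident in the paper)
        self.delta = []     # ordered log of committed updates

    def add_edge(self, u, v):           # transactional path
        self.main.setdefault(u, set()).add(v)
        self.delta.append(("add", u, v))

    def remove_edge(self, u, v):
        self.main.get(u, set()).discard(v)
        self.delta.append(("del", u, v))

    def propagate(self):                # run when freshness demands it
        for op, u, v in self.delta:
            if op == "add":
                self.replica.setdefault(u, set()).add(v)
            else:
                self.replica.get(u, set()).discard(v)
        self.delta.clear()

    def analytics_degree(self, u):      # analytics run on the replica
        return len(self.replica.get(u, set()))
```

Between propagations the replica serves a consistent (if slightly stale) committed snapshot, which is what lets analytics proceed while transactions keep writing.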
DOI: 10.48786/edbt.2023.60 · pp. 723–736 · 2023-01-01
Citations: 0
EGG-SynC: Exact GPU-parallelized Grid-based Clustering by Synchronization
Jakob Rødsgaard Jørgensen, I. Assent
Clustering by synchronization (SynC) is a clustering method that is motivated by the natural phenomenon of synchronization and is based on the Kuramoto model. The idea is to iteratively drag similar objects closer to each other until they have synchronized. SynC has been adapted to solve several well-known data mining tasks such as subspace clustering, hierarchical clustering, and streaming clustering, which shows that the SynC model is very versatile. Sadly, SynC has O(T × n² × d) complexity, which makes it impractical for larger datasets. For example, Chen et al. [8] report runtimes of more than 10 hours for just n = 70,000 data points, and improve this to just above one hour by using R-Trees in their method FSynC. Both are still impractical in real-life scenarios. Furthermore, SynC uses a termination criterion that gives no guarantee that the points have synchronized but instead simply stops when most points are close to synchronizing. In this paper, our contributions are manifold. We propose a new termination criterion that guarantees that all points have synchronized. To achieve a much-needed reduction in runtime, we propose a strategy to summarize partitions of the data into a grid structure, a GPU-friendly grid structure to support this and neighborhood queries, and a GPU-parallelized algorithm for clustering by synchronization (EGG-SynC) that utilizes these ideas. Furthermore, we provide an extensive evaluation against the state of the art, showing two to three orders of magnitude speedup compared to SynC and FSynC.
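The underlying SynC dynamics, every point repeatedly moving toward the mean of its ε-neighbourhood until neighbourhoods collapse, can be sketched in one dimension. This toy shows the O(T × n² × d) all-pairs cost the paper attacks; it is not EGG-SynC's grid or GPU machinery, and the step size, tolerance, and grouping threshold are invented:

```python
import numpy as np

def sync_cluster(points, eps=1.0, step=0.5, max_iter=100, tol=1e-6):
    """Kuramoto-style synchronization clustering, 1-d for brevity.
    Each round costs O(n^2): every point moves toward the mean of
    its eps-neighbourhood (which always includes itself)."""
    x = np.asarray(points, dtype=float)
    for _ in range(max_iter):
        dist = np.abs(x[:, None] - x[None, :])
        mask = dist <= eps
        target = (mask * x[None, :]).sum(axis=1) / mask.sum(axis=1)
        moved = np.abs(target - x).max()
        x = x + step * (target - x)
        if moved < tol:          # all points have synchronized
            break
    # synchronized points coincide; group them into cluster labels
    labels, centers = [], []
    for v in x:
        for i, c in enumerate(centers):
            if abs(v - c) <= eps / 2:
                labels.append(i)
                break
        else:
            centers.append(v)
            labels.append(len(centers) - 1)
    return labels
```

The `moved < tol` check is in the spirit of a guaranteed termination criterion: it fires only when no point moves any more, not merely when most points are close to synchronizing.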
DOI: 10.48786/edbt.2023.16 · pp. 195–207 · 2023-01-01
Citations: 1
Desis: Efficient Window Aggregation in Decentralized Networks
W. Yue, Lawrence Benson, T. Rabl
Stream processing is widely applied in industry as well as in research to process unbounded data streams. In many use cases, specific data streams are processed by multiple continuous queries. Current systems group events of an unbounded data stream into bounded windows to produce results of individual queries in a timely fashion. For multiple concurrent queries, multiple concurrent and usually overlapping windows are generated. To reduce redundant computations and share partial results, state-of-the-art solutions divide windows into slices and then share the results of those slices. However, this is only applicable for queries with the same aggregation function and window measure, as in the case of overlaps for sliding windows. For multiple queries on the same stream with different aggregation functions and window measures, partial results cannot be shared. Furthermore, data streams are produced from devices that are distributed in large decentralized networks. Current systems cannot process queries on decentralized data streams efficiently. All queries in a decentralized network are either computed centrally or processed individually without exploiting partial results across queries. We present Desis, a stream processing system that can efficiently process multiple stream aggregation queries. We propose an aggregation engine that can share partial results between multiple queries with different window types, measures, and aggregation functions. In decentralized networks, Desis moves computation to data sources and shares overlapping computation as early as possible between queries. Desis outperforms existing solutions by orders of magnitude in throughput when processing multiple queries and can scale to millions of queries. In a decentralized setup, Desis can save up to 99% of network traffic and scale performance linearly.
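The slice-sharing baseline that the abstract contrasts with can be sketched for a single SUM aggregate. This toy (invented names, one aggregation function only) shows why queries whose windows are multiples of the slice size can reuse the same partial results, which is exactly the setting the baseline is limited to:

```python
class SlicedSum:
    """Cut the stream into non-overlapping slices, keep one partial
    sum per sealed slice, and answer any window that is a whole
    number of slices by combining partials."""
    def __init__(self, slice_size):
        self.slice_size = slice_size
        self.slices = []            # partial sums, one per sealed slice
        self.current, self.count = 0, 0

    def insert(self, value):
        self.current += value
        self.count += 1
        if self.count == self.slice_size:   # seal the slice
            self.slices.append(self.current)
            self.current, self.count = 0, 0

    def window_sum(self, n_slices):
        """Aggregate over the last n_slices slices; every query whose
        window is a multiple of the slice size shares these partials."""
        return sum(self.slices[-n_slices:])
```

Two concurrent queries with windows of 4 and 6 events (2 and 3 slices here) read the same `slices` list; sharing across different aggregation functions or window measures, which this sketch cannot do, is the gap Desis addresses.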
DOI: 10.48786/edbt.2023.52 · pp. 618–631 · 2023-01-01
Citations: 0
Demonstrating Interactive SPARQL Formulation through Positive and Negative Examples and Feedback
Akritas Akritidis, Yannis Tzitzikas
The formulation of structured queries over Knowledge Graphs is a challenging task, since it presupposes familiarity with the syntax of the query language and the contents of the knowledge graph. To alleviate this problem, enabling plain users to formulate SPARQL queries and advanced users to formulate queries with less effort, in this paper we introduce a novel method for "SPARQL by Example". According to this method, the user points to positive/negative entities, the system computes one query that describes these entities, and the user then refines the query interactively by providing positive/negative feedback on entities and suggested constraints. We demonstrate SPARQL-QBE, a tool that implements this approach, and briefly report the results of a task-based user evaluation that provided positive evidence about the usability of the approach.
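One plausible way to compute "one query that describes these entities" from positive and negative examples is to keep the property-value pairs shared by all positives and absent from every negative, then emit them as a SPARQL basic graph pattern. This is a hypothetical sketch of that idea, not the SPARQL-QBE internals; the in-memory knowledge-graph representation and all names are invented:

```python
def query_from_examples(kg, positives, negatives):
    """kg: dict mapping entity -> set of (property, value) pairs.
    Returns a SPARQL query whose pattern is the property-value pairs
    that all positives share and no negative has."""
    common = set.intersection(*(kg[e] for e in positives))
    discriminating = {pv for pv in common
                      if not any(pv in kg[e] for e in negatives)}
    triples = "\n  ".join(f"?x <{p}> <{v}> ."
                          for p, v in sorted(discriminating))
    return "SELECT ?x WHERE {\n  " + triples + "\n}"
```

In the interactive loop of the demo, each new positive or negative example would shrink or grow the discriminating set, refining the emitted query.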
DOI: 10.48786/edbt.2023.71 · pp. 811–814 · 2023-01-01
Citations: 0
Learning over Sets for Databases
Angjela Davitkova, Damjan Gjurovski, S. Michel
In this work, we consider using deep learning models over a collection of sets to replace traditional approaches utilized in database systems. We propose solutions for data indexing, membership queries, and cardinality estimation. Unlike relational data, learned models over sets need to be permutation invariant and able to deal with variable set sizes. The proposed models are based on the DeepSets architecture and include per-element compression to achieve acceptable accuracy with modest model sizes. We further suggest a hybrid structure with bounded error guarantees using guided learning to mitigate the inherent challenges when working with set data. We outline challenges and opportunities when dealing with set data and demonstrate the suitability of the models through extensive experimental evaluation with one synthetic and two real-world datasets.
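The permutation invariance and variable set sizes the abstract requires are exactly what the DeepSets form f(S) = ρ(Σᵢ φ(xᵢ)) provides: summation ignores both element order and set length. A tiny untrained sketch with random toy weights (all names invented, no claim about the paper's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
W_phi = rng.normal(size=(1, 8))   # toy per-element encoder weights
W_rho = rng.normal(size=(8, 1))   # toy decoder weights

def phi(elements):
    """Embed each set element independently (n_elements x 8)."""
    x = np.atleast_2d(np.asarray(elements, dtype=float)).T
    return np.tanh(x @ W_phi)

def deepset(elements):
    """rho(sum(phi(x))): the sum pools element embeddings, so the
    output is independent of element order and handles any size."""
    pooled = phi(elements).sum(axis=0)
    return (pooled @ W_rho).item()
```

A trained version would learn φ and ρ end to end; the invariance, which matters for indexing and membership over sets, holds by construction regardless of the weights.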
DOI: 10.48786/edbt.2024.07 · pp. 68–80 · 2023-01-01
Citations: 0
Patched Multi-Key Partitioning for Robust Query Performance
Steffen Kläbe, K. Sattler
Data partitioning is the key to parallel query processing in modern analytical database systems. Choosing the right partitioning key for a given dataset is a difficult task and crucial for query performance. Real-world data warehouses contain a large number of tables connected in complex schemas, resulting in an overwhelming number of partition-key candidates. In this paper, we present the approach of patched multi-key partitioning, allowing multiple partition keys to be defined simultaneously without data replication. The key idea is to map the relational table partitioning problem to a graph partitioning problem in order to use existing graph partitioning algorithms to find connectivity components in the data, and to maintain exceptions (patches) to the partitioning separately. We show that patched multi-key partitioning offers opportunities for achieving robust query performance, i.e., reaching reasonably good performance for many queries instead of optimal performance for only a few.
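The graph-mapping idea can be illustrated with a toy union-find over key values: the rows used to build the partitioning connect their key values into components, each component becomes a partition, and a later row whose keys span two components (or carry unseen values) is diverted to a patch store. This is a sketch of the concept, not the paper's algorithm; all names are invented:

```python
class UnionFind:
    """Minimal union-find with path halving."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def build_components(rows, key_cols):
    """Connect each row's key values, so co-occurring values end up
    in the same connectivity component (= partition)."""
    uf = UnionFind()
    for row in rows:
        nodes = [(c, row[c]) for c in key_cols]
        for other in nodes[1:]:
            uf.union(nodes[0], other)
    return uf


def assign(uf, row, key_cols):
    """Return the row's partition id, or None if it must go to the
    patch store (keys unseen or spanning two components)."""
    comps = set()
    for c in key_cols:
        node = (c, row[c])
        if node not in uf.parent:
            return None
        comps.add(uf.find(node))
    return comps.pop() if len(comps) == 1 else None
```

Because patches are stored separately, every partition key stays usable for pruning on the conforming rows, which is the source of the robustness claim.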
{"title":"Patched Multi-Key Partitioning for Robust Query Performance","authors":"Steffen Kläbe, K. Sattler","doi":"10.48786/edbt.2023.26","DOIUrl":"https://doi.org/10.48786/edbt.2023.26","url":null,"abstract":"Data partitioning is the key for parallel query processing in modern analytical database systems. Choosing the right partitioning key for a given dataset is a difficult task and crucial for query performance. Real world data warehouses contain a large amount of tables connected in complex schemes resulting in an over-whelming amount of partition key candidates. In this paper, we present the approach of patched multi-key partitioning, allowing to define multiple partition keys simultaneously without data replication. The key idea is to map the relational table partitioning problem to a graph partition problem in order to use existing graph partitioning algorithms to find connectivity components in the data and maintain exceptions (patches) to the partitioning separately. We show that patched multi-key partitioning offer opportunities for achieving robust query performance, i.e. reaching reasonably good performance for many queries instead of optimal performance for only a few queries.","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"9 1","pages":"324-336"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74353789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
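The key idea in the abstract above can be sketched in miniature: treat join-key values as graph nodes, connect values that co-occur in a row, take connected components as partitions, and set aside rows that would merge two components as "patches" stored separately. The greedy rule below is purely illustrative (the paper uses existing graph partitioning algorithms, not this heuristic), and the sample rows are hypothetical:

```python
# Illustrative sketch: union-find builds connectivity components over
# (customer_key, region_key) co-occurrences; a row whose two keys already
# belong to different components becomes a patch instead of merging them.

class UnionFind:
    def __init__(self):
        self.parent = {}
    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

rows = [  # (customer_key, region_key) pairs, purely illustrative
    (1, "EU"), (2, "EU"), (3, "US"), (4, "US"), (1, "US"),
]

uf = UnionFind()
patches = []
for cust, region in rows:
    a, b = ("c", cust), ("r", region)
    seen = a in uf.parent and b in uf.parent
    if seen and uf.find(a) != uf.find(b):
        patches.append((cust, region))   # row spans two components: patch it
    else:
        uf.union(a, b)

# Two partitions (EU, US) survive; the single cross-component row is a patch.
assert patches == [(1, "US")]
```

Keeping the patch rows out of the main partitioning is what lets multiple partition keys coexist without replicating the bulk of the data.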