Title: A Formal Design Framework for Practical Property Graph Schema Languages
Authors: Nimo Beeren, G. Fletcher
DOI: https://doi.org/10.48786/edbt.2023.40
Venue: Advances in Database Technology: Proceedings of the International Conference on Extending Database Technology (EDBT), 2023, pp. 478–484

Abstract: Graph databases are increasingly receiving attention from industry and academia, due in part to their flexibility: a schema is often not required. However, schemas can significantly benefit query optimization, data integrity, and documentation. No formal framework currently captures the design space of state-of-the-art schema solutions. We present a formal design framework for property graph schema languages based on first-order logic rules, which balances expressivity and practicality. We show how this framework can be adapted to integrate a core set of constraints common in conceptual data modeling methods. To demonstrate practical feasibility, the model is implemented using graph queries for modern graph database systems, which we evaluate through a controlled experiment. We find that validation time scales linearly with the size of the data, even with straightforward, unoptimized implementations.
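To make the idea of rule-based schema validation concrete, here is a minimal sketch in Python. The data model, rule shape, and violation messages are illustrative assumptions; the paper itself implements validation as graph queries inside graph database systems, not in application code.

```python
# Minimal sketch of rule-based property graph validation (hypothetical data
# model; the paper's own implementation runs as graph queries in the DBMS).

from dataclasses import dataclass

@dataclass
class Node:
    labels: set
    properties: dict

# A rule in the spirit of a first-order logic constraint:
# "every node labeled `label` must have property `prop` of type `prop_type`".
@dataclass
class PropertyRule:
    label: str
    prop: str
    prop_type: type

def validate(nodes, rules):
    """Return violation messages; an empty list means the graph conforms."""
    violations = []
    for i, node in enumerate(nodes):
        for rule in rules:
            if rule.label in node.labels:
                value = node.properties.get(rule.prop)
                if not isinstance(value, rule.prop_type):
                    violations.append(
                        f"node {i}: label {rule.label} requires "
                        f"{rule.prop}: {rule.prop_type.__name__}")
    return violations

nodes = [
    Node({"Person"}, {"name": "Ada", "age": 36}),
    Node({"Person"}, {"name": "Bob"}),          # missing 'age'
]
rules = [PropertyRule("Person", "name", str),
         PropertyRule("Person", "age", int)]

print(validate(nodes, rules))  # one violation, for node 1
```

Note that the loop touches each node a constant number of times per rule, which is consistent with the linear scaling in data size that the abstract reports.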
Title: Data Narration for the People: Challenges and Opportunities
Authors: S. Amer-Yahia, Patrick Marcel, Verónika Peralta
DOI: https://doi.org/10.48786/edbt.2023.82
Venue: Advances in Database Technology: Proceedings of the International Conference on Extending Database Technology (EDBT), 2023, pp. 855–858

Abstract: Data narration is the process of telling stories with insights extracted from data. It is an instance of data science [4] in which the pipeline focuses on collecting and exploring data, answering questions, structuring answers, and finally presenting them to stakeholders [16, 17]. This tutorial reviews the challenges and opportunities of fully and semi-automating these steps. In doing so, it draws on the extensive literature in data narration, data exploration, and data visualization. In particular, we point out key theoretical and practical contributions in each domain, such as next-step recommendation and policy learning for data exploration, insight interestingness and evaluation frameworks, and the crafting of data stories for the people who will exploit them. We also identify topics still worth investigating, such as including different stakeholders' profiles when designing data pipelines, with the goal of providing data narration for all.
Title: RDF-Analytics: Interactive Analytics over RDF Knowledge Graphs
Authors: Maria-Evangelia Papadaki, Yannis Tzitzikas
DOI: https://doi.org/10.48786/edbt.2023.70
Venue: Advances in Database Technology: Proceedings of the International Conference on Extending Database Technology (EDBT), 2023, pp. 807–810

Abstract: Formulating structured queries over knowledge graphs is a challenging task that presupposes familiarity with the syntax of the query language and the contents of the knowledge graph. To alleviate this difficulty, in this paper we introduce RDF-ANALYTICS, a novel system that enables plain users to formulate analytic queries over complex (i.e., not necessarily star-schema-based) RDF knowledge graphs. To provide an intuitive interface, we leverage users' familiarity with Faceted Search (FS) systems: we extend FS with actions that enable users to formulate analytic queries as well. Distinctive characteristics of the approach are the ability to include arbitrarily long paths in the analytic query (accompanied by count information), the interactive formulation of HAVING restrictions, the support of both faceted search (i.e., locating the desired resources in a faceted-search manner) and analytic queries, and the ability to formulate nested analytic queries. Finally, we present the results of a preliminary task-based evaluation with users, which are very promising.
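The abstract mentions analytic queries with HAVING restrictions over RDF. For illustration, the sketch below shows the kind of SPARQL 1.1 aggregation such an interface could generate, together with an equivalent evaluation over toy in-memory triples. Both the query text and the data are assumptions for intuition, not output of the actual system.

```python
# Illustrative group-by/HAVING analytic query over toy RDF-style triples.
from collections import defaultdict

# Toy triples: (subject, predicate, object).
triples = [
    ("m1", "locatedIn", "Athens"), ("m2", "locatedIn", "Athens"),
    ("m3", "locatedIn", "Rome"),
]

# SPARQL 1.1 that a faceted analytic interface could generate (hypothetical):
sparql = """
SELECT ?city (COUNT(?museum) AS ?n)
WHERE { ?museum :locatedIn ?city }
GROUP BY ?city
HAVING (COUNT(?museum) > 1)
"""

# Evaluate the same aggregation directly over the toy data.
counts = defaultdict(int)
for s, p, o in triples:
    if p == "locatedIn":
        counts[o] += 1
result = {city: n for city, n in counts.items() if n > 1}  # the HAVING step
print(result)  # {'Athens': 2}
```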
Title: Data Provenance for SHACL
Authors: Thomas Delva, Maxim Jakubowski
DOI: https://doi.org/10.48786/edbt.2023.23
Venue: Advances in Database Technology: Proceedings of the International Conference on Extending Database Technology (EDBT), 2023, pp. 285–297

Abstract: In constraint languages for RDF graphs, such as ShEx and SHACL, constraints on nodes and their properties are known as "shapes". Using SHACL, we propose in this paper the notion of the neighborhood of a node v satisfying a given shape in a graph G. This neighborhood is a subgraph of G and provides the data provenance of v for the given shape. We establish a correctness property for the obtained provenance mechanism by proving that neighborhoods adhere to the Sufficiency requirement articulated for provenance semantics for database queries. As an additional benefit, neighborhoods allow a novel use of shapes: the extraction of a subgraph from an RDF graph, the so-called shape fragment. We compare shape fragments with SPARQL queries. We discuss implementation strategies for computing neighborhoods and present initial experiments demonstrating that our ideas are feasible.
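A toy sketch of the neighborhood idea: the subgraph of G that witnesses why a node satisfies a shape. The shape model used here (a required set of outgoing properties) is a drastic simplification of SHACL, chosen only to convey the intuition.

```python
# Toy "neighborhood" computation: which triples of the graph witness that a
# node satisfies a shape? (Shape = required set of outgoing properties; this
# is a simplification of SHACL for illustration only.)

def neighborhood(graph, v, required_props):
    """Return the triples witnessing that v satisfies the shape,
    or None if v does not satisfy it (a required property is missing)."""
    witness = [(s, p, o) for (s, p, o) in graph
               if s == v and p in required_props]
    found = {p for (_, p, _) in witness}
    if not required_props <= found:
        return None
    return witness

G = [
    ("alice", "name", "Alice"),
    ("alice", "knows", "bob"),
    ("bob", "name", "Bob"),
]

# Shape: a node must have both a name and a knows edge.
print(neighborhood(G, "alice", {"name", "knows"}))
# -> the two alice triples; bob's triple lies outside the neighborhood
print(neighborhood(G, "bob", {"name", "knows"}))  # None: bob has no 'knows'
```

The returned triples form the provenance subgraph; collecting them over all nodes satisfying the shape is, in spirit, what the paper calls a shape fragment.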
Title: Stitcher: Learned Workload Synthesis from Historical Performance Footprints
Authors: Chengcheng Wan, Yiwen Zhu, Joyce Cahoon, Wenjing Wang, K. Lin, Sean Liu, Raymond Truong, Neetu Singh, Alexandra Ciortea, Konstantinos Karanasos, Subru Krishnan
DOI: https://doi.org/10.48786/edbt.2023.33
Venue: Advances in Database Technology: Proceedings of the International Conference on Extending Database Technology (EDBT), 2023, pp. 417–423

Abstract: Database benchmarking and workload replay have been widely used to drive system design, evaluate workload performance, determine product evolution, and guide cloud migration. However, both suffer from key limitations: the former fails to capture the variety and complexity of production workloads; the latter requires access to user data, queries, and machine specifications, rendering it inapplicable in the face of user privacy concerns. Here we introduce our vision of learned workload synthesis to overcome these issues: given the performance profile of a customer workload (e.g., CPU/memory counters), synthesize a new workload that yields the same performance profile when executed on a range of hardware/software configurations. We present Stitcher as a first step towards realizing this vision, which synthesizes workloads by combining pieces from standard benchmarks. We believe that our vision will spark new research avenues in database workload replay.
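The core idea of combining benchmark pieces to match a target performance profile can be sketched as a small search problem. The greedy matcher, the two-dimensional (CPU, memory) profiles, and the benchmark names below are illustrative assumptions, not Stitcher's actual algorithm or data.

```python
# Sketch of learned workload synthesis: greedily pick benchmark pieces whose
# averaged performance profile approximates a target profile.
# (Hypothetical piece names and profiles; not Stitcher's real algorithm.)

import math

# Performance profiles as (cpu, memory) counter averages, normalized to [0, 1].
benchmark_pieces = {
    "tpcc_neworder": (0.8, 0.3),
    "tpch_q1":       (0.6, 0.7),
    "ycsb_readonly": (0.2, 0.1),
}

def synthesize(target, pieces, k=2):
    """Greedily choose k pieces (repeats allowed) whose mean profile
    is closest to the target profile."""
    chosen = []
    for _ in range(k):
        def mix_distance(name):
            profs = [pieces[n] for n in chosen] + [pieces[name]]
            avg = tuple(sum(dim) / len(profs) for dim in zip(*profs))
            return math.dist(avg, target)
        chosen.append(min(pieces, key=mix_distance))
    profs = [pieces[n] for n in chosen]
    return chosen, tuple(sum(dim) / len(profs) for dim in zip(*profs))

names, profile = synthesize((0.7, 0.5), benchmark_pieces)
print(names, profile)  # the mix of two pieces lands on the target profile
```

A real system would of course match richer profiles (full counter time series) and validate the synthesized mix by executing it across hardware configurations.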
Title: Understanding crowd energy consumption behaviors
Authors: X. Liu, Xu Cheng, Yanyan Yang, Huan Huo, Yongping Liu, P. S. Nielsen
DOI: https://doi.org/10.48786/edbt.2023.68
Venue: Advances in Database Technology: Proceedings of the International Conference on Extending Database Technology (EDBT), 2023, pp. 799–802

Abstract: Understanding crowd behavior is crucial for energy demand-side management. In this paper, we employ the fluid-dynamics concept of potential flow to model the energy demand shift patterns of a crowd in both the temporal and spatial dimensions. To facilitate the use of the proposed method, we implement a visual analysis platform that allows users to interactively explore and interpret the shift patterns. The effectiveness of the proposed method will be evaluated through hands-on experience with a real case study during the conference demonstration.
Title: Pushing Edge Computing one Step Further: Resilient and Privacy-Preserving Processing on Personal Devices
Authors: Ludovic Javet, N. Anciaux, Luc Bouganim, Léo Lamoureux, P. Pucheral
DOI: https://doi.org/10.48786/edbt.2023.77
Venue: Advances in Database Technology: Proceedings of the International Conference on Extending Database Technology (EDBT), 2023, pp. 835–838

Abstract: Can we push edge computing one step further? This demonstration paper proposes an answer to this question by leveraging the generalization of Trusted Execution Environments at the very edge of the network to enable resilient and privacy-preserving computation on personal devices. Based on preliminary published results, we show that this can drastically change the way distributed processing over personal data is conceived and achieved. The platform presented here demonstrates the pertinence of the approach through execution scenarios integrating heterogeneous secure personal devices.
Title: REQUIRED: A Tool to Relax Queries through Relaxed Functional Dependencies
Authors: Loredana Caruccio, Stefano Cirillo, V. Deufemia, G. Polese, R. Stanzione
DOI: https://doi.org/10.48786/edbt.2023.74
Venue: Advances in Database Technology: Proceedings of the International Conference on Extending Database Technology (EDBT), 2023, pp. 823–826

Abstract: Query relaxation aims to relax query constraints in order to derive approximate results when the answer set is small. In this demo paper, we present REQUIRED, an automated, portable, and scalable query relaxation tool that leverages metadata learned from an input dataset. The intuition is to use relationships underlying attribute values to derive a new query whose approximate results still meet the user's expectations. In particular, REQUIRED exploits relaxed functional dependencies to modify the original query in two ways: (i) relaxing some query conditions by replacing equality constraints with ranges and/or collections of admissible values, and (ii) rewriting the original query by replacing some or all of the attributes involved in its conditions with attributes related to them. Our demonstration scenarios show that REQUIRED effectively relaxes queries according to the chosen strategy.
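Strategy (i) above, widening an equality constraint into a range, can be sketched in a few lines. The widening rule used here (plus or minus one sample standard deviation of the attribute) is an illustrative assumption; the tool itself derives admissible ranges from relaxed functional dependencies, not from a fixed statistical rule.

```python
# Sketch of equality-to-range query relaxation: when an exact-match query
# returns too few rows, widen the equality constraint into a data-driven
# range. (The +/- one-stdev rule is illustrative, not REQUIRED's strategy.)

import statistics

rows = [
    {"model": "A", "price": 100}, {"model": "B", "price": 110},
    {"model": "C", "price": 180}, {"model": "D", "price": 120},
]

def query(rows, pred):
    return [r for r in rows if pred(r)]

def relax_equality(rows, attr, value):
    """Replace `attr == value` with a range around `value`."""
    spread = statistics.stdev(r[attr] for r in rows)
    lo, hi = value - spread, value + spread
    return lambda r: lo <= r[attr] <= hi

strict = query(rows, lambda r: r["price"] == 105)
print(len(strict))  # 0: the exact query has an empty answer set

relaxed = query(rows, relax_equality(rows, "price", 105))
print([r["model"] for r in relaxed])  # ['A', 'B', 'D']
```

The relaxed query keeps rows close to the requested price while still excluding the clear outlier, which matches the abstract's goal of approximate results that still meet the user's expectations.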
Title: Efficient Multi-Model Management
Authors: Nils Strassenburg, Dominic Kupfer, J. Kowal, T. Rabl
DOI: https://doi.org/10.48786/edbt.2023.37
Venue: Advances in Database Technology: Proceedings of the International Conference on Extending Database Technology (EDBT), 2023, pp. 457–463

Abstract: Deep learning models are deployed in an increasing number of industrial domains, such as retail and automotive applications. An instance of a model typically performs one specific task, which is why larger software systems use multiple models in parallel. Given that all models in production software must be managed, this leads to the problem of managing sets of related models, i.e., multi-model management. Existing approaches perform poorly on this task because they are optimized for saving single large models, not for simultaneously saving a set of related models. In this paper, we explore the space of multi-model management by presenting three optimized approaches: (1) a baseline approach that saves full model representations and minimizes the amount of saved metadata; (2) an update approach that reduces storage consumption compared to the baseline by saving parameter updates instead of full models; and (3) a provenance approach that saves model provenance data instead of model parameters. We evaluate the approaches for the multi-model management use cases of managing car battery cell models and image classification models. Our results show that the baseline outperforms existing approaches on save and recover times by more than an order of magnitude, and that the more sophisticated approaches reduce storage consumption by up to 99%.
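The "update approach" (2) can be sketched as delta storage: keep one full base model and store later versions as the set of parameters that changed. The flat dict-of-floats model format below is an illustrative simplification of real tensor checkpoints.

```python
# Sketch of delta-based model storage: save a full base model once, then
# store related versions as parameter updates against it.
# (Flat dict-of-floats "models" are a simplification for illustration.)

def delta(base, new):
    """Parameters that changed between two model versions."""
    return {k: v for k, v in new.items() if base.get(k) != v}

def recover(base, d):
    """Rebuild the full model from the base and a saved delta."""
    model = dict(base)
    model.update(d)
    return model

base = {"w0": 0.10, "w1": -0.30, "w2": 0.70}
finetuned = {"w0": 0.10, "w1": -0.25, "w2": 0.70}  # only w1 changed

d = delta(base, finetuned)
print(d)  # {'w1': -0.25}: 1 of 3 parameters stored instead of all 3
assert recover(base, d) == finetuned
```

When related models share most parameters, as in the fine-tuning setting, the delta is a small fraction of the full model, which is the intuition behind the large storage savings the abstract reports.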