As users migrate their analytical workloads to cloud databases, it is becoming just as important to reduce monetary costs as it is to optimize query runtime. In the cloud, a query is billed based on either its compute time or the amount of data it processes. We observe that analytical queries are either compute- or IO-bound, and each query type executes more cheaply under a different pricing model. We exploit this opportunity and propose methods to build cheaper execution plans across pricing models that complete within user-defined runtime constraints. We implement these methods and produce execution plans spanning multiple pricing models that reduce the monetary cost of workloads by as much as 56%, and of individual queries by as much as 90%. The prices chosen by cloud vendors for their services also affect savings opportunities. To study this effect, we simulate our proposed methods under different cloud prices and observe that multi-cloud savings are robust to changes in cloud vendor prices. These results indicate a substantial opportunity to save money by executing workloads across multiple pricing models.
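The billing trade-off at the heart of the abstract can be sketched as a toy cost model: a query is priced either by compute time or by data scanned, and which model is cheaper depends on whether the query is compute- or IO-bound. The rates, query profiles, and deadline below are illustrative assumptions, not the paper's actual numbers or method.

```python
# Hypothetical prices: one vendor bills per second of compute,
# another bills per GB of data scanned.
COMPUTE_PRICE_PER_SEC = 0.0003
SCAN_PRICE_PER_GB = 0.005

def cheapest_model(runtime_sec, scanned_gb, deadline_sec):
    """Pick the cheaper pricing model, subject to the runtime constraint."""
    if runtime_sec > deadline_sec:
        raise ValueError("no plan meets the deadline")
    options = {
        "compute-billed": runtime_sec * COMPUTE_PRICE_PER_SEC,
        "scan-billed": scanned_gb * SCAN_PRICE_PER_GB,
    }
    return min(options, key=options.get)

# A long-running query over little data is cheaper when billed by bytes
# scanned; a fast scan-heavy query is cheaper when billed by compute time.
io_light = cheapest_model(runtime_sec=600, scanned_gb=2, deadline_sec=900)
scan_heavy = cheapest_model(runtime_sec=30, scanned_gb=500, deadline_sec=900)
```

The same comparison, applied per plan fragment rather than per whole query, is the kind of decision a cross-pricing-model optimizer must make.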
"Saving Money for Analytical Workloads in the Cloud," Tapan Srivastava and Raul Castro Fernandez. arXiv:2408.00253, 2024-08-01.
Database queries traditionally operate under the closed-world assumption, providing no answers to questions that require information beyond the data stored in the database. Hybrid querying using SQL offers an alternative by integrating relational databases with large language models (LLMs) to answer beyond-database questions. In this paper, we present SWAN, the first cross-domain benchmark, containing 120 beyond-database questions over four real-world databases. To leverage state-of-the-art language models in addressing these complex questions in SWAN, we present HQDL, a preliminary solution for hybrid querying, and discuss potential future directions. Our evaluation demonstrates that HQDL, using GPT-4 Turbo with few-shot prompts, achieves 40.0% execution accuracy and 48.2% data factuality. These results highlight both the potential and the challenges of hybrid querying. We believe that our work will inspire further research in creating more efficient and accurate data systems that seamlessly integrate relational databases and large language models to address beyond-database questions.
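The hybrid-querying idea can be sketched minimally: answer from the relational data when the question is covered by the schema, and fall back to a language model for beyond-database facts. The `ask_llm` stub, the keyword router, and the sample table are illustrative assumptions, not HQDL's actual architecture.

```python
import sqlite3

# A tiny relational database with one covered fact.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities(name TEXT, population INTEGER)")
conn.execute("INSERT INTO cities VALUES ('Springfield', 169000)")

def ask_llm(question):
    # Placeholder for a real LLM call (e.g. few-shot prompted GPT-4 Turbo).
    return f"<LLM answer to: {question}>"

def hybrid_answer(question):
    # Toy router: the database only covers population questions.
    if "population" in question:
        row = conn.execute(
            "SELECT population FROM cities WHERE name = 'Springfield'"
        ).fetchone()
        return row[0]
    return ask_llm(question)  # beyond-database question

in_db = hybrid_answer("What is the population of Springfield?")
beyond = hybrid_answer("Which novel made Springfield famous?")
```

A real system would route per SQL expression rather than per question, which is what makes measuring both execution accuracy and data factuality necessary.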
"Hybrid Querying Over Relational Databases and Large Language Models," Fuheng Zhao, Divyakant Agrawal, and Amr El Abbadi. arXiv:2408.00884, 2024-08-01.
Temporal knowledge graphs (TKGs) are valuable resources for capturing evolving relationships among entities, yet they are often plagued by noise, necessitating robust anomaly detection mechanisms. Existing dynamic graph anomaly detection approaches struggle to capture the rich semantics introduced by node and edge categories within TKGs, while TKG embedding methods lack interpretability, undermining the credibility of anomaly detection. Moreover, these methods falter in adapting to pattern changes and semantic drifts resulting from knowledge updates. To tackle these challenges, we introduce AnoT, an efficient TKG summarization method tailored for interpretable online anomaly detection in TKGs. AnoT begins by summarizing a TKG into a novel rule graph, enabling flexible inference of complex patterns in TKGs. When new knowledge emerges, AnoT maps it onto a node in the rule graph and traverses the rule graph recursively to derive the anomaly score of the knowledge. The traversal yields reachable nodes that furnish interpretable evidence for the validity or anomalousness of the new knowledge. Overall, AnoT embodies a detector-updater-monitor architecture, encompassing a detector for offline TKG summarization and online scoring, an updater for real-time rule graph updates based on emerging knowledge, and a monitor for estimating the approximation error of the rule graph. Experimental results on four real-world datasets demonstrate that AnoT surpasses existing methods significantly in terms of accuracy and interpretability. All of the raw datasets and the implementation of AnoT are provided in https://github.com/zjs123/ANoT.
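The recursive scoring step can be sketched on a toy rule graph: a new fact is mapped to a rule node, the traversal visits the rules it depends on, and their confidences are combined into an anomaly score, with the reachable rules doubling as interpretable evidence. The rule graph and the combination formula are illustrative stand-ins, not AnoT's actual algorithm.

```python
# rule -> (confidence, rules it depends on)
RULE_GRAPH = {
    "worksFor(person, org)": (0.9, ["locatedIn(org, city)"]),
    "locatedIn(org, city)": (0.8, []),
}

def anomaly_score(rule, graph):
    """Recursively traverse the rule graph; weak support -> high anomaly."""
    conf, deps = graph[rule]
    support = conf
    for dep in deps:
        # A dependency is supporting evidence in proportion to its own validity.
        support *= 1 - anomaly_score(dep, graph)
    return round(1 - support, 3)

score = anomaly_score("worksFor(person, org)", RULE_GRAPH)
```

The rules visited during the traversal (here, `locatedIn(org, city)`) are exactly the evidence a user can inspect to understand the score.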
"Online Detection of Anomalies in Temporal Knowledge Graphs with Interpretability," Jiasheng Zhang, Jie Shao, and Rex Ying. arXiv:2408.00872, 2024-08-01.
Diego Arroyuelo, Fabrizio Barisione, Antonio Fariña, Adrián Gómez-Brandón, Gonzalo Navarro
A recent surprising result in the implementation of worst-case-optimal (wco) multijoins in graph databases (specifically, basic graph patterns) is that they can be supported on graph representations that take even less space than a plain representation, and orders of magnitude less space than classical indices, while offering comparable performance. In this paper we uncover a wide set of new wco space-time tradeoffs: we (1) introduce new compact indices that handle multijoins in wco time, and (2) combine them with new query resolution strategies that offer better times in practice. As a result, we improve the average query times of current compact representations by a factor of up to 13 to produce the first 1000 results, and using twice their space, reduce their total average query time by a factor of 2. Our experiments suggest that there is more room for improvement in terms of generating better query plans for multijoins.
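The wco evaluation style these indices support can be illustrated on the simplest basic graph pattern that benefits from it, the directed triangle (x)-[a]->(y)-[a]->(z)-[a]->(x): bind one variable at a time, intersecting the candidate sets each edge allows. The adjacency-set representation below is a plain illustration, not the paper's compact data structure.

```python
from collections import defaultdict

edges = {(1, 2), (2, 3), (3, 1), (1, 3)}
fwd = defaultdict(set)  # forward adjacency: a -> {b : (a, b) is an edge}
for a, b in edges:
    fwd[a].add(b)

def triangles():
    out = []
    for x in fwd:                  # bind x
        for y in fwd[x]:           # bind y among x's successors
            # bind z by intersecting y's successors with x's predecessors
            preds_of_x = {v for v in fwd if x in fwd[v]}
            for z in fwd[y] & preds_of_x:
                out.append((x, y, z))
    return sorted(out)

result = triangles()
```

The intersection in the innermost loop is what bounds the running time by the worst-case output size; the compact indices in the paper make such intersections possible directly over compressed representations.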
"New Compressed Indices for Multijoins on Graph Databases." arXiv:2408.00558, 2024-08-01.
Xiny Pan, Daniel Hernández, Philipp Seifer, Ralf Lämmel, Steffen Staab
Over the past few years, we have seen the emergence of large knowledge graphs combining information from multiple sources. Sometimes, this information is provided in the form of assertions about other assertions, defining contexts where assertions are valid. RDF-star, a recent extension to RDF that admits statements over statements, is under revision to become a W3C standard. However, there is no proposal for a semantics of these RDF-star statements, nor a built-in facility to operate over them. In this paper, we propose eSPARQL, a query language for epistemic RDF-star metadata based on a four-valued logic. Our proposed query language extends SPARQL-star, the query language for RDF-star, with a new type of FROM clause to facilitate operating with multiple, and sometimes conflicting, beliefs. We show that the proposed query language can express four use-case queries, covering the following features: (i) querying the beliefs of an individual, (ii) aggregating beliefs, (iii) querying who conflicts with whom, and (iv) beliefs about beliefs (i.e., nesting of beliefs).
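The four-valued logic underlying the belief handling can be sketched with the standard Belnap-style encoding of a truth value as a pair (evidence for, evidence against), which yields the values True, False, Both (conflict), and Neither (unknown). This is the textbook construction; the exact semantics eSPARQL adopts may differ in its details.

```python
# value = (evidence for, evidence against)
TRUE, FALSE = (1, 0), (0, 1)
BOTH, NEITHER = (1, 1), (0, 0)

def join_beliefs(a, b):
    """Combine two agents' beliefs by taking the union of their evidence."""
    return (max(a[0], b[0]), max(a[1], b[1]))

def neg(a):
    """Negation swaps evidence for and evidence against."""
    return (a[1], a[0])

# Two agents with conflicting beliefs about the same statement
# combine to the conflict value, rather than collapsing to one side:
combined = join_beliefs(TRUE, FALSE)
```

A FROM clause that selects which agents' beliefs to draw on then amounts to choosing which evidence pairs feed into this combination.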
"eSPARQL: Representing and Reconciling Agnostic and Atheistic Beliefs in RDF-star Knowledge Graphs." arXiv:2407.21483, 2024-07-31.
This paper studies the completeness of conjunctive queries over a partially complete database and the approximation of incomplete queries. Given a query and a set of completeness rules (a special kind of tuple-generating dependencies) that specify which parts of the database are complete, we investigate whether the query can be fully answered, as if all data were available. If not, we explore reformulating the query into either Maximal Complete Specializations (MCSs) or the (unique up to equivalence) Minimal Complete Generalization (MCG) that can be fully answered, that is, the best complete approximations of the query from below or above in the sense of query containment. We show that the MCG can be characterized as the least fixed point of a monotonic operator in a preorder. Then, we show that an MCS can be computed by recursive backward application of completeness rules. We study the complexity of both problems and discuss implementation techniques that rely on an ASP engine and a Prolog engine, respectively.
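The least-fixed-point characterization can be illustrated with plain Kleene iteration: start from the bottom element and apply a monotone operator until it stabilizes. The toy operator below, which closes a set of atoms under forward rule application, is an illustrative stand-in for the paper's operator on queries.

```python
# Toy rules: each atom derives further atoms (a stand-in for completeness rules).
RULES = {"r(x)": {"s(x)"}, "s(x)": {"t(x)"}}

def step(atoms):
    """A monotone operator: add everything the rules derive in one step."""
    derived = set(atoms)
    for atom in atoms:
        derived |= RULES.get(atom, set())
    return derived

def least_fixed_point(seed):
    """Kleene iteration: apply `step` until the set no longer grows."""
    current = set(seed)
    while True:
        nxt = step(current)
        if nxt == current:
            return current
        current = nxt

lfp = least_fixed_point({"r(x)"})
```

Monotonicity is what guarantees the iteration converges to the *least* fixed point containing the seed, mirroring why the MCG is unique up to equivalence.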
"Complete Approximations of Incomplete Queries," Julien Corman, Werner Nutt, and Ognjen Savković. arXiv:2407.20932, 2024-07-30.
Diego Figueira, S. Krishna, Om Swostik Mishra, Anantha Padmanabha
The problem of checking whether a recursive query can be rewritten as a query without recursion is a fundamental reasoning task, known as the boundedness problem. Here we study the boundedness problem for Unions of Conjunctive Regular Path Queries (UCRPQs), a navigational query language extensively used in ontology and graph database querying. The boundedness problem for UCRPQs is ExpSpace-complete. We focus our analysis on UCRPQs using simple regular expressions, which are of high practical relevance and enjoy a lower reasoning complexity. We show that the boundedness problem for this fragment of UCRPQs is $\Pi^p_2$-complete, and that an equivalent bounded query can be produced in polynomial time whenever one exists. When the query turns out to be unbounded, we also study the task of finding an equivalent maximally bounded query, which we show to be feasible in $\Pi^p_2$. As a side result of independent interest stemming from our developments, we study a notion of succinct finite automata and prove that its membership problem is in NP.
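What "bounded" means can be made concrete on a single graph: the recursive path query x -[a*]-> y equals the union of its fixed-length unfoldings x -[a^0 + a^1 + ... + a^k]-> y once k is large enough. Boundedness proper asks whether one finite k works for *all* graphs; the sketch below only watches the unfoldings converge on one chain graph, as an intuition pump.

```python
# A chain of a-labeled edges: 0 -a-> 1 -a-> 2 -a-> 3.
a_edges = {(0, 1), (1, 2), (2, 3)}

def a_star_up_to(k):
    """Answers of the union a^0 + a^1 + ... + a^k on this graph."""
    pairs = {(v, v) for e in a_edges for v in e}  # a^0: empty path
    frontier = set(pairs)
    for _ in range(k):
        # Extend each path by one more a-edge.
        frontier = {(x, z) for (x, y) in frontier
                    for (y2, z) in a_edges if y == y2}
        pairs |= frontier
    return pairs

# Smallest k at which adding longer unfoldings changes nothing:
converged_at = next(k for k in range(10)
                    if a_star_up_to(k) == a_star_up_to(k + 1))
```

On this graph the unfoldings stabilize at k = 3 (the longest path); an *unbounded* query is one where no single k suffices across all graphs, which is why the problem is a semantic reasoning task rather than a per-instance computation.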
"Boundedness for Unions of Conjunctive Regular Path Queries over Simple Regular Expressions." arXiv:2407.20782, 2024-07-30.
Minxiao Chen, Haitao Yuan, Nan Jiang, Zhifeng Bao, Shangguang Wang
Traffic accidents pose a significant risk to human health and property safety, so predicting their risks in order to prevent them has garnered growing interest. We argue that a desirable prediction solution should be resilient to the complexity of traffic accidents. In particular, it should adequately consider the regional background, accurately capture both spatial proximity and semantic similarity, and effectively address the sparsity of traffic accidents. However, these factors are often overlooked or difficult to incorporate. In this paper, we propose a novel multi-granularity hierarchical spatio-temporal network. First, we incorporate remote sensing data, facilitating the creation of a hierarchical multi-granularity structure and the comprehension of the regional background. We construct multiple high-level risk prediction tasks to enhance the model's ability to cope with sparsity. Next, to capture both spatial proximity and semantic similarity, region features and a multi-view graph are encoded to distill effective representations. Additionally, we propose a message-passing and adaptive temporal attention module that bridges different granularities and dynamically captures the time correlations inherent in traffic accident patterns. Finally, a multivariate hierarchical loss function is devised to account for the complexity of the prediction objective. Extensive experiments on two real datasets verify the superiority of our model over state-of-the-art methods.
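The multi-granularity idea behind the hierarchical loss can be sketched as follows: risk is predicted at both a fine region level (sparse signal) and a coarse region level (denser, easier signal), and the two losses are blended so the coarse task regularizes the fine one. The squared-error terms and the blending weight are illustrative; the paper's multivariate hierarchical loss is more involved.

```python
def mse(pred, target):
    """Mean squared error over a list of region-level risks."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def hierarchical_loss(fine_pred, fine_true, coarse_pred, coarse_true, alpha=0.5):
    # The coarse-granularity task suffers less from sparsity and
    # stabilizes training of the sparse fine-granularity task.
    return alpha * mse(fine_pred, fine_true) + (1 - alpha) * mse(coarse_pred, coarse_true)

# Two fine regions aggregate into one coarse region:
loss = hierarchical_loss(
    fine_pred=[0.1, 0.0], fine_true=[0.0, 0.0],
    coarse_pred=[0.2], coarse_true=[0.1],
)
```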
"Urban Traffic Accident Risk Prediction Revisited: Regionality, Proximity, Similarity and Sparsity." arXiv:2407.19668, 2024-07-29.
Lars Vogt, Marcel Konrad, Kheir Eddine Farfar, Manuel Prinz, Allard Oelen
To manage increasing data volumes, machines need data and metadata that are machine-actionable and FAIR (findable, accessible, interoperable, reusable). Knowledge graphs and ontologies are key to this, but their use is hampered by high access barriers resulting from the prior knowledge of semantics and data modelling they require. The Rosetta Statement approach proposes modeling English natural-language statements instead of a mind-independent reality. We propose a metamodel for creating semantic schema patterns for simple statement types. The approach supports versioning of statements and provides a detailed editing history. Each Rosetta Statement pattern has a dynamic label for displaying statements as natural-language sentences. Implemented in the Open Research Knowledge Graph (ORKG) as a use case, this approach allows domain experts to define data schema patterns without needing semantic knowledge. Future plans include combining Rosetta Statements with semantic units to organize the ORKG into meaningful subgraphs, improving usability. A search interface for querying statements without knowledge of SPARQL or Cypher is also planned, along with tools for data entry and display that use Large Language Models and NLP. The Rosetta Statement metamodel supports a two-step knowledge graph construction procedure. In the first step, domain experts model semantic content without support from ontology engineers, lowering entry barriers and increasing cognitive interoperability. The second step involves developing semantic graph patterns for reasoning, which requires collaboration with ontology engineers.
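A statement pattern with a dynamic label can be sketched as a small schema plus a template that renders an instance back into a natural-language sentence. The field names, statement type, and label syntax below are illustrative assumptions, not ORKG's actual schema.

```python
# A hypothetical Rosetta-style pattern for one simple statement type.
PATTERN = {
    "type": "has_measurement",
    "slots": ["subject", "property", "value", "unit"],
    "label": "{subject} has a {property} of {value} {unit}",
}

def render(pattern, instance):
    """Display a statement instance as a natural-language sentence
    using the pattern's dynamic label."""
    return pattern["label"].format(**instance)

sentence = render(PATTERN, {
    "subject": "Lake Constance",
    "property": "surface area",
    "value": 536,
    "unit": "km^2",
})
```

Because the schema lives in the pattern rather than in the statement instance, a domain expert only fills slots, which is what keeps the entry barrier low.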
"Rosetta Statements: Lowering the Barrier for Semantic Parsing and Increasing the Cognitive Interoperability of Knowledge Graphs." arXiv:2407.20007, 2024-07-29.
In data-intensive real-time applications, such as smart transportation and manufacturing, ensuring data freshness is essential, as using obsolete data can lead to negative outcomes. Validity intervals serve as the standard means to specify freshness requirements in real-time databases. In this paper, we bring attention to significant drawbacks of validity intervals that have largely been unnoticed and introduce a new definition of data freshness, while discussing future research directions to address these limitations.
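The standard validity-interval scheme the paper critiques is simple to state: a data item written at time t with validity interval V counts as fresh until t + V. The timestamps and interval length below are illustrative; the limitation targeted by the paper is that a fixed V is blind to how the underlying real-world value actually changes.

```python
def is_fresh(write_time, validity_interval, now):
    """Classic absolute validity: the item is usable until it expires."""
    return now <= write_time + validity_interval

# A sensor reading written at t=100 with a 5-second validity interval
# is fresh at t=103 but considered obsolete at t=106, regardless of
# whether the measured quantity has actually changed.
fresh_at_103 = is_fresh(write_time=100, validity_interval=5, now=103)
stale_at_106 = is_fresh(write_time=100, validity_interval=5, now=106)
```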
"Limitations of Validity Intervals in Data Freshness Management," Kyoung-Don Kang. arXiv:2407.20431, 2024-07-29.