As users migrate their analytical workloads to cloud databases, it is becoming just as important to reduce monetary costs as it is to optimize query runtime. In the cloud, a query is billed based on either its compute time or the amount of data it processes. We observe that analytical queries are either compute- or IO-bound, and each query type executes more cheaply under a different pricing model. We exploit this opportunity and propose methods to build cheaper execution plans across pricing models that complete within user-defined runtime constraints. We implement these methods and produce execution plans spanning multiple pricing models that reduce the monetary cost of workloads by as much as 56%, and of individual queries by as much as 90%. The prices chosen by cloud vendors for their services also affect savings opportunities. To study this effect, we simulate our proposed methods under different cloud prices and observe that multi-cloud savings are robust to changes in cloud vendor prices. These results indicate a substantial opportunity to save money by executing workloads across multiple pricing models.
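The billing trade-off at the heart of the abstract can be sketched as a toy cost model: a query is priced either by compute time or by data scanned, and which model is cheaper depends on whether the query is compute- or IO-bound. The rates, query profiles, and deadline below are illustrative assumptions, not the paper's actual numbers or method.

```python
# Hypothetical prices: one vendor bills per second of compute,
# another bills per GB of data scanned.
COMPUTE_PRICE_PER_SEC = 0.0003
SCAN_PRICE_PER_GB = 0.005

def cheapest_model(runtime_sec, scanned_gb, deadline_sec):
    """Pick the cheaper pricing model, subject to the runtime constraint."""
    if runtime_sec > deadline_sec:
        raise ValueError("no plan meets the deadline")
    options = {
        "compute-billed": runtime_sec * COMPUTE_PRICE_PER_SEC,
        "scan-billed": scanned_gb * SCAN_PRICE_PER_GB,
    }
    return min(options, key=options.get)

# A long-running query over little data is cheaper when billed by bytes
# scanned; a fast scan-heavy query is cheaper when billed by compute time.
io_light = cheapest_model(runtime_sec=600, scanned_gb=2, deadline_sec=900)
scan_heavy = cheapest_model(runtime_sec=30, scanned_gb=500, deadline_sec=900)
```

The same comparison, applied per plan fragment rather than per whole query, is the kind of decision a cross-pricing-model optimizer must make.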
"Saving Money for Analytical Workloads in the Cloud," Tapan Srivastava and Raul Castro Fernandez. arXiv:2408.00253, 2024-08-01.
Database queries traditionally operate under the closed-world assumption, providing no answers to questions that require information beyond the data stored in the database. Hybrid querying using SQL offers an alternative by integrating relational databases with large language models (LLMs) to answer beyond-database questions. In this paper, we present SWAN, the first cross-domain benchmark, containing 120 beyond-database questions over four real-world databases. To leverage state-of-the-art language models in addressing these complex questions in SWAN, we present HQDL, a preliminary solution for hybrid querying, and discuss potential future directions. Our evaluation demonstrates that HQDL, using GPT-4 Turbo with few-shot prompts, achieves 40.0% execution accuracy and 48.2% data factuality. These results highlight both the potential and the challenges of hybrid querying. We believe that our work will inspire further research in creating more efficient and accurate data systems that seamlessly integrate relational databases and large language models to address beyond-database questions.
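The hybrid-querying idea can be sketched minimally: answer from the relational data when the question is covered by the schema, and fall back to a language model for beyond-database facts. The `ask_llm` stub, the keyword router, and the sample table are illustrative assumptions, not HQDL's actual architecture.

```python
import sqlite3

# A tiny relational database with one covered fact.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities(name TEXT, population INTEGER)")
conn.execute("INSERT INTO cities VALUES ('Springfield', 169000)")

def ask_llm(question):
    # Placeholder for a real LLM call (e.g. few-shot prompted GPT-4 Turbo).
    return f"<LLM answer to: {question}>"

def hybrid_answer(question):
    # Toy router: the database only covers population questions.
    if "population" in question:
        row = conn.execute(
            "SELECT population FROM cities WHERE name = 'Springfield'"
        ).fetchone()
        return row[0]
    return ask_llm(question)  # beyond-database question

in_db = hybrid_answer("What is the population of Springfield?")
beyond = hybrid_answer("Which novel made Springfield famous?")
```

A real system would route per SQL expression rather than per question, which is what makes measuring both execution accuracy and data factuality necessary.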
"Hybrid Querying Over Relational Databases and Large Language Models," Fuheng Zhao, Divyakant Agrawal, and Amr El Abbadi. arXiv:2408.00884, 2024-08-01.
Temporal knowledge graphs (TKGs) are valuable resources for capturing evolving relationships among entities, yet they are often plagued by noise, necessitating robust anomaly detection mechanisms. Existing dynamic graph anomaly detection approaches struggle to capture the rich semantics introduced by node and edge categories within TKGs, while TKG embedding methods lack interpretability, undermining the credibility of anomaly detection. Moreover, these methods falter in adapting to pattern changes and semantic drifts resulting from knowledge updates. To tackle these challenges, we introduce AnoT, an efficient TKG summarization method tailored for interpretable online anomaly detection in TKGs. AnoT begins by summarizing a TKG into a novel rule graph, enabling flexible inference of complex patterns in TKGs. When new knowledge emerges, AnoT maps it onto a node in the rule graph and traverses the rule graph recursively to derive the anomaly score of the knowledge. The traversal yields reachable nodes that furnish interpretable evidence for the validity or anomalousness of the new knowledge. Overall, AnoT embodies a detector-updater-monitor architecture, encompassing a detector for offline TKG summarization and online scoring, an updater for real-time rule graph updates based on emerging knowledge, and a monitor for estimating the approximation error of the rule graph. Experimental results on four real-world datasets demonstrate that AnoT surpasses existing methods significantly in terms of accuracy and interpretability. All of the raw datasets and the implementation of AnoT are provided in https://github.com/zjs123/ANoT.
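The recursive scoring step can be sketched on a toy rule graph: a new fact is mapped to a rule node, the traversal visits the rules it depends on, and their confidences are combined into an anomaly score, with the reachable rules doubling as interpretable evidence. The rule graph and the combination formula are illustrative stand-ins, not AnoT's actual algorithm.

```python
# rule -> (confidence, rules it depends on)
RULE_GRAPH = {
    "worksFor(person, org)": (0.9, ["locatedIn(org, city)"]),
    "locatedIn(org, city)": (0.8, []),
}

def anomaly_score(rule, graph):
    """Recursively traverse the rule graph; weak support -> high anomaly."""
    conf, deps = graph[rule]
    support = conf
    for dep in deps:
        # A dependency is supporting evidence in proportion to its own validity.
        support *= 1 - anomaly_score(dep, graph)
    return round(1 - support, 3)

score = anomaly_score("worksFor(person, org)", RULE_GRAPH)
```

The rules visited during the traversal (here, `locatedIn(org, city)`) are exactly the evidence a user can inspect to understand the score.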
"Online Detection of Anomalies in Temporal Knowledge Graphs with Interpretability," Jiasheng Zhang, Jie Shao, and Rex Ying. arXiv:2408.00872, 2024-08-01.
Diego Arroyuelo, Fabrizio Barisione, Antonio Fariña, Adrián Gómez-Brandón, Gonzalo Navarro
A recent surprising result in the implementation of worst-case-optimal (wco) multijoins in graph databases (specifically, basic graph patterns) is that they can be supported on graph representations that take even less space than a plain representation, and orders of magnitude less space than classical indices, while offering comparable performance. In this paper we uncover a wide set of new wco space-time tradeoffs: we (1) introduce new compact indices that handle multijoins in wco time, and (2) combine them with new query resolution strategies that offer better times in practice. As a result, we improve the average query times of current compact representations by a factor of up to 13 to produce the first 1000 results, and using twice their space, reduce their total average query time by a factor of 2. Our experiments suggest that there is more room for improvement in terms of generating better query plans for multijoins.
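The wco evaluation style these indices support can be illustrated on the simplest basic graph pattern that benefits from it, the directed triangle (x)-[a]->(y)-[a]->(z)-[a]->(x): bind one variable at a time, intersecting the candidate sets each edge allows. The adjacency-set representation below is a plain illustration, not the paper's compact data structure.

```python
from collections import defaultdict

edges = {(1, 2), (2, 3), (3, 1), (1, 3)}
fwd = defaultdict(set)  # forward adjacency: a -> {b : (a, b) is an edge}
for a, b in edges:
    fwd[a].add(b)

def triangles():
    out = []
    for x in fwd:                  # bind x
        for y in fwd[x]:           # bind y among x's successors
            # bind z by intersecting y's successors with x's predecessors
            preds_of_x = {v for v in fwd if x in fwd[v]}
            for z in fwd[y] & preds_of_x:
                out.append((x, y, z))
    return sorted(out)

result = triangles()
```

The intersection in the innermost loop is what bounds the running time by the worst-case output size; the compact indices in the paper make such intersections possible directly over compressed representations.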
"New Compressed Indices for Multijoins on Graph Databases." arXiv:2408.00558, 2024-08-01.
Xiny Pan, Daniel Hernández, Philipp Seifer, Ralf Lämmel, Steffen Staab
Over the past few years, we have seen the emergence of large knowledge graphs combining information from multiple sources. Sometimes, this information is provided in the form of assertions about other assertions, defining contexts where assertions are valid. RDF-star, a recent extension to RDF that admits statements over statements, is under revision to become a W3C standard. However, there is no proposal for a semantics of these RDF-star statements, nor a built-in facility to operate over them. In this paper, we propose eSPARQL, a query language for epistemic RDF-star metadata based on a four-valued logic. Our proposed query language extends SPARQL-star, the query language for RDF-star, with a new type of FROM clause to facilitate operating with multiple, and sometimes conflicting, beliefs. We show that the proposed query language can express four use-case queries, covering the following features: (i) querying the beliefs of an individual, (ii) aggregating beliefs, (iii) querying who conflicts with whom, and (iv) beliefs about beliefs (i.e., nesting of beliefs).
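The four-valued logic underlying the belief handling can be sketched with the standard Belnap-style encoding of a truth value as a pair (evidence for, evidence against), which yields the values True, False, Both (conflict), and Neither (unknown). This is the textbook construction; the exact semantics eSPARQL adopts may differ in its details.

```python
# value = (evidence for, evidence against)
TRUE, FALSE = (1, 0), (0, 1)
BOTH, NEITHER = (1, 1), (0, 0)

def join_beliefs(a, b):
    """Combine two agents' beliefs by taking the union of their evidence."""
    return (max(a[0], b[0]), max(a[1], b[1]))

def neg(a):
    """Negation swaps evidence for and evidence against."""
    return (a[1], a[0])

# Two agents with conflicting beliefs about the same statement
# combine to the conflict value, rather than collapsing to one side:
combined = join_beliefs(TRUE, FALSE)
```

A FROM clause that selects which agents' beliefs to draw on then amounts to choosing which evidence pairs feed into this combination.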
"eSPARQL: Representing and Reconciling Agnostic and Atheistic Beliefs in RDF-star Knowledge Graphs." arXiv:2407.21483, 2024-07-31.
This paper studies the completeness of conjunctive queries over a partially complete database and the approximation of incomplete queries. Given a query and a set of completeness rules (a special kind of tuple-generating dependencies) that specify which parts of the database are complete, we investigate whether the query can be fully answered, as if all data were available. If not, we explore reformulating the query into either Maximal Complete Specializations (MCSs) or the (unique up to equivalence) Minimal Complete Generalization (MCG) that can be fully answered, that is, the best complete approximations of the query from below or above in the sense of query containment. We show that the MCG can be characterized as the least fixed point of a monotonic operator in a preorder. Then, we show that an MCS can be computed by recursive backward application of completeness rules. We study the complexity of both problems and discuss implementation techniques that rely on an ASP engine and a Prolog engine, respectively.
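The least-fixed-point characterization can be illustrated with plain Kleene iteration: start from the bottom element and apply a monotone operator until it stabilizes. The toy operator below, which closes a set of atoms under forward rule application, is an illustrative stand-in for the paper's operator on queries.

```python
# Toy rules: each atom derives further atoms (a stand-in for completeness rules).
RULES = {"r(x)": {"s(x)"}, "s(x)": {"t(x)"}}

def step(atoms):
    """A monotone operator: add everything the rules derive in one step."""
    derived = set(atoms)
    for atom in atoms:
        derived |= RULES.get(atom, set())
    return derived

def least_fixed_point(seed):
    """Kleene iteration: apply `step` until the set no longer grows."""
    current = set(seed)
    while True:
        nxt = step(current)
        if nxt == current:
            return current
        current = nxt

lfp = least_fixed_point({"r(x)"})
```

Monotonicity is what guarantees the iteration converges to the *least* fixed point containing the seed, mirroring why the MCG is unique up to equivalence.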
"Complete Approximations of Incomplete Queries," Julien Corman, Werner Nutt, and Ognjen Savković. arXiv:2407.20932, 2024-07-30.
Diego Figueira, S. Krishna, Om Swostik Mishra, Anantha Padmanabha
The problem of checking whether a recursive query can be rewritten as a query without recursion is a fundamental reasoning task, known as the boundedness problem. Here we study the boundedness problem for Unions of Conjunctive Regular Path Queries (UCRPQs), a navigational query language extensively used in ontology and graph database querying. The boundedness problem for UCRPQs is ExpSpace-complete. We focus our analysis on UCRPQs using simple regular expressions, which are of high practical relevance and enjoy a lower reasoning complexity. We show that the boundedness problem for this fragment of UCRPQs is $\Pi^p_2$-complete, and that an equivalent bounded query can be produced in polynomial time whenever one exists. When the query turns out to be unbounded, we also study the task of finding an equivalent maximally bounded query, which we show to be feasible in $\Pi^p_2$. As a side result of independent interest stemming from our developments, we study a notion of succinct finite automata and prove that its membership problem is in NP.
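What "bounded" means can be made concrete on a single graph: the recursive path query x -[a*]-> y equals the union of its fixed-length unfoldings x -[a^0 + a^1 + ... + a^k]-> y once k is large enough. Boundedness proper asks whether one finite k works for *all* graphs; the sketch below only watches the unfoldings converge on one chain graph, as an intuition pump.

```python
# A chain of a-labeled edges: 0 -a-> 1 -a-> 2 -a-> 3.
a_edges = {(0, 1), (1, 2), (2, 3)}

def a_star_up_to(k):
    """Answers of the union a^0 + a^1 + ... + a^k on this graph."""
    pairs = {(v, v) for e in a_edges for v in e}  # a^0: empty path
    frontier = set(pairs)
    for _ in range(k):
        # Extend each path by one more a-edge.
        frontier = {(x, z) for (x, y) in frontier
                    for (y2, z) in a_edges if y == y2}
        pairs |= frontier
    return pairs

# Smallest k at which adding longer unfoldings changes nothing:
converged_at = next(k for k in range(10)
                    if a_star_up_to(k) == a_star_up_to(k + 1))
```

On this graph the unfoldings stabilize at k = 3 (the longest path); an *unbounded* query is one where no single k suffices across all graphs, which is why the problem is a semantic reasoning task rather than a per-instance computation.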
"Boundedness for Unions of Conjunctive Regular Path Queries over Simple Regular Expressions." arXiv:2407.20782, 2024-07-30.
Minxiao Chen, Haitao Yuan, Nan Jiang, Zhifeng Bao, Shangguang Wang
Traffic accidents pose a significant risk to human health and property safety, so predicting their risks in order to prevent them has garnered growing interest. We argue that a desirable prediction solution should be resilient to the complexity of traffic accidents. In particular, it should adequately consider the regional background, accurately capture both spatial proximity and semantic similarity, and effectively address the sparsity of traffic accidents. However, these factors are often overlooked or difficult to incorporate. In this paper, we propose a novel multi-granularity hierarchical spatio-temporal network. First, we incorporate remote sensing data, facilitating the creation of a hierarchical multi-granularity structure and the comprehension of the regional background. We construct multiple high-level risk prediction tasks to enhance the model's ability to cope with sparsity. Next, to capture both spatial proximity and semantic similarity, region features and a multi-view graph are encoded to distill effective representations. Additionally, we propose a message-passing and adaptive temporal attention module that bridges different granularities and dynamically captures the time correlations inherent in traffic accident patterns. Finally, a multivariate hierarchical loss function is devised to account for the complexity of the prediction objective. Extensive experiments on two real datasets verify the superiority of our model over state-of-the-art methods.
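The multi-granularity idea behind the hierarchical loss can be sketched as follows: risk is predicted at both a fine region level (sparse signal) and a coarse region level (denser, easier signal), and the two losses are blended so the coarse task regularizes the fine one. The squared-error terms and the blending weight are illustrative; the paper's multivariate hierarchical loss is more involved.

```python
def mse(pred, target):
    """Mean squared error over a list of region-level risks."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def hierarchical_loss(fine_pred, fine_true, coarse_pred, coarse_true, alpha=0.5):
    # The coarse-granularity task suffers less from sparsity and
    # stabilizes training of the sparse fine-granularity task.
    return alpha * mse(fine_pred, fine_true) + (1 - alpha) * mse(coarse_pred, coarse_true)

# Two fine regions aggregate into one coarse region:
loss = hierarchical_loss(
    fine_pred=[0.1, 0.0], fine_true=[0.0, 0.0],
    coarse_pred=[0.2], coarse_true=[0.1],
)
```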
"Urban Traffic Accident Risk Prediction Revisited: Regionality, Proximity, Similarity and Sparsity." arXiv:2407.19668, 2024-07-29.
Lars Vogt, Marcel Konrad, Kheir Eddine Farfar, Manuel Prinz, Allard Oelen
To manage increasing data volumes, machines need data and metadata that are machine-actionable and FAIR (findable, accessible, interoperable, reusable). Knowledge graphs and ontologies are key to this, but their use is hampered by high access barriers resulting from the prior knowledge of semantics and data modelling they require. The Rosetta Statement approach proposes modeling English natural-language statements instead of a mind-independent reality. We propose a metamodel for creating semantic schema patterns for simple statement types. The approach supports versioning of statements and provides a detailed editing history. Each Rosetta Statement pattern has a dynamic label for displaying statements as natural-language sentences. Implemented in the Open Research Knowledge Graph (ORKG) as a use case, this approach allows domain experts to define data schema patterns without needing semantic knowledge. Future plans include combining Rosetta Statements with semantic units to organize the ORKG into meaningful subgraphs, improving usability. A search interface for querying statements without knowledge of SPARQL or Cypher is also planned, along with tools for data entry and display that use Large Language Models and NLP. The Rosetta Statement metamodel supports a two-step knowledge graph construction procedure. In the first step, domain experts model semantic content without support from ontology engineers, lowering entry barriers and increasing cognitive interoperability. The second step involves developing semantic graph patterns for reasoning, which requires collaboration with ontology engineers.
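A statement pattern with a dynamic label can be sketched as a small schema plus a template that renders an instance back into a natural-language sentence. The field names, statement type, and label syntax below are illustrative assumptions, not ORKG's actual schema.

```python
# A hypothetical Rosetta-style pattern for one simple statement type.
PATTERN = {
    "type": "has_measurement",
    "slots": ["subject", "property", "value", "unit"],
    "label": "{subject} has a {property} of {value} {unit}",
}

def render(pattern, instance):
    """Display a statement instance as a natural-language sentence
    using the pattern's dynamic label."""
    return pattern["label"].format(**instance)

sentence = render(PATTERN, {
    "subject": "Lake Constance",
    "property": "surface area",
    "value": 536,
    "unit": "km^2",
})
```

Because the schema lives in the pattern rather than in the statement instance, a domain expert only fills slots, which is what keeps the entry barrier low.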
"Rosetta Statements: Lowering the Barrier for Semantic Parsing and Increasing the Cognitive Interoperability of Knowledge Graphs." arXiv:2407.20007, 2024-07-29.
In data-intensive real-time applications, such as smart transportation and manufacturing, ensuring data freshness is essential, as using obsolete data can lead to negative outcomes. Validity intervals serve as the standard means to specify freshness requirements in real-time databases. In this paper, we bring attention to significant drawbacks of validity intervals that have largely been unnoticed and introduce a new definition of data freshness, while discussing future research directions to address these limitations.
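The standard validity-interval scheme the paper critiques is simple to state: a data item written at time t with validity interval V counts as fresh until t + V. The timestamps and interval length below are illustrative; the limitation targeted by the paper is that a fixed V is blind to how the underlying real-world value actually changes.

```python
def is_fresh(write_time, validity_interval, now):
    """Classic absolute validity: the item is usable until it expires."""
    return now <= write_time + validity_interval

# A sensor reading written at t=100 with a 5-second validity interval
# is fresh at t=103 but considered obsolete at t=106, regardless of
# whether the measured quantity has actually changed.
fresh_at_103 = is_fresh(write_time=100, validity_interval=5, now=103)
stale_at_106 = is_fresh(write_time=100, validity_interval=5, now=106)
```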
"Limitations of Validity Intervals in Data Freshness Management," Kyoung-Don Kang. arXiv:2407.20431, 2024-07-29.