Estimation-based optimizations for the semantic compression of RDF knowledge bases

IF 7.4, Zone 1 (Management), JCR Q1 (Computer Science, Information Systems). Information Processing & Management. Pub Date: 2024-06-08. DOI: 10.1016/j.ipm.2024.103799

Ruoyu Wang , Raymond Wong , Daniel Sun


Citations: 0

Abstract


Estimation-based optimizations for the semantic compression of RDF knowledge bases

Structured knowledge bases are critical for the interpretability of AI techniques. RDF KBs, which are the dominant representation of structured knowledge, are expanding extremely fast to increase their knowledge coverage, enhancing the capability of knowledge reasoning while bringing heavy burdens to downstream applications. Recent studies employ semantic compression to detect and remove knowledge redundancies via semantic models and use the induced model for further applications, such as knowledge completion and error detection. However, semantic models that are sufficiently expressive for semantic compression cannot be efficiently induced, especially for large-scale KBs, due to the hardness of logic induction. In this article, we present estimation-based optimizations for the semantic compression of RDF KBs from the perspectives of input and intermediate data involved in the induction of first-order logic rules. The negative sampling technique selects a representative subset of all negative tuples with respect to the closed-world assumption, reducing the cost of evaluating the quality of a logic rule used for knowledge inference. The number of logic inference operations used during a compression procedure is reduced by a statistical estimation technique that prunes logic rules of low quality. The evaluation results show that the two techniques are feasible for the purpose of semantic compression and accelerate the compression algorithm by up to 47x compared to the state-of-the-art system.
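The negative sampling and pruning ideas described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the uniform rejection sampler, the precision-like quality score, and the fixed pruning threshold are all assumptions made for illustration. A rule is represented as a hypothetical predicate `entails(pair)` over binary tuples, and every tuple absent from the KB is treated as negative under the closed-world assumption.

```python
import random


def sample_negatives(constants, positives, k, rng):
    """Uniformly sample up to k distinct pairs that are absent from `positives`.

    Under the closed-world assumption every absent pair is a negative tuple.
    Assumes `positives` is a set of pairs drawn from `constants` (a list).
    Returns the sample and the total number of negative tuples.
    """
    total_neg = len(constants) ** 2 - len(positives)
    k = min(k, total_neg)
    negatives = set()
    while len(negatives) < k:
        pair = (rng.choice(constants), rng.choice(constants))
        if pair not in positives:
            negatives.add(pair)
    return negatives, total_neg


def estimate_rule_quality(entails, positives, constants, k, rng):
    """Estimate a precision-like score for a rule from a negative sample.

    Instead of checking the rule against every negative tuple, count the hits
    in the sample and scale the count up to the full negative set.
    """
    sampled, total_neg = sample_negatives(constants, positives, k, rng)
    covered_pos = sum(1 for t in positives if entails(t))
    hits = sum(1 for t in sampled if entails(t))
    est_covered_neg = hits / len(sampled) * total_neg if sampled else 0.0
    denom = covered_pos + est_covered_neg
    return covered_pos / denom if denom else 0.0


def prune(rules, positives, constants, k, threshold, rng):
    """Keep only the rules whose estimated score reaches `threshold`."""
    return {
        name: entails
        for name, entails in rules.items()
        if estimate_rule_quality(entails, positives, constants, k, rng) >= threshold
    }


if __name__ == "__main__":
    rng = random.Random(0)
    constants = list(range(100))
    positives = {(i, i + 1) for i in range(99)}  # toy "successor" relation
    rules = {
        "exact": lambda t: t[1] == t[0] + 1,  # entails only true tuples
        "loose": lambda t: t[1] > t[0],       # entails many negative tuples
    }
    kept = prune(rules, positives, constants, k=500, threshold=0.5, rng=rng)
    print(sorted(kept))
```

In this toy setting the sample of 500 negatives stands in for the 9,901 negative tuples of the full relation, so each candidate rule is checked against far fewer tuples before the low-scoring one is discarded.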


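Source journal

Information Processing & Management (Engineering & Technology, Computer Science: Information Systems)

CiteScore: 17.00
Self-citation rate: 11.60%
Articles published: 276
Review time: 39 days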

About the journal: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.