Quality assessment of cyber threat intelligence knowledge graph based on adaptive joining of embedding model

IF 4.6 · Zone 2 (Computer Science) · JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) · Complex & Intelligent Systems · Pub Date: 2024-11-26 · DOI: 10.1007/s40747-024-01661-3

Bin Chen, Hongyi Li, Di Zhao, Yitang Yang, Chengwei Pan


Citations: 0

Abstract

Cyber threat intelligence knowledge graphs often contain erroneous, inconsistent, or missing triples, which makes it difficult for them to meet complex and diverse application requirements. The predominant approach to knowledge graph quality assessment relies on word embeddings: it evaluates the plausibility of individual triples to assess the quality of the graph as a whole. Recent studies have found that better word representations can be obtained by splicing (concatenating) different types of embeddings, and such representations have been applied to tasks such as named entity recognition (NER). However, as the number of embedding types grows, selecting the best embeddings to build the concatenated representation has become a pressing problem. In this paper, we propose an adaptive joining of embedding (AJE) model that automatically finds better word embedding representations for knowledge graph quality assessment. The AJE model works through the interaction of a task model and a selector. The former samples word embeddings generated by various models, while the latter generates rewards based on feedback from current task outcomes to decide whether or not to splice each embedding. Experiments were conducted on two generic datasets and one cybersecurity dataset for knowledge graph quality assessment. The results show that our model outperforms the baseline model on key metrics such as accuracy and F1 value, achieving accuracies of 95.8%, 95.6% and 91.3% on the generic datasets WN11 and FB13 and the cybersecurity dataset CS13K, respectively, which are increases of 1.0%, 0.2% and 0.5% over the AttTucker model.
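The abstract does not give the details of the selection loop, so the following is only a minimal sketch of one common way to learn which embeddings to splice: a per-source keep/drop probability updated with a REINFORCE-style rule from a task reward. It is not the authors' implementation. All names (`sample_mask`, `concatenate`, `update_logits`, `search`) are hypothetical, and `toy_task_accuracy` with its made-up contribution values stands in for a real task model evaluated on validation triples.

```python
import math
import random


def sample_mask(logits, rng):
    """Sample one keep/drop decision per embedding source (Bernoulli on a sigmoid)."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    mask = [1 if rng.random() < p else 0 for p in probs]
    if not any(mask):
        # Always keep at least one source so the representation is non-empty.
        mask[max(range(len(probs)), key=probs.__getitem__)] = 1
    return mask, probs


def concatenate(embeddings, mask):
    """Splice the selected embedding vectors into a single vector."""
    joined = []
    for vec, keep in zip(embeddings, mask):
        if keep:
            joined.extend(vec)
    return joined


def update_logits(logits, mask, probs, reward, baseline, lr=0.5):
    """REINFORCE-style step: reinforce decisions whose reward beats the baseline."""
    advantage = reward - baseline
    # For a Bernoulli with logit z, d/dz log p(m) = m - p.
    return [z + lr * advantage * (m - p) for z, m, p in zip(logits, mask, probs)]


def toy_task_accuracy(mask):
    """Hypothetical stand-in for the task model's validation accuracy."""
    contribution = [0.10, 0.05, -0.08]  # made-up effect of each embedding source
    return 0.80 + sum(c for c, m in zip(contribution, mask) if m)


def search(num_sources=3, steps=200, seed=0):
    """Run the sample -> evaluate -> update loop and track the best mask seen."""
    rng = random.Random(seed)
    logits = [0.0] * num_sources
    baseline = None
    best_mask, best_reward = None, float("-inf")
    for _ in range(steps):
        mask, probs = sample_mask(logits, rng)
        reward = toy_task_accuracy(mask)
        if baseline is None:
            baseline = reward  # initialise the baseline with the first reward
        if reward > best_reward:
            best_mask, best_reward = mask, reward
        logits = update_logits(logits, mask, probs, reward, baseline)
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
    return best_mask, best_reward, logits


if __name__ == "__main__":
    mask, reward, logits = search()
    print("best mask:", mask, "reward:", round(reward, 3))
```

In a real setting the reward would come from training or evaluating the triple classifier on the concatenated representation, which is far more expensive than this toy function; the moving-average baseline is one simple way to reduce the variance of the update.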


Source journal

Complex & Intelligent Systems (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CiteScore: 9.60

Self-citation rate: 10.30%

Number of publications: 297

Journal description: Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.