Quality assessment of cyber threat intelligence knowledge graph based on adaptive joining of embedding model
Bin Chen, Hongyi Li, Di Zhao, Yitang Yang, Chengwei Pan
Complex & Intelligent Systems (IF 5.0; JCR Q1, Computer Science, Artificial Intelligence), published 2024-11-26
DOI: https://doi.org/10.1007/s40747-024-01661-3
Citations: 0
Abstract
Research on cyber threat intelligence knowledge graphs is currently hampered by erroneous, inconsistent, or missing triples, which make it difficult to meet complex and diverse application requirements. The predominant approach to quality assessment of knowledge graphs uses word embeddings to evaluate the plausibility of triples. Recent studies have found that better word representations can be obtained by splicing (concatenating) different types of embeddings, and have applied them to tasks such as named entity recognition (NER). However, as the number of embedding types grows, selecting the best embeddings for constructing the joined representation has become a pressing issue. In this paper, we propose an adaptive joining of embedding (AJE) model that automatically finds better word embedding representations for knowledge graph quality assessment. The AJE model operates through the interplay of a task model and a selector. The former samples word embeddings generated by various models, while the latter generates rewards based on feedback from current task outcomes to decide whether or not to splice each embedding. Experiments were conducted on two generic datasets and one cybersecurity dataset for knowledge graph quality assessment. The results show that our model outperforms the baseline model in key metrics such as accuracy and F1, obtaining accuracy of 95.8%, 95.6% and 91.3% on the generic datasets WN11 and FB13 and the cybersecurity dataset CS13K, respectively, which are increases of 1.0%, 0.2% and 0.5% over the AttTucker model.
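The sample-and-reward loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the update rule, so the code assumes an independent keep/skip (Bernoulli) choice per candidate embedding and a REINFORCE-style update with a moving-average baseline. The function `toy_task_score` is a hypothetical stand-in for the accuracy a real task model would report, and all names and hyperparameters are illustrative.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def concat_selected(embeddings, mask):
    """Concatenate (splice) the candidate vectors whose mask bit is 1."""
    joined = []
    for vec, keep in zip(embeddings, mask):
        if keep:
            joined.extend(vec)
    return joined


def sample_mask(probs, rng):
    """Sample one independent keep/skip bit per candidate embedding."""
    return [1 if rng.random() < p else 0 for p in probs]


def update_logits(logits, mask, reward, baseline, lr=0.5):
    """REINFORCE-style step for Bernoulli choices.

    The gradient of log p(mask) with respect to logit_i is
    mask_i - sigmoid(logit_i); it is scaled by (reward - baseline).
    """
    advantage = reward - baseline
    return [l + lr * advantage * (m - sigmoid(l)) for l, m in zip(logits, mask)]


def toy_task_score(mask):
    """Hypothetical reward: candidates 0 and 2 help the task, candidate 1 hurts."""
    return 0.4 + 0.25 * mask[0] - 0.25 * mask[1] + 0.25 * mask[2]


def train_selector(n_candidates=3, steps=2000, seed=0):
    """Learn keep-probabilities from reward feedback alone."""
    rng = random.Random(seed)
    logits = [0.0] * n_candidates
    baseline = None
    for _ in range(steps):
        probs = [sigmoid(l) for l in logits]
        mask = sample_mask(probs, rng)
        reward = toy_task_score(mask)
        if baseline is None:
            baseline = reward  # first step only initializes the baseline
        logits = update_logits(logits, mask, reward, baseline)
        baseline = 0.9 * baseline + 0.1 * reward
    return [sigmoid(l) for l in logits]


if __name__ == "__main__":
    candidates = [[1.0, 2.0], [3.0], [4.0, 5.0]]  # made-up vectors for one word
    print(concat_selected(candidates, [1, 0, 1]))
    print(train_selector())
```

In a real setting the reward would come from evaluating the task model on held-out triples with the spliced representation, which is far more expensive than the toy score used here; the sketch only shows how reward feedback can steer the keep/skip decision for each embedding.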
About the journal:
Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.