CDTier: A Chinese Dataset of Threat Intelligence Entity Relationships

IEEE Transactions on Sustainable Computing (IF 3; Zone 3, Computer Science; JCR Q2, Computer Science, Hardware & Architecture). Pub Date: 2023-01-30. DOI: 10.1109/TSUSC.2023.3240411

Yinghai Zhou, Yitong Ren, Ming Yi, Yanjun Xiao, Zhiyuan Tan, Nour Moustafa, Zhihong Tian

IEEE Transactions on Sustainable Computing, vol. 8, no. 4, pp. 627-638. https://ieeexplore.ieee.org/document/10029930/

Citations: 2

Abstract




Cyber Threat Intelligence (CTI), knowledge of cyberspace threats gathered from security data, is critical in defending against cyberattacks. However, no open-source CTI dataset exists that lets security researchers apply the large volume of available CTI to security analysis, particularly for Chinese threat intelligence. To support network security research and development, this article constructs CDTier, a Chinese CTI entity relationship dataset, which includes: 1) a threat entity extraction dataset composed of 100 CTI reports, 3744 threat sentences, and 4259 threat knowledge objects; and 2) an entity relation extraction dataset comprising 100 CTI reports, 2598 threat sentences, and 2562 knowledge object relations. CDTier is, as far as we know, the first CTI dataset. On CDTier, we trained four models for threat entity extraction and relation extraction using well-established, widely used deep learning methods from NLP. The results showed that models trained on CDTier extract the knowledge objects and relationships described in threat intelligence more accurately, which significantly reduces analysts' workload when assessing threat intelligence.
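
As a rough illustration of the dataset's scale, the counts reported in the abstract can be turned into per-report and per-sentence averages. The sketch below uses only those reported figures; the dictionary keys and the `density` helper are hypothetical names chosen for this example and do not reflect CDTier's actual file format.

```python
# Counts as reported in the abstract; key names are illustrative assumptions.
ENTITY_SET = {"reports": 100, "sentences": 3744, "items": 4259}    # items = threat knowledge objects
RELATION_SET = {"reports": 100, "sentences": 2598, "items": 2562}  # items = knowledge object relations


def density(subset):
    """Return (sentences per report, annotated items per sentence)."""
    sentences_per_report = subset["sentences"] / subset["reports"]
    items_per_sentence = subset["items"] / subset["sentences"]
    return sentences_per_report, items_per_sentence


for name, subset in (("entity", ENTITY_SET), ("relation", RELATION_SET)):
    per_report, per_sentence = density(subset)
    print(f"{name}: {per_report:.2f} sentences/report, "
          f"{per_sentence:.2f} items/sentence")
```

On these figures, each threat sentence in the entity subset carries slightly more than one annotated knowledge object on average, and each sentence in the relation subset carries roughly one relation.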

