DNRTI: A Large-scale Dataset for Named Entity Recognition in Threat Intelligence

Xuren Wang, Xinpei Liu, Shengqin Ao, Ning Li, Zhengwei Jiang, Zongyi Xu, Zihan Xiong, Mengbo Xiong, Xiaoqing Zhang
{"title":"DNRTI: A Large-scale Dataset for Named Entity Recognition in Threat Intelligence","authors":"Xuren Wang, Xinpei Liu, Shengqin Ao, Ning Li, Zhengwei Jiang, Zongyi Xu, Zihan Xiong, Mengbo Xiong, Xiaoqing Zhang","doi":"10.1109/TrustCom50675.2020.00252","DOIUrl":null,"url":null,"abstract":"Named entity recognition is an important and challenging problem in Natural language processing. Although the past decade has witnessed major advances in entity recognition in many fields, such successes have been slow to network security field, not only because of the data in the network security field is very professional, but also due to the sensitive information in the data. To advance named entity recognition research in network security field, we introduce a large-scale Dataset for Named Entity Recognition in Threat Intelligence (DNRTI). To this end, we collect more than 300 pieces of threat intelligence. The data in DNRTI is all annotated by experts in threat intelligence interpretation using 13 object categories. The fully annotated DNRTI contains 175220 words. To build a baseline for named entity recognition in the threat intelligence field, we evaluate some deep learning model on DNRTI. Experiments demonstrate that DNRTI well represents the key information in threat intelligence and are quite challenging.","PeriodicalId":221956,"journal":{"name":"2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TrustCom50675.2020.00252","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 10

Abstract

Named entity recognition is an important and challenging problem in Natural language processing. Although the past decade has witnessed major advances in entity recognition in many fields, such successes have been slow to network security field, not only because of the data in the network security field is very professional, but also due to the sensitive information in the data. To advance named entity recognition research in network security field, we introduce a large-scale Dataset for Named Entity Recognition in Threat Intelligence (DNRTI). To this end, we collect more than 300 pieces of threat intelligence. The data in DNRTI is all annotated by experts in threat intelligence interpretation using 13 object categories. The fully annotated DNRTI contains 175220 words. To build a baseline for named entity recognition in the threat intelligence field, we evaluate some deep learning model on DNRTI. Experiments demonstrate that DNRTI well represents the key information in threat intelligence and are quite challenging.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
DNRTI:威胁情报中命名实体识别的大规模数据集
命名实体识别是自然语言处理中的一个重要而富有挑战性的问题。尽管在过去的十年中,实体识别在许多领域都取得了重大进展,但这种成功在网络安全领域却进展缓慢,这不仅是因为网络安全领域的数据非常专业,而且还因为数据中的敏感信息。为了推进网络安全领域的命名实体识别研究,我们引入了一个大规模的威胁情报命名实体识别数据集(DNRTI)。为此,我们收集了300多条威胁情报。DNRTI中的数据全部由威胁情报解释专家使用13个对象类别进行注释。完整注释的DNRTI包含175220个单词。为了在威胁情报领域建立命名实体识别的基线,我们在DNRTI上评估了一些深度学习模型。实验表明,DNRTI很好地代表了威胁情报中的关键信息,具有一定的挑战性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Research on Stitching and Alignment of Mouse Carcass EM Images One Covert Channel to Rule Them All: A Practical Approach to Data Exfiltration in the Cloud MAUSPAD: Mouse-based Authentication Using Segmentation-based, Progress-Adjusted DTW Finding Geometric Medians with Location Privacy Multi-Input Functional Encryption: Efficient Applications from Symmetric Primitives
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1