DNRTI: A Large-scale Dataset for Named Entity Recognition in Threat Intelligence
Authors: Xuren Wang, Xinpei Liu, Shengqin Ao, Ning Li, Zhengwei Jiang, Zongyi Xu, Zihan Xiong, Mengbo Xiong, Xiaoqing Zhang
Venue: 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)
Published: 2020-12-01
DOI: https://doi.org/10.1109/TrustCom50675.2020.00252
Citation count: 10
Abstract
Named entity recognition is an important and challenging problem in natural language processing. Although the past decade has witnessed major advances in entity recognition in many fields, such successes have been slow to reach the network security field, not only because the data in this field is highly specialized, but also because it contains sensitive information. To advance named entity recognition research in the network security field, we introduce a large-scale Dataset for Named Entity Recognition in Threat Intelligence (DNRTI). To this end, we collect more than 300 pieces of threat intelligence. All data in DNRTI is annotated by experts in threat intelligence interpretation using 13 object categories. The fully annotated DNRTI contains 175,220 words. To build a baseline for named entity recognition in the threat intelligence field, we evaluate several deep learning models on DNRTI. Experiments demonstrate that DNRTI represents the key information in threat intelligence well and is quite challenging.