Nitin Ramrakhiyani, Sangameshwar Patil, Manideep Jella, Alok Kumar, G. Palshikar
Title: Extracting Entities and Events from Cyber-Physical Security Incident Reports
Venue: 2022 IEEE International Conference on Data Mining Workshops (ICDMW)
Publication date: 2022-11-01
DOI: 10.1109/ICDMW58026.2022.00083 (https://doi.org/10.1109/ICDMW58026.2022.00083)
Citation count: 0
Abstract
Cyber-physical systems are an important part of many industries, such as the chemical process industry, the manufacturing industry, automobiles, and even sophisticated weaponry. Given their economic importance and influence, these systems have increasingly faced cybersecurity attacks. In this paper, we provide a dataset of real-life security incident reports on cyber-physical systems, annotated with the entities and events that are important for analysing such incidents. We analyze and identify the limitations of the 'Domain Objects' in the Structured Threat Information Expression (STIX) standard, as well as of the entity type classification schemes in recent research literature in the cybersecurity domain. We propose an updated classification scheme for entity types in the cybersecurity domain. The enhanced coverage provided by this entity scheme is important for automated information extraction and natural language understanding of textual reports that describe cybersecurity incidents. We use deep-learning-based sequence labelling techniques and cybersecurity-domain-specific word embeddings to set up a benchmark for entity and event extraction for cyber-physical security incident report analysis. The annotated dataset of real-life industrial security incidents will be made available for research purposes.