Alastair Nottingham, Molly Buchanan, Mark Gardner, Jason Hiser, J. Davidson
{"title":"Sentinel: A Multi-institution Enterprise Scale Platform for Data-driven Cybersecurity Research","authors":"Alastair Nottingham, Molly Buchanan, Mark Gardner, Jason Hiser, J. Davidson","doi":"10.1109/ISSREW55968.2022.00075","DOIUrl":null,"url":null,"abstract":"Current cybersecurity research is constrained by the general scarcity of large, realistic, labeled network traffic datasets. To address said scarcity, this paper introduces Sentinel: a multi-enterprise scientific instrument developed to support data-driven cybersecurity research. Sentinel provides researchers access to virtual computing infrastructure and petabytes of data collected over several years from network sensors at two large, disjoint educational institutions - the University of Virginia and Virginia Tech. The network dataset is supplemented by multi-modal malware activity logs generated by attack recreation exercises which realistically integrate ground truth into collected edge sensor data. To mitigate risks associated with providing access to enterprise network sensor logs, Sentinel uses a combination of a code-to-data policy, data usage agreements, and pattern-preserving anonymization. Sentinel has been used as part of a government-funded effort to investigate new machine learning algorithms, cybersecurity forensics, and data retention techniques.","PeriodicalId":178302,"journal":{"name":"2022 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISSREW55968.2022.00075","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Current cybersecurity research is constrained by the general scarcity of large, realistic, labeled network traffic datasets. To address said scarcity, this paper introduces Sentinel: a multi-enterprise scientific instrument developed to support data-driven cybersecurity research. Sentinel provides researchers access to virtual computing infrastructure and petabytes of data collected over several years from network sensors at two large, disjoint educational institutions - the University of Virginia and Virginia Tech. The network dataset is supplemented by multi-modal malware activity logs generated by attack recreation exercises which realistically integrate ground truth into collected edge sensor data. To mitigate risks associated with providing access to enterprise network sensor logs, Sentinel uses a combination of a code-to-data policy, data usage agreements, and pattern-preserving anonymization. Sentinel has been used as part of a government-funded effort to investigate new machine learning algorithms, cybersecurity forensics, and data retention techniques.