A practical approach for applying Machine Learning in the detection and classification of network devices used in building management

Applied AI Letters. Pub Date: 2020-12-02. DOI: 10.22541/au.160689781.19054555/v1

Maroun Touma, Shalisha Witherspoon, S. Witherspoon, Isabelle Crawford-Eng



Abstract

With the increasing deployment of smart buildings and infrastructure, Supervisory Control and Data Acquisition (SCADA) devices and the underlying IT network have become essential to the proper operation of these highly complex systems. As automation increases and SCADA devices proliferate, the attack surface of critical infrastructure grows correspondingly. Distinguishing, in near-real time, device behaviors that are known and understood (or potentially qualified) from those that are unknown and potentially nefarious is a key component of any security solution. In this paper, we investigate the challenges of building robust machine learning models that identify unknowns purely from network traffic, both inside and outside firewalls. These challenges include missing or inconsistent labels across sites, feature engineering and learning, temporal dependencies and analysis, and training data quality (including small sample sizes), for both shallow and deep learning methods. To demonstrate these challenges and the capabilities we have developed, we focus on Building Automation and Control networks (BACnet) from a private commercial building system. Our results show that a "Model Zoo", built from binary classifiers for each device or behavior and combined with an ensemble classifier that integrates information from all of them, provides a reliable methodology for identifying unknown devices as well as for determining specific known devices when the device type is in the training set. The capability of the Model Zoo framework is shown to be directly linked to feature engineering and learning, and its dependence on feature selection varies with both the binary and the ensemble classifiers.
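The Model Zoo idea described in the abstract (one binary classifier per device type, plus a combining step that can also answer "unknown") can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the class `BinaryDeviceModel`, the functions `train_zoo` and `classify`, the two-dimensional toy features, the device names, and the 0.5 threshold are all hypothetical. A simple distance-based scorer stands in for whatever binary classifiers the authors trained, and a maximum-score rule with a threshold stands in for their ensemble classifier.

```python
import math
from collections import defaultdict


def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class BinaryDeviceModel:
    """Toy one-vs-rest scorer for one device type (hypothetical stand-in)."""

    def __init__(self, label):
        self.label = label

    def fit(self, positives, negatives):
        # Centre of this device's samples in feature space.
        self.center = centroid(positives)
        # Scale: half the distance from the centre to the closest other-device sample.
        self.radius = 0.5 * min(distance(self.center, n) for n in negatives)
        return self

    def score(self, x):
        # Score in (0, 1]; close to 1 near the centre, decaying with distance.
        return math.exp(-((distance(self.center, x) / self.radius) ** 2))


def train_zoo(samples):
    """Fit one binary model per label found in (features, label) pairs."""
    by_label = defaultdict(list)
    for features, label in samples:
        by_label[label].append(features)
    zoo = []
    for label, positives in by_label.items():
        negatives = [f for other, rows in by_label.items() if other != label for f in rows]
        zoo.append(BinaryDeviceModel(label).fit(positives, negatives))
    return zoo


def classify(zoo, x, threshold=0.5):
    """Stand-in ensemble rule: best-scoring model wins, else 'unknown'."""
    scores = {model.label: model.score(x) for model in zoo}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else "unknown"), scores


# Made-up, pre-scaled traffic features; not data from the paper.
TRAINING = [
    ([0.10, 0.20], "thermostat"), ([0.20, 0.10], "thermostat"), ([0.15, 0.15], "thermostat"),
    ([0.90, 0.80], "controller"), ([0.80, 0.90], "controller"), ([0.85, 0.85], "controller"),
]

zoo = train_zoo(TRAINING)
for sample in ([0.12, 0.18], [0.88, 0.83], [0.10, 0.90]):
    label, _ = classify(zoo, sample)
    print(sample, label)
```

In this toy setup the first two samples fall close to a trained device type, while the third is far from both, so no binary model claims it and the combining rule reports it as unknown. A learned ensemble, as the abstract describes, would replace the fixed threshold rule.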


