Uncertainty Quantification for Trusted Machine Learning in Space System Cyber Security
Douglas Woodward, M. Hobbs, James Andrew Gilbertson, N. Cohen
2021 IEEE 8th International Conference on Space Mission Challenges for Information Technology (SMC-IT), July 2021
DOI: 10.1109/SMC-IT51442.2021.00012
Citations: 2
Abstract
In recent years, The Aerospace Corporation has been developing machine learning systems to detect cyber anomalies in space system command and telemetry streams. However, to enable the use of deep learning in such high-consequence environments, the models must be trustworthy. One aspect of trust is a model’s ability to accurately quantify the uncertainty of its predictions. Although many deep learning models output what appear to be confidence scores, academic research has repeatedly shown that models often report high confidence even when they are very wrong and are unable to diagnose and respond appropriately to out-of-distribution inputs. This can result in catastrophic overconfidence when models are faced with adversarial inputs or concept drift. Even on routine inputs, human-machine teaming is difficult without reliable uncertainty quantification, because humans cannot trust the model’s reported confidence score. In short, all models are wrong sometimes, but models that know when they are wrong are considerably more useful. To this end, The Aerospace Corporation conducted a literature review and implemented current state-of-the-art methods, including deep ensembles and temperature scaling for confidence calibration, to accurately quantify the uncertainty of deep learning model predictions. We further incorporated and tested these techniques within the existing cyber defense model framework to build more trustworthy cyber anomaly detection models. We show that these techniques are not only successful but also easy to implement, extensible to many applications and machine learning model variants, and able to provide interpretable results for a wide audience. Based on these results, Aerospace recommends further adoption of such techniques in high-consequence environments.
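
To illustrate the two techniques named in the abstract, the sketch below shows temperature scaling fit on held-out validation logits and a deep ensemble whose calibrated softmax outputs are averaged, with predictive entropy used as an uncertainty score. This is not the authors' implementation; the PyTorch framing, the shared single temperature across ensemble members, and the `val_logits`/`val_labels` tensors are illustrative assumptions.

```python
# Minimal sketch of temperature scaling + deep ensembles (assumed PyTorch setup,
# not the paper's code). In practice each ensemble member may be calibrated
# separately; a single shared temperature is used here for brevity.
import torch
import torch.nn.functional as F


def fit_temperature(val_logits: torch.Tensor, val_labels: torch.Tensor) -> torch.Tensor:
    """Learn a scalar temperature T > 0 minimizing NLL on held-out data."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log(T) so T stays positive
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

    def closure():
        optimizer.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_t.exp().detach()


def ensemble_predict(models, x: torch.Tensor, temperature: torch.Tensor):
    """Average calibrated softmax outputs over the ensemble and return the
    mean class probabilities plus predictive entropy as an uncertainty score."""
    with torch.no_grad():
        probs = torch.stack(
            [F.softmax(m(x) / temperature, dim=-1) for m in models]
        ).mean(dim=0)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return probs, entropy
```

In an anomaly detection pipeline of the kind described, the predictive entropy could be thresholded so that high-uncertainty inputs (e.g., out-of-distribution telemetry) are flagged for human review rather than acted on automatically; the specific threshold and integration point are deployment choices, not details given in the abstract.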