Nitin Sukhija, Elizabeth Bautista, Owen James, Daniel Gens, Siqi Deng, Yu Lam, Tony Quan, Basil Lalli
The challenge of monitoring and event response management of a high performance computing facility grows significantly as the facilities employs and orchestrates more complex and heterogeneous systems and infrastructure. As the computational components encompassing the HPC facility system increases, the computational staff experiences rise in alert fatigue due to the false alarms and noise related to the similar events generated by monitoring tools. The National Energy Research Scientific Computing Center (NERSC) at the Lawrence Berkeley National Laboratory (LBNL) has begun to address the issues of duplication of alerts and alert remediation. However, more automation and integration is needed for collecting, aggregating, correlating, analyzing, managing and visualizing the scale of events that will be generated by the emergent hybrid computing infrastructures. In this paper, we present an event management and monitoring framework that addresses the operational needs of the future pre-exascale systems at the Lawrence Berkeley National Laboratory's National Energy Research Scientific Computing Center (NERSC). The framework integrates the Operations Monitoring and Notification Infrastructure (OMNI) at NERSC with the Prometheus, Grafana and ServiceNow platforms to help identify, diagnose, and resolve incidents in real-time, as well as conduct more thorough post-incident reviews enabled by the intuitive dashboards that provides a single pane of glass console for an efficient operations management and real-time proactive monitoring.
{"title":"Event Management and Monitoring Framework for HPC Environments using ServiceNow and Prometheus","authors":"Nitin Sukhija, Elizabeth Bautista, Owen James, Daniel Gens, Siqi Deng, Yu Lam, Tony Quan, Basil Lalli","doi":"10.1145/3415958.3433046","DOIUrl":"https://doi.org/10.1145/3415958.3433046","url":null,"abstract":"The challenge of monitoring and event response management of a high performance computing facility grows significantly as the facilities employs and orchestrates more complex and heterogeneous systems and infrastructure. As the computational components encompassing the HPC facility system increases, the computational staff experiences rise in alert fatigue due to the false alarms and noise related to the similar events generated by monitoring tools. The National Energy Research Scientific Computing Center (NERSC) at the Lawrence Berkeley National Laboratory (LBNL) has begun to address the issues of duplication of alerts and alert remediation. However, more automation and integration is needed for collecting, aggregating, correlating, analyzing, managing and visualizing the scale of events that will be generated by the emergent hybrid computing infrastructures. In this paper, we present an event management and monitoring framework that addresses the operational needs of the future pre-exascale systems at the Lawrence Berkeley National Laboratory's National Energy Research Scientific Computing Center (NERSC). The framework integrates the Operations Monitoring and Notification Infrastructure (OMNI) at NERSC with the Prometheus, Grafana and ServiceNow platforms to help identify, diagnose, and resolve incidents in real-time, as well as conduct more thorough post-incident reviews enabled by the intuitive dashboards that provides a single pane of glass console for an efficient operations management and real-time proactive monitoring.","PeriodicalId":198419,"journal":{"name":"Proceedings of the 12th International Conference on Management of Digital EcoSystems","volume":"2013 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128868068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
B. Banihashemi, Z. Maamar, Noura Faci, E. Ugljanin
In the Internet of Things (IoT), mutation aims to enhance functionality and survivability of a thing in the context of dynamic environments by allowing the thing to satisfy a need or seize a potential opportunity. In this paper, we propose a framework based on thing mutation as a countermeasure to certain security threats that could impact things' operations (e.g., sensing and communicating). We then provide a case study which focuses on battery draining attack and propose a mutation-based strategy as a countermeasure to this attack.
{"title":"Thing Mutation as a Countermeasure to Safeguard IoT","authors":"B. Banihashemi, Z. Maamar, Noura Faci, E. Ugljanin","doi":"10.1145/3415958.3433033","DOIUrl":"https://doi.org/10.1145/3415958.3433033","url":null,"abstract":"In the Internet of Things (IoT), mutation aims to enhance functionality and survivability of a thing in the context of dynamic environments by allowing the thing to satisfy a need or seize a potential opportunity. In this paper, we propose a framework based on thing mutation as a countermeasure to certain security threats that could impact things' operations (e.g., sensing and communicating). We then provide a case study which focuses on battery draining attack and propose a mutation-based strategy as a countermeasure to this attack.","PeriodicalId":198419,"journal":{"name":"Proceedings of the 12th International Conference on Management of Digital EcoSystems","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128003569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine learning methods have proven to be effective in analyzing vast amounts of data in various formats to obtain patterns, detect trends, gain insight, and predict outcomes based on historical data. However, training models from scratch across various real-world applications is costly in terms of both time and data consumption. Model adaptation (Domain Adaptation) is a promising methodology to tackle this problem. It can reuse the knowledge embedded in an existing model to train another model. However, model adaptation is a challenging task due to dataset bias or domain shift. In addition, data access from both the original (source) domain and the destination (target) domain is often an issue in the real world, due to data privacy and cost issues (gathering additional data may cost money). Several domain adaptation algorithms and methodologies have introduced in recent years; they reuse trained models from one source domain for a different but related target domain. Many existing domain adaptation approaches aim at modifying the trained model structure or adjusting the latent space of the target domain using data from the source domain. Domain adaptation techniques can be evaluated over several criteria, namely, accuracy, knowledge transfer, training time, and budget. In this paper, we start from the notion that in many real-world scenarios, the owner of the trained model restricts access to the model structure and the source dataset. To solve this problem, we propose a methodology to efficiently select data from the target domain (minimizing consumption of target domain data) to adapt the existing model without accessing the source domain, while still achieving acceptable accuracy. Our approach is designed for supervised and semi-supervised learning and extendable to unsupervised learning.
{"title":"Machine Learning Pipeline for Reusing Pretrained Models","authors":"M. Alshehhi, Di Wang","doi":"10.1145/3415958.3433054","DOIUrl":"https://doi.org/10.1145/3415958.3433054","url":null,"abstract":"Machine learning methods have proven to be effective in analyzing vast amounts of data in various formats to obtain patterns, detect trends, gain insight, and predict outcomes based on historical data. However, training models from scratch across various real-world applications is costly in terms of both time and data consumption. Model adaptation (Domain Adaptation) is a promising methodology to tackle this problem. It can reuse the knowledge embedded in an existing model to train another model. However, model adaptation is a challenging task due to dataset bias or domain shift. In addition, data access from both the original (source) domain and the destination (target) domain is often an issue in the real world, due to data privacy and cost issues (gathering additional data may cost money). Several domain adaptation algorithms and methodologies have introduced in recent years; they reuse trained models from one source domain for a different but related target domain. Many existing domain adaptation approaches aim at modifying the trained model structure or adjusting the latent space of the target domain using data from the source domain. Domain adaptation techniques can be evaluated over several criteria, namely, accuracy, knowledge transfer, training time, and budget. In this paper, we start from the notion that in many real-world scenarios, the owner of the trained model restricts access to the model structure and the source dataset. To solve this problem, we propose a methodology to efficiently select data from the target domain (minimizing consumption of target domain data) to adapt the existing model without accessing the source domain, while still achieving acceptable accuracy. Our approach is designed for supervised and semi-supervised learning and extendable to unsupervised learning.","PeriodicalId":198419,"journal":{"name":"Proceedings of the 12th International Conference on Management of Digital EcoSystems","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134164110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Popular microblogging platforms (such as Twitter) offer a fertile ground for open communication among humans, however, they also attract many bots and automated accounts "disguised" as human users. Typically, such accounts favor malicious activities such as phishing, public opinion manipulation and hate speech spreading, to name a few. Although several AI driven bot detection methods have been implemented, the justification of bot classification and characterization remains quite opaque and AI decisions lack in ethical responsibility. Most of these approaches operate with AI black-boxed algorithms and their efficiency is often questionable. In this work we propose Bot-Detective, a web service that takes into account both the efficient detection of bot users and the interpretability of the results as well. Our main contributions are summarized as follows: i) we propose a novel explainable bot-detection approach, which, to the best of authors' knowledge, is the first one to offer interpretable, responsible, and AI driven bot identification in Twitter, ii) we deploy a publicly available bot detection Web service which integrates an explainable ML framework along with users feedback functionality under an effective crowdsourcing mechanism; iii) we build the proposed service under a newly created annotated dataset by exploiting Twitter's rules and existing tools. This dataset is publicly shared for further use. In situ experimentation has showcased that Bot-Detective produces comprehensive and accurate results, with a promising service take up at scale.
{"title":"Bot-Detective: An explainable Twitter bot detection service with crowdsourcing functionalities","authors":"Maria Kouvela, Ilias Dimitriadis, A. Vakali","doi":"10.1145/3415958.3433075","DOIUrl":"https://doi.org/10.1145/3415958.3433075","url":null,"abstract":"Popular microblogging platforms (such as Twitter) offer a fertile ground for open communication among humans, however, they also attract many bots and automated accounts \"disguised\" as human users. Typically, such accounts favor malicious activities such as phishing, public opinion manipulation and hate speech spreading, to name a few. Although several AI driven bot detection methods have been implemented, the justification of bot classification and characterization remains quite opaque and AI decisions lack in ethical responsibility. Most of these approaches operate with AI black-boxed algorithms and their efficiency is often questionable. In this work we propose Bot-Detective, a web service that takes into account both the efficient detection of bot users and the interpretability of the results as well. Our main contributions are summarized as follows: i) we propose a novel explainable bot-detection approach, which, to the best of authors' knowledge, is the first one to offer interpretable, responsible, and AI driven bot identification in Twitter, ii) we deploy a publicly available bot detection Web service which integrates an explainable ML framework along with users feedback functionality under an effective crowdsourcing mechanism; iii) we build the proposed service under a newly created annotated dataset by exploiting Twitter's rules and existing tools. This dataset is publicly shared for further use. In situ experimentation has showcased that Bot-Detective produces comprehensive and accurate results, with a promising service take up at scale.","PeriodicalId":198419,"journal":{"name":"Proceedings of the 12th International Conference on Management of Digital EcoSystems","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115640030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}