QUDOS: quorum-based cloud-edge distributed DNNs for security enhanced industry 4.0
Kevin Wallis, C. Reich, B. Varghese, C. Schindelhauer
Proceedings of the 14th IEEE/ACM International Conference on Utility and Cloud Computing, 2021. DOI: 10.1145/3468737.3494094
Distributed machine learning algorithms that employ Deep Neural Networks (DNNs) are widely used in Industry 4.0 applications, such as smart manufacturing. The layers of a DNN can be mapped onto different nodes located in the cloud, at the edge, and on the shop floor to preserve privacy. The quality of the data that is fed into and processed through the DNN is of utmost importance for critical tasks, such as inspection and quality control. Distributed Data Validation Networks (DDVNs) are used to validate data quality; however, they are prone to single points of failure when an attack occurs. This paper proposes QUDOS, a quorum-based approach that enhances the security of a distributed DNN supported by DDVNs. The approach allows individual nodes corrupted by an attack to be detected or excluded when the DNN produces an output. Metrics such as the corruption factor and the success probability of an attack are used to evaluate the security of DNNs. A simulation study demonstrates that as long as the number of corrupted nodes stays below the decision-making threshold of a quorum, QUDOS always prevents attacks. Furthermore, the study shows that increasing the size of the quorum improves security more than increasing the number of layers. One merit of QUDOS is that it enhances the security of DNNs without requiring any modification to the underlying algorithm, and it can therefore be applied to other classes of problems.
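To make the quorum mechanism concrete, below is a minimal Python sketch of strict-majority voting over replicated node outputs, together with the success probability of an attack under an independence assumption. The function names (`quorum_decide`, `attack_success_probability`) and the independent per-node corruption model are illustrative assumptions, not the paper's actual QUDOS protocol or attack model.

```python
from collections import Counter
from math import comb

def quorum_decide(outputs):
    """Accept a result only if a strict majority of the quorum's node
    outputs agree; otherwise flag a possible attack. A minimal
    majority-quorum sketch, not the exact QUDOS protocol."""
    value, count = Counter(outputs).most_common(1)[0]
    if count > len(outputs) // 2:  # strict majority reached
        return value
    raise RuntimeError("no majority in quorum: result rejected")

def attack_success_probability(quorum_size, p_corrupt):
    """P[corrupted nodes form a strict majority of the quorum], assuming
    each node is corrupted independently with probability p_corrupt (an
    illustrative independence assumption, not the paper's exact model)."""
    threshold = quorum_size // 2 + 1  # smallest corrupted count that wins the vote
    return sum(
        comb(quorum_size, k) * p_corrupt**k * (1 - p_corrupt)**(quorum_size - k)
        for k in range(threshold, quorum_size + 1)
    )
```

Under this toy model, growing the quorum from 3 to 5 nodes at a per-node corruption probability of 0.1 drops the attack success probability from roughly 2.8% to below 0.9%, consistent with the paper's observation that enlarging the quorum is an effective security lever.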
{"title":"QUDOS: quorum-based cloud-edge distributed DNNs for security enhanced industry 4.0","authors":"Kevin Wallis, C. Reich, B. Varghese, C. Schindelhauer","doi":"10.1145/3468737.3494094","DOIUrl":"https://doi.org/10.1145/3468737.3494094","url":null,"abstract":"Distributed machine learning algorithms that employ Deep Neural Networks (DNNs) are widely used in Industry 4.0 applications, such as smart manufacturing. The layers of a DNN can be mapped onto different nodes located in the cloud, edge and shop floor for preserving privacy. The quality of the data that is fed into and processed through the DNN is of utmost importance for critical tasks, such as inspection and quality control. Distributed Data Validation Networks (DDVNs) are used to validate the quality of the data. However, they are prone to single points of failure when an attack occurs. This paper proposes QUDOS, an approach that enhances the security of a distributed DNN that is supported by DDVNs using quorums. The proposed approach allows individual nodes that are corrupted due to an attack to be detected or excluded when the DNN produces an output. Metrics such as corruption factor and success probability of an attack are considered for evaluating the security aspects of DNNs. A simulation study demonstrates that if the number of corrupted nodes is less than a given threshold for decision-making in a quorum, the QUDOS approach always prevents attacks. Furthermore, the study shows that increasing the size of the quorum has a better impact on security than increasing the number of layers. One merit of QUDOS is that it enhances the security of DNNs without requiring any modifications to the algorithm and can therefore be applied to other classes of problems.","PeriodicalId":254382,"journal":{"name":"Proceedings of the 14th IEEE/ACM International Conference on Utility and Cloud Computing","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129911938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Predictive auto-scaling with OpenStack Monasca
Giacomo Lanciano, Filippo Galli, T. Cucinotta, D. Bacciu, A. Passarella
Proceedings of the 14th IEEE/ACM International Conference on Utility and Cloud Computing, 2021. DOI: 10.1145/3468737.3494104
Cloud auto-scaling mechanisms are typically based on reactive automation rules that scale a cluster whenever some metric, e.g., the average CPU usage among instances, exceeds a predefined threshold. Tuning these rules becomes particularly cumbersome when scaling up a cluster involves non-negligible times to bootstrap new instances, as frequently happens in production cloud services. To deal with this problem, we propose an architecture for auto-scaling cloud services based on the state into which the system is expected to evolve in the near future. Our approach leverages time-series forecasting techniques, such as those based on machine learning and artificial neural networks, to predict the future dynamics of key metrics, e.g., resource consumption, and applies a threshold-based scaling policy to the predicted values. The result is a predictive automation policy that is able, for instance, to automatically anticipate peaks in the load of a cloud application and trigger appropriate scaling actions ahead of time to accommodate the expected increase in traffic. We prototyped our approach as an open-source OpenStack component that relies on, and extends, the monitoring capabilities offered by Monasca, adding predictive metrics that can be leveraged by orchestration components such as Heat or Senlin. We show experimental results using a recurrent neural network and a multi-layer perceptron as predictors, compared against a simple linear regression and a traditional non-predictive auto-scaling policy. The proposed framework also allows the prediction policy to be easily customized as needed.
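The following sketch illustrates the core idea of applying a threshold policy to a predicted, rather than observed, metric. It uses a plain linear trend as the predictor (the paper's simplest baseline; its main predictors are an RNN and an MLP), and the thresholds, horizon, and function names are illustrative choices, not values from the paper or Monasca's API.

```python
import numpy as np

def forecast_next(values, horizon):
    """Fit a linear trend to the recent metric history and extrapolate
    'horizon' steps ahead (a stand-in for the paper's RNN/MLP predictors)."""
    t = np.arange(len(values))
    slope, intercept = np.polyfit(t, values, 1)
    return slope * (len(values) - 1 + horizon) + intercept

def scaling_decision(cpu_history, scale_out_at=0.8, scale_in_at=0.3, horizon=5):
    """Threshold policy applied to the *predicted* metric, so scale-out is
    triggered before the load peak materializes. Thresholds and horizon
    are illustrative values, not taken from the paper."""
    predicted = forecast_next(cpu_history, horizon)
    if predicted > scale_out_at:
        return "scale_out"
    if predicted < scale_in_at:
        return "scale_in"
    return "hold"
```

Feeding a window of recent average-CPU samples to `scaling_decision` triggers a scale-out as soon as the extrapolated trend crosses the threshold, rather than waiting for the observed value to do so, which compensates for slow instance bootstrap times.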
{"title":"Predictive auto-scaling with OpenStack Monasca","authors":"Giacomo Lanciano, Filippo Galli, T. Cucinotta, D. Bacciu, A. Passarella","doi":"10.1145/3468737.3494104","DOIUrl":"https://doi.org/10.1145/3468737.3494104","url":null,"abstract":"Cloud auto-scaling mechanisms are typically based on reactive automation rules that scale a cluster whenever some metric, e.g., the average CPU usage among instances, exceeds a predefined threshold. Tuning these rules becomes particularly cumbersome when scaling-up a cluster involves non-negligible times to bootstrap new instances, as it happens frequently in production cloud services. To deal with this problem, we propose an architecture for auto-scaling cloud services based on the status in which the system is expected to evolve in the near future. Our approach leverages on time-series forecasting techniques, like those based on machine learning and artificial neural networks, to predict the future dynamics of key metrics, e.g., resource consumption metrics, and apply a threshold-based scaling policy on them. The result is a predictive automation policy that is able, for instance, to automatically anticipate peaks in the load of a cloud application and trigger ahead of time appropriate scaling actions to accommodate the expected increase in traffic. We prototyped our approach as an open-source OpenStack component, which relies on, and extends, the monitoring capabilities offered by Monasca, resulting in the addition of predictive metrics that can be leveraged by orchestration components like Heat or Senlin. We show experimental results using a recurrent neural network and a multi-layer perceptron as predictor, which are compared with a simple linear regression and a traditional non-predictive auto-scaling policy. However, the proposed framework allows for the easy customization of the prediction policy as needed.","PeriodicalId":254382,"journal":{"name":"Proceedings of the 14th IEEE/ACM International Conference on Utility and Cloud Computing","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121280213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

System-aware dynamic partitioning for batch and streaming workloads
Zoltán Zvara, Péter G. N. Szabó, Balázs Barnabás Lóránt, András A. Benczúr
Proceedings of the 14th IEEE/ACM International Conference on Utility and Cloud Computing, 2021. DOI: 10.1145/3468737.3494087
When processing data streams with highly skewed and nonstationary key distributions, we often observe overloaded partitions when hash partitioning fails to balance the data correctly. To avoid slow tasks that delay the completion of a whole stage of computation, it is necessary to apply adaptive, on-the-fly partitioning that continuously recomputes an optimal partitioner from the observed key distribution. While such solutions exist for batch processing of static data sets and for stateless stream processing, the task is difficult for long-running stateful streaming jobs whose key distribution changes over time: careful checkpointing and operator state migration are necessary to change the partitioning while the operation is running. Our key result is a lightweight, on-the-fly Dynamic Repartitioning (DR) module for distributed data processing systems (DDPS), including Apache Spark and Flink, which improves performance with negligible overhead. DR can adaptively repartition data during execution using our Key Isolator Partitioner (KIP). In our experiments with real workloads and power-law key distributions, we reach speedups of 1.5-6x for a variety of Spark and Flink jobs.
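As an illustration of what a "key isolator" style partitioner might look like, the sketch below gives each of the heaviest observed keys its own partition and hash-partitions the long tail over the remaining partitions. The class name and interface are hypothetical simplifications; the paper's KIP is recomputed on the fly from the observed key distribution and coordinated with operator state migration.

```python
import hashlib
from collections import Counter

class KeyIsolatorPartitioner:
    """Heavy keys get dedicated partitions; the tail is hash-partitioned.
    A static, simplified sketch of a KIP-style partitioner (hypothetical
    interface; the real KIP is continuously recomputed at runtime)."""

    def __init__(self, key_counts: Counter, num_partitions: int, num_isolated: int):
        assert num_isolated < num_partitions, "need partitions left for the tail"
        heavy = [k for k, _ in key_counts.most_common(num_isolated)]
        self.isolated = {k: i for i, k in enumerate(heavy)}  # heavy key -> own partition
        self.tail_offset = len(self.isolated)
        self.tail_partitions = num_partitions - len(self.isolated)

    def partition(self, key) -> int:
        if key in self.isolated:  # heavy hitter: dedicated partition
            return self.isolated[key]
        # stable hash so the same tail key always maps to the same partition
        digest = hashlib.md5(str(key).encode()).digest()
        return self.tail_offset + int.from_bytes(digest[:4], "big") % self.tail_partitions
```

For example, with num_partitions=8 and num_isolated=2, the two hottest keys occupy partitions 0 and 1 on their own, while all other keys are spread over partitions 2-7, which caps the load any single skewed key can place on one task.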
{"title":"System-aware dynamic partitioning for batch and streaming workloads","authors":"Zoltan Zvara, Péter G. N. Szabó, Bal'azs Barnab'as L'or'ant, Andr'as A. Bencz'ur","doi":"10.1145/3468737.3494087","DOIUrl":"https://doi.org/10.1145/3468737.3494087","url":null,"abstract":"When processing data streams with highly skewed and nonstationary key distributions, we often observe overloaded partitions when the hash partitioning fails to balance data correctly. To avoid slow tasks that delay the completion of the whole stage of computation, it is necessary to apply adaptive, on-the-fly partitioning that continuously recomputes an optimal partitioner, given the observed key distribution. While such solutions exist for batch processing of static data sets and stateless stream processing, the task is difficult for long-running stateful streaming jobs where key distribution changes over time. Careful checkpointing and operator state migration is necessary to change the partitioning while the operation is running. Our key result is a lightweight on-the-fly Dynamic Repartitioning (DR) module for distributed data processing systems (DDPS), including Apache Spark and Flink, which improves the performance with negligible overhead. DR can adaptively repartition data during execution using our Key Isolator Partitioner (KIP). In our experiments with real workloads and power-law distributions, we reach a speedup of 1.5-6 for a variety of Spark and Flink jobs.","PeriodicalId":254382,"journal":{"name":"Proceedings of the 14th IEEE/ACM International Conference on Utility and Cloud Computing","volume":"372 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115987024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}