Iure Fé, Tuan Anh Nguyen, Mario Di Mauro, Fabio Postiglione, Alex Ramos, André Soares, Eunmi Choi, Dugki Min, Jae Woo Lee, Francisco Airton Silva
{"title":"Energy-aware dynamic response and efficient consolidation strategies for disaster survivability of cloud microservices architecture","authors":"Iure Fé, Tuan Anh Nguyen, Mario Di Mauro, Fabio Postiglione, Alex Ramos, André Soares, Eunmi Choi, Dugki Min, Jae Woo Lee, Francisco Airton Silva","doi":"10.1007/s00607-024-01305-x","DOIUrl":null,"url":null,"abstract":"<p>Computer system resilience refers to the ability of a computer system to continue functioning even in the face of unexpected events or disruptions. These disruptions can be caused by a variety of factors, such as hardware failures, software glitches, cyber attacks, or even natural disasters. Modern computational environments need applications that can recover quickly from major disruptions while also being environmentally sustainable. Balancing system resilience with energy efficiency is challenging, as efforts to improve one can harm the other. This paper presents a method to enhance disaster survivability in microservice architectures, particularly those using Kubernetes in cloud-based environments, focusing on optimizing electrical energy use. Aiming to save energy, our work adopt the consolidation strategy that means grouping multiple microservices on a single host. Our aproach uses a widely adopted analytical model, the Generalized Stochastic Petri Net (GSPN). GSPN are a powerful modeling technique that is widely used in various fields, including engineering, computer science, and operations research. One of the primary advantages of GSPN is its ability to model complex systems with a high degree of accuracy. Additionally, GSPN allows for the modeling of both logical and stochastic behavior, making it ideal for systems that involve a combination of both. Our GSPN models compute a number of metrics such as: recovery time, system availability, reliability, Mean Time to Failure, and the configuration of cloud-based microservices. We compared our approach against others focusing on survivability or efficiency. Our approach aligns with Recovery Time Objectives during sudden disasters and offers the fastest recovery, requiring 9% less warning time to fully recover in cases of disaster with alert when compared to strategies with similar electrical consumption. It also saves about 27% energy compared to low consolidation strategies and 5% against high consolidation under static conditions.</p>","PeriodicalId":10718,"journal":{"name":"Computing","volume":"9 1","pages":""},"PeriodicalIF":3.3000,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s00607-024-01305-x","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
Computer system resilience refers to the ability of a computer system to continue functioning even in the face of unexpected events or disruptions. These disruptions can be caused by a variety of factors, such as hardware failures, software glitches, cyber attacks, or even natural disasters. Modern computational environments need applications that can recover quickly from major disruptions while also being environmentally sustainable. Balancing system resilience with energy efficiency is challenging, as efforts to improve one can harm the other. This paper presents a method to enhance disaster survivability in microservice architectures, particularly those using Kubernetes in cloud-based environments, focusing on optimizing electrical energy use. Aiming to save energy, our work adopt the consolidation strategy that means grouping multiple microservices on a single host. Our aproach uses a widely adopted analytical model, the Generalized Stochastic Petri Net (GSPN). GSPN are a powerful modeling technique that is widely used in various fields, including engineering, computer science, and operations research. One of the primary advantages of GSPN is its ability to model complex systems with a high degree of accuracy. Additionally, GSPN allows for the modeling of both logical and stochastic behavior, making it ideal for systems that involve a combination of both. Our GSPN models compute a number of metrics such as: recovery time, system availability, reliability, Mean Time to Failure, and the configuration of cloud-based microservices. We compared our approach against others focusing on survivability or efficiency. Our approach aligns with Recovery Time Objectives during sudden disasters and offers the fastest recovery, requiring 9% less warning time to fully recover in cases of disaster with alert when compared to strategies with similar electrical consumption. It also saves about 27% energy compared to low consolidation strategies and 5% against high consolidation under static conditions.
期刊介绍:
Computing publishes original papers, short communications and surveys on all fields of computing. The contributions should be written in English and may be of theoretical or applied nature, the essential criteria are computational relevance and systematic foundation of results.