Automated Fault-Tolerance Testing
Adithya Nagarajan, Ajay Vaddadi
2016 IEEE Ninth International Conference on Software Testing, Verification and Validation Workshops (ICSTW), 2016-04-11
DOI: 10.1109/ICSTW.2016.34
Citations: 4
Abstract
Software fault tolerance is the ability of software to continue normal operation despite system or hardware faults. Many companies are moving toward a microservices-based architecture, in which complex applications are built as a suite of small services that communicate over common protocols such as Hypertext Transfer Protocol (HTTP). While this architecture enables agility in software development and faster time to market, it makes assessing the fault tolerance and resiliency of the overall system a critical challenge. A failure in one dependent service can have an unexpected impact on the upstream services that call it, causing severe customer-facing issues. Such issues result from a lack of resiliency in the system's architecture. What is needed is an automated tool that can understand the service architecture and topology and inject faults to assess the fault tolerance and resiliency of the system. In this paper, we present Screwdriver, a new automated solution developed at Groupon to address this need.
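The abstract does not describe how Screwdriver works internally. As a rough illustration of the general idea only, the Python below is a minimal sketch, not Screwdriver's implementation or API. It models services in-process rather than over HTTP, walks a dependency topology, injects a failure into each downstream dependency one at a time, and records whether the entry-point service still responds. The service names (`checkout`, `pricing`, `tax`, `recommendations`, `user-profile`) and the `Topology` and `fault_tolerance_report` helpers are all hypothetical.

```python
# Hypothetical, in-process sketch of dependency fault injection.
# This is NOT Screwdriver's implementation; real tools inject faults at the network layer.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


class InjectedFault(Exception):
    """Raised by a simulated service when the harness injects a failure into it."""


@dataclass
class Service:
    name: str
    # The handler receives a `call` function it uses to reach dependencies by name.
    handler: Callable[[Callable[[str], str]], str]
    dependencies: List[str] = field(default_factory=list)


class Topology:
    def __init__(self, services: List[Service]) -> None:
        self.services: Dict[str, Service] = {s.name: s for s in services}
        self.faulty: Optional[str] = None  # name of the service currently failing

    def call(self, name: str) -> str:
        # Simulate a request to `name`; fail it if a fault is injected there.
        if name == self.faulty:
            raise InjectedFault(f"{name} is unavailable")
        return self.services[name].handler(self.call)

    def downstream_of(self, root: str) -> List[str]:
        # Depth-first walk of the dependency graph starting below `root`.
        seen: List[str] = []
        stack = list(self.services[root].dependencies)
        while stack:
            name = stack.pop()
            if name not in seen:
                seen.append(name)
                stack.extend(self.services[name].dependencies)
        return seen


def fault_tolerance_report(topology: Topology, entry: str) -> Dict[str, bool]:
    """Fail each dependency of `entry` in turn; True means `entry` still succeeded."""
    report: Dict[str, bool] = {}
    for dep in topology.downstream_of(entry):
        topology.faulty = dep
        try:
            topology.call(entry)
            report[dep] = True
        except InjectedFault:
            report[dep] = False
        finally:
            topology.faulty = None  # always clear the fault before the next run
    return report


def checkout(call: Callable[[str], str]) -> str:
    price = call("pricing")  # no fallback: a pricing failure propagates upstream
    try:
        recs = call("recommendations")
    except InjectedFault:
        recs = "no-recommendations"  # graceful degradation
    return f"{price}|{recs}"


topology = Topology([
    Service("checkout", checkout, ["pricing", "recommendations"]),
    Service("pricing", lambda call: "price:" + call("tax"), ["tax"]),
    Service("tax", lambda call: "8%"),
    Service("recommendations", lambda call: "recs:" + call("user-profile"), ["user-profile"]),
    Service("user-profile", lambda call: "alice"),
])

print(fault_tolerance_report(topology, "checkout"))
# {'recommendations': True, 'user-profile': True, 'pricing': False, 'tax': False}
```

In this toy topology, failures in `recommendations` or `user-profile` are absorbed by the fallback in `checkout`, while failures in `pricing` or `tax` propagate to the entry point. That is the kind of hidden resiliency gap the abstract describes. A production tool would more likely inject HTTP errors or latency between real services than raise exceptions in-process.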