2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)
Multi-Level Execution Trace Based Lock Contention Analysis
Pub Date : 2020-10-01 DOI: 10.1109/ISSREW51248.2020.00068
Majid Rezazadeh, Naser Ezzati-Jivan, Evan Galea, M. Dagenais
Multi-threaded programming is nearly universal in modern computer systems. Thread-based programs usually use locks to coordinate access to shared resources. However, contention for locks can reduce parallel efficiency and degrade scalability. In this paper, we propose an execution-trace based method to analyze lock contention problems without requiring an application’s source code. Our methodology uses dynamic analysis through execution tracing, running at several levels of the system to collect detailed runtime data. We combine it with an extended critical path algorithm, which allows us to identify locking issues occurring in userspace. The result is a framework that is able to diagnose all contention issues with minimal impact on the system. We propose new views and structures to model and visualize the collected data, giving programmers powerful comprehension tools to address contention issues.
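To make the idea of measuring contention from a trace concrete, here is a minimal sketch, not the authors' framework. It assumes a hypothetical trace format in which every event is a (timestamp, thread_id, kind, lock_id) tuple and kind is "request", "acquired" or "released"; the function name lock_wait_times is likewise an illustration only. It sums, per lock, the time threads spent waiting between requesting and acquiring it, and does not implement the extended critical path algorithm mentioned in the abstract.

```python
from collections import defaultdict


def lock_wait_times(events):
    """Sum, per lock, the time threads spent between requesting and acquiring it.

    `events` must be sorted by timestamp. The event layout and the kind
    names are hypothetical, not the trace format used in the paper.
    """
    pending = {}                 # (thread_id, lock_id) -> request timestamp
    waited = defaultdict(float)  # lock_id -> accumulated wait time
    for ts, tid, kind, lock in events:
        if kind == "request":
            pending[(tid, lock)] = ts
        elif kind == "acquired":
            start = pending.pop((tid, lock), None)
            if start is not None:
                waited[lock] += ts - start
        # "released" events need no bookkeeping for this simple metric
    return dict(waited)


# Thread 1 takes lock A immediately; thread 2 waits from t=1.0 to t=4.0.
trace = [
    (0.0, 1, "request", "A"), (0.0, 1, "acquired", "A"),
    (1.0, 2, "request", "A"),
    (4.0, 1, "released", "A"),
    (4.0, 2, "acquired", "A"),
    (5.0, 2, "released", "A"),
]
print(lock_wait_times(trace))  # {'A': 3.0}
```

A lock with a large accumulated wait is a candidate contention point; deciding whether that wait actually delays the program is what a critical path analysis would add.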
Chapter 8: Software Aging Monitoring and Rejuvenation for the Assessment of High Availability Systems - Extended Abstract
Pub Date : 2020-10-01 DOI: 10.1109/ISSREW51248.2020.00095
Alberto Avritzer, Michael Grottke, D. Menasché
This extended abstract summarizes the background, goals, applicability domain, method, results, and lessons learned presented in the corresponding chapter of the Handbook of Software Aging and Rejuvenation.
Improving the Security of Microservice Systems by Detecting and Tolerating Intrusions
Pub Date : 2020-10-01 DOI: 10.1109/ISSREW51248.2020.00051
José Flora
Microservice architectures are growing rapidly in market size and adoption, including in business-critical systems. This is due to agility in development and deployment, further increased by containers and their characteristics. Ensuring security is still a major concern because of challenges such as resource separation and isolation, as improper access to one service might compromise complete systems. This doctoral work intends to advance the security of microservice systems through research and improvement of methodologies for detection, tolerance and mitigation of security intrusions, while overcoming challenges related to multi-tenancy, heterogeneity, and the dynamicity of systems and environments. Our preliminary research shows that host-based IDSes are applicable in container environments. This will be extended to dynamic scenarios, serving as a stepping stone to research intrusion tolerance techniques suited to these environments. These methodologies will be demonstrated in realistic microservice systems: complex, dynamic, scalable and elastic.
A Human Error Based Approach to Understanding Programmer-Induced Software Vulnerabilities
Pub Date : 2020-10-01 DOI: 10.1109/ISSREW51248.2020.00036
Vaibhav Anu, Kazi Zakia Sultana, B. Samanthula
Many security incidents can be traced back to software vulnerabilities, which can be described as security-related defects/bugs in the code that can potentially be exploited by attackers to perform unauthorized actions. An analysis of vulnerability data disseminated by organizations such as NIST’s National Vulnerability Database (NVD) and the SANS Institute shows that a majority of vulnerabilities can be traced back to a relatively small set of root causes, mostly related to mistakes repeated by programmers. That is, programmers exhibit a pattern of erroneous coding practices or behavior that leads to vulnerable code. Cognitive psychologists have long studied these erroneous behavior patterns and have termed them human cognition failures or, simply, human errors. The primary goal of this paper is to propose a classification of the human errors most frequently committed by programmers (committing a human error can lead to the injection of one or more security defects/bugs). Such a classification can be useful for software development organizations: they can train developers on these human errors so that developers avoid committing them, thereby reducing the chances of vulnerability injection in their code.
Towards the synthesis of context-aware choreographies
Pub Date : 2020-10-01 DOI: 10.1109/ISSREW51248.2020.00072
Gianluca Filippone, M. Autili, Massimo Tivoli
Modern technologies and emerging wireless communication solutions in the ICT world are empowering the spread of the most disparate ready-to-use software services, distributed over the globe, that can be easily accessed by an increasing number of connected devices. This state of affairs offers a dynamic and productive, yet distributed and complex, execution environment that encourages the development of systems based on the reuse of existing services through composition approaches, notably choreographies. However, automatic support is needed to realize the distributed coordination logic required to enforce correct choreography execution. Moreover, changing environmental conditions require choreographies capable of adapting their behavior to the execution context. This work presents our proposal for addressing the choreography realization problem by describing an automated process for the synthesis of choreography-based systems capable of adapting to environmental and context conditions.
Independent Verification and Validation for the Space Industry: Guide Evolution Experience
Pub Date : 2020-10-01 DOI: 10.1109/ISSREW51248.2020.00037
Nuno Silva, Xavier Ferreira, Jesper Troelsen, Tomasz Kacmajor
Independent Software Verification and Validation (ISVV) is a process targeted at safety-critical software systems. It aims to increase the quality of software products, thereby reducing risks and costs through the operational life of the software. Since 2008, the European Space Agency and its partners have been using the ESA ISVV Guide for the application of ISVV activities and methods. Over these years, the stakeholders have collected a set of lessons learned and experiences, and have identified a need to adapt the application of the guide to new environments and new technologies. For this purpose, and to harmonize the ISVV Guide into a formal ECSS handbook, an update and improvement of the ISVV Guide is currently ongoing. This work considers industry feedback and covers topics such as ISVV at system level, guidelines for agile and iterative projects, reuse, verification and validation of data, auto-generated code, and model-based techniques. This paper covers the ISVV Handbook improvement topics and the process being followed to collect and confirm proposed modifications.
Instrumenting Compiler Pipeline to Synthesise Traceable Runtime Memory Layouts in Mixed-critical Applications
Pub Date : 2020-10-01 DOI: 10.1109/ISSREW51248.2020.00040
N. Kajtazovic, Peter Hödl, Georg Macher
Ensuring traceability between software code and its runtime memory is a required design measure in a number of application fields to achieve functional safety targets. For mixed-critical systems, where code with different levels of criticality may coexist, this aspect is of particular importance. In the course of safety audits, for example, this information may serve to build evidence that safety-critical code/data is sufficiently isolated from non-critical parts. Unfortunately, providing this evidence for every byte in memory is not supported by modern compilers. In this paper, we introduce a method in which the compiler pipeline is instrumented to recover traceability links between the code and runtime memory. We qualify our proposal on a real-world industrial use case in which the C/C++ code is synthesised for ARM Cortex-M3 controllers. Our experimental results suggest that such accurate traceability support may serve as a solid basis when analysing memories for mixed-critical applications.
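As an illustration of the kind of audit check that such traceability data could feed, the sketch below flags memory regions of different criticality that overlap. It is not the paper's method: the (name, start, size, criticality) tuple layout, the criticality labels, the region names and the addresses are all hypothetical, and a real layout would come from the instrumented compiler pipeline rather than a hand-written list.

```python
def isolation_violations(regions):
    """Return pairs of overlapping regions that differ in criticality.

    `regions` is a list of (name, start, size, criticality) tuples.
    This layout is an assumption made for the example.
    """
    ordered = sorted(regions, key=lambda r: r[1])
    violations = []
    for i, (name_a, start_a, size_a, crit_a) in enumerate(ordered):
        end_a = start_a + size_a
        for name_b, start_b, _size_b, crit_b in ordered[i + 1:]:
            if start_b >= end_a:
                break  # sorted by start, so no later region can overlap
            if crit_a != crit_b:
                violations.append((name_a, name_b))
    return violations


# Hypothetical layout: log_buf starts inside the last 16 bytes of brake_ctrl.
layout = [
    ("brake_ctrl", 0x20000000, 0x100, "critical"),
    ("log_buf",    0x200000F0, 0x040, "non-critical"),
    ("ui_state",   0x20000200, 0x080, "non-critical"),
]
print(isolation_violations(layout))  # [('brake_ctrl', 'log_buf')]
```

An empty result only shows that no mixed-criticality regions overlap in the given list; it says nothing about bytes that are missing from the list, which is the gap the abstract attributes to current compilers.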
Generating test cases to evaluate and improve processes of safety-critical systems development
Pub Date : 2020-10-01 DOI: 10.1109/ISSREW51248.2020.00090
Lilian Barros, C. Hirata, Johnny Cardoso Marques, A. Ambrosio
DO-178C establishes considerations for developers, installers, and users when designing software for embedded equipment in the aviation sector. Organizations must define processes and verify that they help to demonstrate that the DO-178C objectives are satisfied. We propose a test case generation method for process evaluation and improvement. The proposed method adapts the CoFI (Conformance and Fault Injection) approach to generate test cases for processes. The test cases verify whether the deployed processes comply with their requirements. We applied the approach to a corrective action process for safety-critical software reviews. The results show that the method helps to elicit and analyze unexpected behaviors.
Detecting Struct Member-Related Memory Leaks Using Error Code Analysis in Linux Kernel
Pub Date : 2020-10-01 DOI: 10.1109/ISSREW51248.2020.00097
Keita Suzuki, Takafumi Kubota, K. Kono
Struct member-related memory leaks can become a serious problem, and the Linux kernel is no exception. According to our study of Linux kernel patches, 54.6% of all memory leak-related patches within the last two years were related to leaks of struct members. Such a leak occurs when a struct is freed before its dynamically allocated members are freed. Detecting these bugs in large-scale software requires reducing the analysis cost for scalability while effectively collecting the state of a struct and its members. In this paper, we present a simple static-analysis approach to detect struct member-related memory leaks in the Linux kernel. Our analysis first collects alloc/free information by conducting a path-insensitive analysis. To conduct inter-procedural analysis efficiently, we introduce error-code analysis, an optimization that passes the alloc/free information back by focusing on the return value of the callee and its use in the caller. When a struct free is detected, we scan the collected information for any member that remains unfreed and generate warnings for it. We evaluated our method by analyzing Linux kernel 5.3-rc4 and found two new bugs. Both bugs were reviewed and confirmed by Linux kernel developers.
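The final detection step described above (on a struct free, report members that are still allocated) can be sketched as a toy checker. This is a simplification under stated assumptions, not the authors' tool: it works on a hypothetical flat list of (op, target) events instead of kernel source, it writes members as "dev->buf", and it leaves out the inter-procedural error-code analysis entirely.

```python
def unfreed_members(events):
    """Warn about struct members still allocated when their struct is freed.

    `events` is an ordered list of (op, target) pairs, where `op` is
    "alloc" or "free" and `target` is a struct name ("dev") or a member
    written as "dev->buf". This event list is a hypothetical stand-in
    for the alloc/free information a real analysis would collect.
    """
    live = {}      # struct name -> set of members currently allocated
    warnings = []
    for op, target in events:
        struct, _, member = target.partition("->")
        if member:
            members = live.setdefault(struct, set())
            if op == "alloc":
                members.add(member)
            else:
                members.discard(member)
        elif op == "free":
            # The struct itself is freed: anything still live has leaked.
            for leaked in sorted(live.pop(struct, set())):
                warnings.append(f"{struct}->{leaked} not freed before {struct}")
    return warnings


events = [
    ("alloc", "dev"),
    ("alloc", "dev->buf"),
    ("alloc", "dev->name"),
    ("free", "dev->name"),
    ("free", "dev"),        # dev->buf is still allocated here
]
print(unfreed_members(events))  # ['dev->buf not freed before dev']
```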
ADF2T: an Active Disk Failure Forecasting and Tolerance Software
Pub Date : 2020-10-01 DOI: 10.1109/ISSREW51248.2020.00030
Hongzhang Yang, Yahui Yang, Zhengguang Chen, Zongzhao Li, Yaofeng Tu
The reliability of a distributed file system is inevitably affected by hard disk failures. This paper proposes ADF2T, active disk failure forecasting and tolerance software. Firstly, multiple SMART records within a time window are merged into one sample, and sliding the window creates tens of times as many positive samples. Secondly, features are selected by a two-stage sorting method, so that the most useful features are used in machine learning modeling and model training time is noticeably shortened. Thirdly, two-stage verification allows parameters to be adjusted promptly when a proactive reconstruction strategy proves unreasonable. In experiments on the ZTE and Backblaze data sets, the recall rates are 95.66% and 84.28% and the error rates are 0.23% and 2.45%, respectively. The work in this paper has been in commercial use for more than one year in a ZTE data center, where the reliability of the distributed file system software has improved significantly.
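The first step, merging the records in a time window into one sample and then sliding the window, might look like the sketch below. The abstract does not say how records are merged, so plain concatenation is an assumption here, as are the function name sliding_window_samples and the two-value records in the example.

```python
def sliding_window_samples(records, window):
    """Merge each run of `window` consecutive records into one flat sample.

    `records` is a list of equal-length feature lists, oldest first (for
    example, one list of SMART attribute values per day). Sliding the
    window by one record yields len(records) - window + 1 samples.
    Concatenation as the merge step is an assumption of this sketch.
    """
    samples = []
    for start in range(len(records) - window + 1):
        merged = []
        for record in records[start:start + window]:
            merged.extend(record)
        samples.append(merged)
    return samples


# Four daily records with two hypothetical attribute values each.
daily = [[5, 0], [5, 1], [6, 3], [9, 8]]
print(sliding_window_samples(daily, 3))
# [[5, 0, 5, 1, 6, 3], [5, 1, 6, 3, 9, 8]]
```

Applied to the records leading up to a disk failure, each window position gives one more positive sample, which is how sliding multiplies the otherwise scarce failure examples.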