Pub Date : 2022-10-01 DOI: 10.1109/ISSREW55968.2022.00063
A. Fantechi, G. Gori, Marco Papini
The overall reliability of complex cyber-physical systems depends on both hardware reliability and software reliability. The former can often be increased through fault-tolerant mechanisms and architectures, while the latter can benefit from a suitable rejuvenation policy. These characteristics call for flexible runtime safety checks of system executions that go beyond conventional runtime monitoring of pre-programmed safety conditions, also in order to minimize maintenance costs. Defining a satisfactory monitoring model for complex systems is still a challenge. In this paper, we investigate the application of a novel approach, named Reliability Based Monitoring (RBM), that allows flexible runtime monitoring of software reliability in complex systems. The approach periodically applies a hierarchical reliability model to runtime diagnostics data, which makes it possible to dynamically plan rejuvenation activities that prevent software failures.
Title : Software rejuvenation and runtime reliability monitoring. Venue : 2022 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)
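As an illustration of the Reliability Based Monitoring idea in the abstract above, the following sketch evaluates a two-level reliability model on runtime diagnostics and plans which components to rejuvenate. The exponential failure law, the series composition, the 0.9 threshold, and all names (`component_reliability`, `system_reliability`, `plan_rejuvenation`) are assumptions made for illustration; the abstract does not specify the model used in the paper.

```python
import math


def component_reliability(failure_rate: float, uptime_hours: float) -> float:
    """Assumed exponential law: R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate * uptime_hours)


def system_reliability(reliabilities) -> float:
    """Assumed series composition: the system works only if every component works."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result


def plan_rejuvenation(diagnostics, threshold=0.9):
    """Return the components to rejuvenate, weakest first.

    diagnostics maps a component name to (failure_rate_per_hour, uptime_hours),
    standing in for the runtime diagnostics data mentioned in the abstract.
    """
    per_component = {
        name: component_reliability(rate, uptime)
        for name, (rate, uptime) in diagnostics.items()
    }
    plan = []
    # Rejuvenate the weakest components until the system estimate recovers.
    for name in sorted(per_component, key=per_component.get):
        if system_reliability(per_component.values()) >= threshold:
            break
        per_component[name] = 1.0  # a restarted component has zero uptime
        plan.append(name)
    return plan
```

Calling `plan_rejuvenation` periodically on fresh diagnostics would mirror the "periodically applied" aspect of the approach; the greedy weakest-first choice is only one possible policy.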
Pub Date : 2022-10-01 DOI: 10.1109/ISSREW55968.2022.00086
Pei Yang, Jing Wang, Huandong Wang
Deep neural networks (DNNs) are widely applied in autonomous intelligent systems (AIS). However, DNNs are vulnerable to adversarial attacks from specially crafted input images, which lead to performance degradation such as wrong classifications. A wrong classification made by an AIS could result in severe and possibly lethal consequences. While several existing works proposed applying classic computer vision techniques to adversarial defence, these methods generally degrade the input information to a considerable extent. To restore model performance while minimising such degradation, we propose a novel method for adversarial defence named Colour Space Defence. We first demonstrate the weak transferability of adversarial information across different colour spaces. We then propose to defend against adversarial examples by ensembling models trained in multiple colour spaces. Experiments have verified that Colour Space Defence maintains performance on clean images. In most defence scenarios, this method outperformed several of its comparators.
Title : Colour Space Defence: Simple, Intuitive, but Effective
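The ensembling step described in the Colour Space Defence abstract can be sketched as below. The choice of colour spaces (RGB, HSV, YIQ, taken from Python's standard `colorsys` module), the majority-vote rule, and the names `convert_image` and `ensemble_predict` are assumptions; the abstract does not say which spaces or which combination rule the paper uses, and the classifiers are stand-in callables.

```python
import colorsys
from collections import Counter

# Assumed colour spaces; the abstract does not list the ones used in the paper.
CONVERTERS = {
    "rgb": lambda r, g, b: (r, g, b),
    "hsv": colorsys.rgb_to_hsv,
    "yiq": colorsys.rgb_to_yiq,
}


def convert_image(pixels, space):
    """Convert a list of (r, g, b) tuples, channels in [0, 1], to another space."""
    convert = CONVERTERS[space]
    return [convert(r, g, b) for r, g, b in pixels]


def ensemble_predict(pixels, models):
    """Combine per-colour-space classifiers by majority vote (assumed rule).

    models maps a colour-space name to a callable trained on that space.
    """
    votes = [model(convert_image(pixels, space)) for space, model in models.items()]
    return Counter(votes).most_common(1)[0][0]
```

The intuition from the abstract is that a perturbation crafted against one colour space transfers weakly to the others, so the remaining voters can outvote a fooled model.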
Pub Date : 2022-10-01 DOI: 10.1109/ISSREW55968.2022.00033
A. Ketterer, Asha Shekar, E. Yi, S. Bagchi, Abraham A. Clements
Firmware emulation is useful for finding vulnerabilities, debugging, and testing functionality. However, the process of enabling firmware to execute in an emulator (i.e., re-hosting) is difficult. Each piece of the firmware may depend on hardware peripherals outside the microcontroller that are inaccessible during emulation. Current practices involve painstakingly disentangling these dependencies or replacing them with hand-developed models that emulate the functions interacting with hardware. Unfortunately, both are highly manual and error-prone. In this paper, we introduce a systematic graph-based approach to analyze firmware binaries and determine which functions need to be replaced. Our approach can be customized to balance the fidelity of the emulation against the effort required to model functions. We run our algorithm across a number of firmware binaries and show its ability to capture and remove a large majority of hardware dependencies.
Title : An Automated Approach to Re-Hosting Embedded Firmware by Removing Hardware Dependencies
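A graph-based selection of functions to replace, as outlined in the re-hosting abstract above, might look like the following sketch. It assumes a call graph given as a dictionary and a known set of functions that access peripherals directly; the `lift` parameter is a hypothetical knob standing in for the fidelity/effort trade-off, not the paper's actual algorithm.

```python
def reverse_graph(call_graph):
    """Map each callee to the set of functions that call it."""
    callers = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    return callers


def hardware_dependent(call_graph, mmio_functions):
    """All functions that directly or transitively reach a peripheral access."""
    callers = reverse_graph(call_graph)
    seen = set(mmio_functions)
    stack = list(mmio_functions)
    while stack:
        fn = stack.pop()
        for caller in callers.get(fn, ()):
            if caller not in seen:
                seen.add(caller)
                stack.append(caller)
    return seen


def replacement_candidates(call_graph, mmio_functions, lift=0):
    """Pick the functions to model, `lift` levels above the direct accessors.

    lift=0 models only the functions that touch hardware (assumed to give the
    highest fidelity); larger values replace their callers instead.
    """
    callers = reverse_graph(call_graph)
    frontier = set(mmio_functions)
    for _ in range(lift):
        # Functions without callers (entry points) stay where they are.
        frontier = {c for fn in frontier for c in callers.get(fn, {fn})}
    return frontier
```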
Pub Date : 2022-10-01 DOI: 10.1109/ISSREW55968.2022.00062
Takeru Wada, Hiroshi Yamada
Software rejuvenation is a simple but powerful method for improving the availability of computer systems. Applying it to a new type of application, the unikernel, is challenging: a unikernel is a library OS whose OS functions are linked to the target application like libraries. Since the unikernel layer is tightly coupled to the application, rebooting the unikernel layer also reboots the application, discarding and reconstructing memory contents unrelated to the unikernel. This paper presents VampOS, which allows us to rejuvenate only the unikernel layer. VampOS performs component-level rejuvenation of the unikernel by logging interactions between the components and replaying them to restarted components while keeping the linked applications running. We implemented a prototype of VampOS, not yet well optimized, on Unikraft 0.8.0. The experimental results show that its runtime overhead is up to 13.6x and that VampOS-linked SQLite mitigates the effects of intentionally injected memory leak bugs without any downtime. This paper also describes next directions for efficient rejuvenation of unikernel-linked applications.
Title : Towards Making Unikernels Rejuvenatable
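The log-and-replay mechanism described in the VampOS abstract above can be illustrated with a toy model. The `Component` class, its `set`/`delete` operations, and the `RejuvenatingProxy` wrapper are hypothetical stand-ins; they are not Unikraft or VampOS interfaces, and a real system would also need to bound or compact the log.

```python
class Component:
    """Stand-in for a unikernel component whose state is built up by calls."""

    def __init__(self):
        self.state = {}

    def handle(self, op, key, value=None):
        if op == "set":
            self.state[key] = value
        elif op == "delete":
            self.state.pop(key, None)
        return self.state.get(key)


class RejuvenatingProxy:
    """Logs every interaction so a restarted component can be rebuilt by replay."""

    def __init__(self, factory):
        self.factory = factory
        self.component = factory()
        self.log = []

    def call(self, op, key, value=None):
        self.log.append((op, key, value))
        return self.component.handle(op, key, value)

    def rejuvenate(self):
        # Start a fresh instance and replay the logged interactions; callers
        # keep using the proxy and never observe the restart.
        fresh = self.factory()
        for op, key, value in self.log:
            fresh.handle(op, key, value)
        self.component = fresh
```

The caller's view of the state is unchanged after `rejuvenate()`, while anything the old instance had leaked is dropped along with it.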
Pub Date : 2022-10-01 DOI: 10.1109/ISSREW55968.2022.00049
Tiancheng Li, Xiaohui Wan, Muhammed Murat Özbek
In recent years, deep reinforcement learning (DRL) technology has developed rapidly, and its application has been extended to many fields such as gaming, autonomous driving, financial transactions, and robot control. As DRL applications expand and diversify, quality assurance of DRL software is increasingly important, especially in safety-critical areas. It is therefore necessary and urgent to adequately test DRL models to ensure the reliability and security of DRL systems. However, due to fundamental differences, traditional software testing methods cannot be directly applied to DRL systems. To bridge this gap, we introduce a new DRL system testing framework in this proposal, which aims to generate various test cases that can cause DRL systems to fail. The proposed framework, which we call AgentFuzz, is the first fuzzing framework for systematically testing DRL systems.
Title : AgentFuzz: Fuzzing for Deep Reinforcement Learning Systems
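To make the idea of fuzzing a DRL system concrete, the sketch below mutates initial states and records those that make an agent fail. Everything here is an assumption for illustration: the toy 1-D task, the Gaussian mutation, the rule of keeping every mutant as a future seed, and the names `run_episode` and `fuzz`. The abstract does not describe AgentFuzz's actual mutation or seed-selection strategy.

```python
import random


def run_episode(start, policy, max_steps=20):
    """Toy 1-D task (hypothetical): the agent must reach position 0.

    Returns True if the episode FAILS, i.e. the goal is not reached in time.
    """
    position = start
    for _ in range(max_steps):
        if abs(position) < 0.5:
            return False
        position += policy(position)
    return abs(position) >= 0.5


def fuzz(policy, seeds, iterations=200, rng=None):
    """Mutate initial states and collect the ones on which the policy fails."""
    rng = rng or random.Random(0)
    corpus = list(seeds)
    failures = []
    for _ in range(iterations):
        parent = rng.choice(corpus)
        child = parent + rng.gauss(0.0, 5.0)  # assumed mutation operator
        if run_episode(child, policy):
            failures.append(child)
        # Keep every mutant as a future seed; a real fuzzer would use a
        # coverage or novelty signal to decide what to keep.
        corpus.append(child)
    return failures
```

For example, a faulty policy that always steps left (`lambda p: -1.0`) passes from the seed state `1.0`, and the fuzzer then has to discover the starting positions from which it never reaches the goal.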
Pub Date : 2022-10-01 DOI: 10.1109/ISSREW55968.2022.00075
Alastair Nottingham, Molly Buchanan, Mark Gardner, Jason Hiser, J. Davidson
Current cybersecurity research is constrained by the general scarcity of large, realistic, labeled network traffic datasets. To address said scarcity, this paper introduces Sentinel: a multi-enterprise scientific instrument developed to support data-driven cybersecurity research. Sentinel provides researchers access to virtual computing infrastructure and petabytes of data collected over several years from network sensors at two large, disjoint educational institutions - the University of Virginia and Virginia Tech. The network dataset is supplemented by multi-modal malware activity logs generated by attack recreation exercises which realistically integrate ground truth into collected edge sensor data. To mitigate risks associated with providing access to enterprise network sensor logs, Sentinel uses a combination of a code-to-data policy, data usage agreements, and pattern-preserving anonymization. Sentinel has been used as part of a government-funded effort to investigate new machine learning algorithms, cybersecurity forensics, and data retention techniques.
Title : Sentinel: A Multi-institution Enterprise Scale Platform for Data-driven Cybersecurity Research
Pub Date : 2022-10-01 DOI: 10.1109/ISSREW55968.2022.00031
Kangjin Wang, Chuanjia Hou, Ying Li, Yaoyong Dou, Cheng Wang, Yang Wen, Jie Yao, Liping Zhang
Colocating latency-critical (LC) jobs and best-effort (BE) jobs on a host effectively improves resource efficiency in modern datacenters, but it increases resource contention between jobs, which seriously affects job performance. In Alibaba's real-world LC-BE colocation datacenters, we observed that cache is one of the most contended resources in the CPU. When cache contention occurs, the first step in mitigating it is to identify the antagonists that caused it, a task called cache antagonists identification (CAI). However, it is challenging to identify cache antagonists because cache contention is difficult to observe and quantify. In this paper, we first propose the cache usage graph (CUG) to finely characterize the cache usage of jobs across the CPU's microarchitectural hierarchies and locations, and we provide a monitoring tool that collects “per-container-per-logic CPU” L1/2/3 cache misses and builds the CUG. Then we propose a CUG-based CAI approach, $\mu$Tactic. $\mu$Tactic leverages machine learning models to quantify the cache contention on every cache hierarchy, then reasons out the cache antagonists with the CUG. Experiments in production datacenters show that $\mu$Tactic has high precision (85+%) and low cost (32 ms), both better than state-of-the-art approaches.
Title : Cache Antagonists Identification: A Practice from Alibaba Colocation Datacenter
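One way to picture a cache usage graph built from per-container, per-logical-CPU miss counts is sketched below. The data layout, the `cpu_to_domain` topology map, and the miss-count ranking in `rank_antagonists` are all assumptions; in particular, the ranking is a simple heuristic standing in for the machine learning models the abstract mentions.

```python
from collections import defaultdict


def build_cug(samples, cpu_to_domain):
    """Aggregate misses per cache instance.

    samples: iterable of (container, logical_cpu, level, misses).
    cpu_to_domain: maps (logical_cpu, level) to the id of the cache instance
    that CPU uses at that level (hypothetical topology description).
    Returns {(level, domain): {container: total_misses}}.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for container, cpu, level, misses in samples:
        graph[(level, cpu_to_domain[(cpu, level)])][container] += misses
    return graph


def rank_antagonists(graph, victim):
    """Rank containers sharing a cache instance with the victim by misses there."""
    scores = defaultdict(int)
    for users in graph.values():
        if victim in users:
            for container, misses in users.items():
                if container != victim:
                    scores[container] += misses
    # Highest miss count first; ties broken by name for a stable order.
    return sorted(scores, key=lambda c: (-scores[c], c))
```

A container on another socket never appears in the ranking, because it shares no cache instance with the victim in this model.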
With the introduction of integrated modular avionics (IMA), the recent trend in avionics is to integrate different software applications on the same hardware platform. In this context, the underlying platform embodied by a real-time operating system (RTOS) must be designed in compliance with the ARINC 653 specification. ARINC 653 defines an application executive (APEX) interface between the RTOS and avionics applications within the IMA architecture. It specifies the requirements of an environment that provides partitioning, i.e., separation of applications to ensure fault containment and ease of verification. Designing an RTOS that complies with ARINC 653 is costly and requires significant effort. In this paper, we introduce a domain-specific language (DSL) that supports the specification of an ARINC 653-compliant RTOS. In particular, we consider ARINC 653 as a set of generic, high-level requirements, and we use model-driven technologies to specify these requirements in the form of a metamodel. The ARINC metamodel aims to support certification and reduce its cost by reusing the metamodel across multiple RTOS development projects. Other benefits of the ARINC metamodel include generating data required for certification, such as ARINC configuration tables and test data.
Title : A Domain Specific Language for the ARINC 653 Specification
Pub Date : 2022-10-01 DOI: 10.1109/ISSREW55968.2022.00073
Ikram Darif, Cristiano Politowski, Ghizlane El-Boussaidi, Sègla Kpodjedo
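To give a feel for what specifying partitioning requirements as a model can mean, here is a minimal sketch of a partition schedule with a validation step and a generated configuration table. The classes `Window` and `Schedule`, their fields, and the two checks are illustrative assumptions; they are not taken from the paper's DSL or metamodel, and they cover only a small slice of what ARINC 653 specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    partition: str
    offset_ms: int
    duration_ms: int


@dataclass(frozen=True)
class Schedule:
    major_frame_ms: int
    windows: tuple

    def validate(self):
        """Return a list of violations; an empty list means the model is consistent."""
        errors = []
        ordered = sorted(self.windows, key=lambda w: w.offset_ms)
        for w in ordered:
            if w.offset_ms + w.duration_ms > self.major_frame_ms:
                errors.append(f"{w.partition}: window exceeds major frame")
        # Temporal partitioning: consecutive windows must not overlap.
        for a, b in zip(ordered, ordered[1:]):
            if a.offset_ms + a.duration_ms > b.offset_ms:
                errors.append(f"{a.partition}/{b.partition}: windows overlap")
        return errors


def configuration_table(schedule):
    """Emit (partition, offset, duration) rows ordered by start time."""
    ordered = sorted(schedule.windows, key=lambda w: w.offset_ms)
    return [(w.partition, w.offset_ms, w.duration_ms) for w in ordered]
```

Deriving the table from the validated model, rather than writing it by hand, is the kind of reuse the abstract attributes to the metamodel.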
Pub Date : 2022-10-01 DOI: 10.1109/ISSREW55968.2022.00074
Sabuj Laskar, Md. Hasanur Rahman, Guanpeng Li
Deep Neural Networks (DNNs) are widely deployed in various applications such as autonomous vehicles, healthcare, and space applications. TensorFlow is the most popular framework for developing DNN models. After the release of TensorFlow 2, a software-level fault injector named TensorFI was developed for TensorFlow 2 models, but it can inject faults only into sequential models. However, most popular DNN models today are non-sequential. In this paper, we are the first to propose TensorFI+, an extension to TensorFI that supports non-sequential models so that developers can assess the resiliency of any DNN model developed with TensorFlow 2. For the evaluation, we conduct a large-scale fault injection experiment on 30 sequential and non-sequential models with three widely used classification datasets. We observe that our tool can inject faults into any layer of any sequential or non-sequential DNN model, and that fault-injected inference incurs only 7.62x overhead compared to fault-free inference.
Title : TensorFI+: A Scalable Fault Injection Framework for Modern Deep Learning Neural Networks
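Software-level fault injection of the kind the TensorFI+ abstract describes is often modelled as a single bit flip in a value produced by a layer. The sketch below shows that fault model on plain Python floats encoded as float32; the bit-flip model itself, the flat-list representation of a layer output, and the names `flip_bit` and `inject_fault` are assumptions, and no TensorFlow or TensorFI+ API is used.

```python
import random
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant) in the float32 encoding of value."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    (encoded,) = struct.unpack("<I", struct.pack("<f", value))
    (faulty,) = struct.unpack("<f", struct.pack("<I", encoded ^ (1 << bit)))
    return faulty


def inject_fault(layer_output, rng):
    """Return a copy of a layer output (flat list of floats) with one bit flipped."""
    corrupted = list(layer_output)
    index = rng.randrange(len(corrupted))
    corrupted[index] = flip_bit(corrupted[index], rng.randrange(32))
    return corrupted
```

Flipping the sign bit of `1.0` yields `-1.0`, while flipping the lowest exponent bit yields `0.5`, which shows how strongly the effect of a fault depends on the bit position.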