Inducing Subtle Mutations with Program Repair
Pub Date: 2021-04-01 | DOI: 10.1109/ICSTW52544.2021.00018
F. Schwander, Rahul Gopinath, A. Zeller
Mutation analysis is the gold standard for assessing the effectiveness of a test suite at preventing bugs. It involves injecting syntactic changes into the program under test, generating variants (mutants), and checking whether the test suite detects them. Practitioners often rely on the surviving (live) mutants to decide which test cases to write to improve test suite effectiveness. While a majority of such syntactic changes result in semantic differences from the original, a change may fail to induce a corresponding semantic change in the mutant. Such equivalent mutants waste manual effort. We describe a novel technique that produces high-quality mutants while avoiding the generation of equivalent mutants for input processors. Our idea is to generate plausible, near-correct inputs for the program, collect those that are rejected, and generate variants that accept these rejected strings. This technique allows us to provide an enhanced set of mutants along with newly generated test cases that kill them. We evaluate our method on eight Python programs and show that our technique can generate new mutants that are both interesting for the developer and guaranteed to be mortal.
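To make the idea concrete, here is a minimal Python sketch of the loop the abstract describes, assuming a toy input processor and crude source-level variants as stand-ins for the paper's repair-based mutant generation:

```python
import random

# toy input processor standing in for the program under test (assumption)
SUBJECT = "def accepts(s):\n    return s.isdigit() and len(s) <= 4\n"

def load(src):
    env = {}
    exec(src, env)
    return env["accepts"]

def near_valid(seed):
    # perturb a valid seed to get a plausible, near-correct input
    i = random.randrange(len(seed) + 1)
    c = random.choice("0123456789a")
    op = random.choice(["insert", "delete", "replace"])
    if op == "insert":
        return seed[:i] + c + seed[i:]
    if op == "delete" and seed:
        return seed[:-1]
    return seed[:i] + c + seed[i + 1:]

def variants(src):
    # crude stand-ins for repair-generated mutants that relax the acceptor
    yield src.replace("<= 4", "<= 5")
    yield src.replace("s.isdigit() and ", "")

original = load(SUBJECT)
rejected = {s for s in (near_valid("1234") for _ in range(200)) if not original(s)}

for mutant_src in variants(SUBJECT):
    mutant = load(mutant_src)
    killers = [r for r in rejected if mutant(r)]
    if killers:
        # the mutant accepts a string the original rejects, so the input
        # below is a test that kills it: the mutant is mortal by construction
        print("mortal mutant; killing input:", repr(killers[0]))
```

Because each reported mutant accepts a string the original rejects, it comes paired with the very test input that kills it, which is what rules out equivalent mutants in this setting.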
{"title":"Inducing Subtle Mutations with Program Repair","authors":"F. Schwander, Rahul Gopinath, A. Zeller","doi":"10.1109/ICSTW52544.2021.00018","DOIUrl":"https://doi.org/10.1109/ICSTW52544.2021.00018","url":null,"abstract":"Mutation analysis is the gold standard for assessing the effectiveness of a test suite to prevent bugs. It involves injecting syntactic changes in the program, generating variants (mutants) of the program under test, and checking whether the test suite detects the mutant. Practitioners often rely on these live mutants to decide what test cases to write for improving the test suite effectiveness.While a majority of such syntactic changes result in semantic differences from the original, it is possible that such a change fails to induce a corresponding semantic change in the mutant. Such equivalent mutants can lead to wastage of manual effort.We describe a novel technique that produces high-quality mutants while avoiding the generation of equivalent mutants for input processors. Our idea is to generate plausible, near correct inputs for the program, collect those rejected, and generate variants that accept these rejected strings. This technique allows us to provide an enhanced set of mutants along with newly generated test cases that kill them.We evaluate our method on eight python programs and show that our technique can generate new mutants that are both interesting for the developer and guaranteed to be mortal.","PeriodicalId":371680,"journal":{"name":"2021 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132493756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Assuring Fairness of Algorithmic Decision Making
Pub Date: 2021-04-01 | DOI: 10.1109/ICSTW52544.2021.00029
Marc P. Hauer, R. Adler, K. Zweig
Assuring the fairness of an algorithmic decision-making (ADM) system is a challenging task involving different and possibly conflicting views on fairness, as expressed by multiple fairness measures. We argue that a combination of the agile development framework Acceptance Test-Driven Development (ATDD) and the concept of Assurance Cases from safety engineering is a pragmatic way to assure fairness levels that are adequate for a predefined application. The approach supports examinations by regulatory bodies or related auditing processes by providing a structured argument explaining the achieved level of fairness and its sufficiency for the application.
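As an illustration only, an ATDD-style fairness acceptance test might look like the following Python sketch; the demographic-parity measure, the toy decisions, and the 0.5 bound are illustrative assumptions, not taken from the paper:

```python
def demographic_parity_diff(y_pred, group):
    # largest gap in positive-decision rates across groups
    rates = {}
    for g in set(group):
        decisions = [p for p, gg in zip(y_pred, group) if gg == g]
        rates[g] = sum(decisions) / len(decisions)
    return max(rates.values()) - min(rates.values())

def test_fairness_acceptance():
    # toy decisions from an ADM system for two demographic groups
    y_pred = [1, 0, 1, 1, 0, 1, 0, 0]
    group  = ["A", "A", "A", "A", "B", "B", "B", "B"]
    # application-specific bound agreed on in the acceptance criteria
    assert demographic_parity_diff(y_pred, group) <= 0.5

test_fairness_acceptance()
```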
{"title":"Assuring Fairness of Algorithmic Decision Making","authors":"Marc P. Hauer, R. Adler, K. Zweig","doi":"10.1109/ICSTW52544.2021.00029","DOIUrl":"https://doi.org/10.1109/ICSTW52544.2021.00029","url":null,"abstract":"Assuring fairness of an algorithmic decision making (ADM) system is a challenging task involving different and possibly conflicting views on fairness as expressed by multiple fairness measures. We argue that a combination of the agile development framework Acceptance Test-Driven Development (ATDD) and the concept of Assurance Cases from safety engineering is a pragmatic way to assure fairness levels that are adequate for a predefined application. The approach supports examinations by regulating bodies or related auditing processes by providing a structured argument explaining the achieved level of fairness and its sufficiency for the application.","PeriodicalId":371680,"journal":{"name":"2021 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123280446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Combinatorial Approach to Explaining Image Classifiers
Pub Date: 2021-04-01 | DOI: 10.1109/ICSTW52544.2021.00019
Jaganmohan Chandrasekaran, Yu Lei, R. Kacker, D. R. Kuhn
Machine Learning (ML) models, a core component of artificial intelligence systems, often come as a black box to the user, leading to the problem of interpretability. Explainable Artificial Intelligence (XAI) is key to providing confidence and trustworthiness for machine learning-based software systems. We observe a fundamental connection between XAI and software fault localization. In this paper, we present an approach that uses BEN, a combinatorial testing-based software fault localization approach, to produce explanations for decisions made by ML models.
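BEN itself is not reproduced here, but the underlying analogy can be sketched: treat image regions as test factors, mask t-way combinations of them, and rank regions by how often combinations containing them flip the decision, much as fault localization ranks suspicious code. A hedged Python sketch with a toy classifier as a stand-in:

```python
import numpy as np
from itertools import combinations

def mask(image, regions):
    # zero out the given (row0, row1, col0, col1) regions
    out = image.copy()
    for r0, r1, c0, c1 in regions:
        out[r0:r1, c0:c1] = 0
    return out

def suspicious_regions(predict, image, regions, label):
    # rank regions by how often 2-way masking combinations that include
    # them flip the model's decision (the fault-localization analogy)
    scores = [0] * len(regions)
    for i, j in combinations(range(len(regions)), 2):
        if predict(mask(image, [regions[i], regions[j]])) != label:
            scores[i] += 1
            scores[j] += 1
    return sorted(range(len(regions)), key=lambda i: -scores[i])

# toy classifier: "positive" if overall brightness exceeds a bar (assumption)
predict = lambda img: int(img.mean() > 0.02)
image = np.zeros((8, 8)); image[1:3, 1:3] = 1.0   # bright blob in quadrant 0
quadrants = [(0, 4, 0, 4), (0, 4, 4, 8), (4, 8, 0, 4), (4, 8, 4, 8)]
print(suspicious_regions(predict, image, quadrants, predict(image)))
# quadrant 0 ranks first: masking combinations covering it flip the decision
```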
{"title":"A Combinatorial Approach to Explaining Image Classifiers","authors":"Jaganmohan Chandrasekaran, Yu Lei, R. Kacker, D. R. Kuhn","doi":"10.1109/ICSTW52544.2021.00019","DOIUrl":"https://doi.org/10.1109/ICSTW52544.2021.00019","url":null,"abstract":"Machine Learning (ML) models, a core component to artificial intelligence systems, often come as a black box to the user, leading to the problem of interpretability. Explainable Artificial Intelligence (XAI) is key to providing confidence and trustworthiness for machine learning-based software systems. We observe a fundamental connection between XAI and software fault localization. In this paper, we present an approach that uses BEN, a combinatorial testing-based software fault localization approach, to produce explanations for decisions made by ML models.","PeriodicalId":371680,"journal":{"name":"2021 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)","volume":"13 1-4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120964651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Active Machine Learning to Test Autonomous Driving
Pub Date: 2021-04-01 | DOI: 10.1109/ICSTW52544.2021.00055
K. Meinke
Autonomous driving represents a significant challenge to all software quality assurance techniques, including testing. Generative machine learning (ML) techniques, including active ML, have considerable potential to generate high-quality synthetic test data that can complement and improve on existing techniques such as hardware-in-the-loop and road testing.
{"title":"Active Machine Learning to Test Autonomous Driving","authors":"K. Meinke","doi":"10.1109/ICSTW52544.2021.00055","DOIUrl":"https://doi.org/10.1109/ICSTW52544.2021.00055","url":null,"abstract":"Autonomous driving represents a significant challenge to all software quality assurance techniques, including testing. Generative machine learning (ML) techniques including active ML have considerable potential to generate high quality synthetic test data that can complement and improve on existing techniques such as hardware-in-the-loop and road testing.","PeriodicalId":371680,"journal":{"name":"2021 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122315662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enabling Fast Exploration and Validation of Thermal Dissipation Requirements for Heterogeneous SoCs
Pub Date: 2021-04-01 | DOI: 10.1109/ICSTW52544.2021.00030
Joel Öhrling, D. Truscan, S. Lafond
Managing the energy consumption and thermal dissipation of multi-core heterogeneous platforms is becoming increasingly important, as both can directly impact platform performance. This paper discusses an approach that enables fast exploration and validation of heterogeneous system-on-chip (SoC) platform configurations with respect to their thermal dissipation. Such platforms can be configured to find the optimal trade-off between performance and power consumption. This trade-off is directly reflected in the heat dissipation of the platform, which, when it rises above a given threshold, actually decreases the platform's performance. Therefore, it is important to be able to quickly probe and explore different configurations and identify the most suitable one. However, this task is hindered by the large space of possible configurations and by the time required to benchmark each configuration. We therefore propose an approach in which we construct a model of the thermal dissipation of a given platform using system identification methods, and then use this model to explore and validate different configurations. The approach allows us to decrease the exploration time by several orders of magnitude. We exemplify the approach on an Odroid-XU4 board featuring an Exynos 5422 SoC.
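As a rough illustration of the system-identification step, the following Python sketch fits a simple ARX-style linear model of temperature and replays a candidate configuration on the model instead of the board; the model structure and the synthetic data are assumptions for illustration, not the paper's:

```python
import numpy as np

def fit_thermal_model(T, f):
    # T has len(f)+1 samples; fit T[k+1] = a*T[k] + b*f[k] + c by least squares
    X = np.column_stack([T[:-1], f, np.ones(len(f))])
    coeffs, *_ = np.linalg.lstsq(X, T[1:], rcond=None)
    return coeffs

def simulate(coeffs, T0, f_schedule):
    # replay a candidate configuration on the model instead of the hardware
    a, b, c = coeffs
    T = [T0]
    for fk in f_schedule:
        T.append(a * T[-1] + b * fk + c)
    return T

# synthetic "measurements" standing in for benchmarked board data (assumption)
rng = np.random.default_rng(0)
f = rng.uniform(0.6, 2.0, 300)                   # frequency setting in GHz
T = np.empty(301); T[0] = 40.0
for k in range(300):
    T[k + 1] = 0.95 * T[k] + 1.5 * f[k] + 0.5 + rng.normal(0, 0.05)

coeffs = fit_thermal_model(T, f)
peak = max(simulate(coeffs, 40.0, [2.0] * 600))  # probe one configuration
print("predicted peak temperature:", round(peak, 1))
```

Once the model is fitted from one benchmarking run, probing a configuration costs a simulation loop rather than a hardware benchmark, which is where the orders-of-magnitude speedup comes from.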
{"title":"Enabling Fast Exploration and Validation of Thermal Dissipation Requirements for Heterogeneous SoCs","authors":"Joel Öhrling, D. Truscan, S. Lafond","doi":"10.1109/ICSTW52544.2021.00030","DOIUrl":"https://doi.org/10.1109/ICSTW52544.2021.00030","url":null,"abstract":"The management of the energy consumption and thermal dissipation of multi-core heterogeneous platforms is becoming increasingly important as it can have direct impact on the platform performance. This paper discusses an approach that enables fast exploration and validation of heterogeneous system on chips (SoCs) platform configurations with respect to their thermal dissipation. Such platforms can be configured to find the optimal trade-off between performance and power consumption. This directly reflects in the head dissipation of the platform, which when increases over a given threshold will actually decrease the performance of the platform. Therefore, it is important to be able to quickly probe and explore different configurations and identify the most suitable one. However, this task is hindered by the large space of possible configurations of such platforms and by the time required to benchmark each configurations. As such, we propose an approach in which we construct a model of the thermal dissipation of a given platform using a system identification methods and then we use this model to explore and validate different configurations. The approach allows us to decrease the exploration time with several orders of magnitude. We exemplify the approach on an Odroid-XU4 board featuring an Exynos 5422 SoC.","PeriodicalId":371680,"journal":{"name":"2021 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)","volume":"2014 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128149887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Combinatorially XSSing Web Application Firewalls
Pub Date: 2021-04-01 | DOI: 10.1109/ICSTW52544.2021.00026
Bernhard Garn, Daniel Sebastian Lang, Manuel Leithner, D. R. Kuhn, R. Kacker, D. Simos
Cross-site scripting (XSS) is a common class of vulnerabilities in the domain of web applications. As it remains prevalent despite continued efforts by practitioners and researchers, site operators often seek to protect their assets using web application firewalls (WAFs). These systems employ filtering mechanisms to intercept and reject requests that may be suitable to exploit XSS flaws and related vulnerabilities such as SQL injections. However, they generally do not offer complete protection and can often be bypassed using specifically crafted exploits. In this work, we evaluate the effectiveness of WAFs at detecting XSS exploits. We develop an attack grammar and use a combinatorial testing approach to generate attack vectors. We compare our vectors with conventional counterparts in their ability to bypass different WAFs. Our results show that the vectors generated with combinatorial testing perform equally well or better in almost all cases. They further confirm that most of the rule sets evaluated in this work can be bypassed by at least one of these crafted inputs.
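To illustrate the flavor of the approach, here is a hedged Python sketch of grammar-driven vector generation; the three-parameter grammar is a toy, and a real combinatorial approach would sample a t-way covering array (e.g., with NIST's ACTS tool) rather than enumerate the full product shown here:

```python
from itertools import product

# toy attack grammar with three parameters (illustrative, not the paper's)
GRAMMAR = {
    "tag":     ["img", "svg", "body"],
    "event":   ["onerror", "onload"],
    "payload": ["alert(1)", "confirm(1)"],
}

def vectors():
    # each combination of parameter values yields one attack vector
    for tag, event, payload in product(*GRAMMAR.values()):
        yield f'<{tag} src=x {event}="{payload}">'

for v in vectors():
    print(v)  # submit each vector to the WAF-protected endpoint; log verdicts
```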
{"title":"Combinatorially XSSing Web Application Firewalls","authors":"Bernhard Garn, Daniel Sebastian Lang, Manuel Leithner, D. R. Kuhn, R. Kacker, D. Simos","doi":"10.1109/ICSTW52544.2021.00026","DOIUrl":"https://doi.org/10.1109/ICSTW52544.2021.00026","url":null,"abstract":"Cross-Site scripting (XSS) is a common class of vulnerabilities in the domain of web applications. As it re-mains prevalent despite continued efforts by practitioners and researchers, site operators often seek to protect their assets using web application firewalls (WAFs). These systems employ filtering mechanisms to intercept and reject requests that may be suitable to exploit XSS flaws and related vulnerabilities such as SQL injections. However, they generally do not offer complete protection and can often be bypassed using specifically crafted exploits. In this work, we evaluate the effectiveness of WAFs to detect XSS exploits. We develop an attack grammar and use a combinatorial testing approach to generate attack vectors. We compare our vectors with conventional counterparts and their ability to bypass different WAFs. Our results show that the vectors generated with combinatorial testing perform equal or better in almost all cases. They further confirm that most of the rule sets evaluated in this work can be bypassed by at least one of these crafted inputs.","PeriodicalId":371680,"journal":{"name":"2021 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130113584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving Mobile User Interface Testing with Model Driven Monkey Search
Pub Date: 2021-04-01 | DOI: 10.1109/ICSTW52544.2021.00034
Jordan Doyle, Takfarinas Saber, Paolo Arcaini, Anthony Ventresque
Testing mobile applications often relies on tools, such as Exerciser Monkey for Android systems, that simulate user input. Exerciser Monkey, for example, generates random events (e.g., touches, gestures, navigational keys) that give developers a sense of what their application will do when deployed on real mobile phones with real users interacting with it. These tools, however, have no knowledge of the underlying application's structure and only interact with it randomly or in a predefined manner (e.g., following developer-designed scenarios, a labour-intensive task), making them slow and poor at finding bugs. In this paper, we propose a novel control flow structure able to represent the code of Android applications, including all the interactive elements. We show that our structure can increase the effectiveness (higher coverage) and efficiency (fewer duplicate/redundant tests) of the Exerciser Monkey by giving it knowledge of the test environment. We compare the interface coverage achieved by the Exerciser Monkey with that of our new Monkey++, which uses a depth-first search of our control flow structure, and show that while the random nature of Exerciser Monkey creates slow test suites with poor coverage, the test suite created by a depth-first search is an order of magnitude faster and achieves full coverage of the user interaction elements. We believe this research will lead to a more effective and efficient Exerciser Monkey, as well as better targeted search-based techniques for automated Android testing.
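The gain from a depth-first traversal can be sketched in a few lines of Python, assuming screens as nodes and interactive elements as labelled edges; the paper's actual control flow structure is richer than this toy graph:

```python
def dfs_events(graph, screen, visited=None):
    # graph maps a screen to its (ui_element, next_screen) transitions
    if visited is None:
        visited = set()
    visited.add(screen)
    events = []
    for element, nxt in graph.get(screen, []):
        events.append((screen, element))  # fire this element's event once
        if nxt not in visited:
            events.extend(dfs_events(graph, nxt, visited))
    return events

# toy app graph (assumption); Monkey++ derives its structure from the app code
app = {
    "main":  [("login_btn", "login"), ("about_btn", "about")],
    "login": [("submit_btn", "home")],
    "about": [("back_btn", "main")],
    "home":  [],
}
print(dfs_events(app, "main"))  # every interactive element exercised exactly once
```

Unlike random event injection, which revisits the same screens indefinitely, the traversal touches each interactive element exactly once, which is what yields full interface coverage in a short test suite.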
{"title":"Improving Mobile User Interface Testing with Model Driven Monkey Search","authors":"Jordan Doyle, Takfarinas Saber, Paolo Arcaini, Anthony Ventresque","doi":"10.1109/ICSTW52544.2021.00034","DOIUrl":"https://doi.org/10.1109/ICSTW52544.2021.00034","url":null,"abstract":"Testing mobile applications often relies on tools, such as Exerciser Monkey for Android systems, that simulate user input. Exerciser Monkey, for example, generates random events (e.g., touches, gestures, navigational keys) that give developers a sense of what their application will do when deployed on real mobile phones with real users interacting with it. These tools, however, have no knowledge of the underlying applications' structures and only interact with them randomly or in a predefined manner (e.g., if developers designed scenarios, a labour-intensive task) - making them slow and poor at finding bugs.In this paper, we propose a novel control flow structure able to represent the code of Android applications, including all the interactive elements. We show that our structure can increase the effectiveness (higher coverage) and efficiency (removing duplicate/redundant tests) of the Exerciser Monkey by giving it knowledge of the test environment. We compare the interface coverage achieved by the Exerciser Monkey with our new Monkey++ using a depth first search of our control flow structure and show that while the random nature of Exerciser Monkey creates slow test suites of poor coverage, the test suite created by a depth first search is one order of magnitude faster and achieves full coverage of the user interaction elements. We believe this research will lead to a more effective and efficient Exerciser Monkey, as well as better targeted search based techniques for automated Android testing.","PeriodicalId":371680,"journal":{"name":"2021 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130266958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AI-based Test Automation: A Grey Literature Analysis
Pub Date: 2021-04-01 | DOI: 10.1109/ICSTW52544.2021.00051
F. Ricca, A. Marchetto, Andrea Stocco
This paper provides the results of a survey of the grey literature concerning the use of artificial intelligence to improve test automation practices. We surveyed more than 1,200 sources of grey literature (e.g., blogs, white papers, user manuals, Stack Overflow posts) looking for highlights by professionals on how AI is adopted to aid the development and evolution of test code. Ultimately, we filtered 136 relevant documents from which we extracted a taxonomy of problems that AI aims to tackle, along with a taxonomy of AI-enabled solutions to those problems. Manual code development and automated test generation are the most cited problem and solution, respectively. The paper concludes by distilling the six most prevalent tools on the market, along with think-aloud reflections on the current and future status of artificial intelligence for test automation.
{"title":"AI-based Test Automation: A Grey Literature Analysis","authors":"F. Ricca, A. Marchetto, Andrea Stocco","doi":"10.1109/ICSTW52544.2021.00051","DOIUrl":"https://doi.org/10.1109/ICSTW52544.2021.00051","url":null,"abstract":"This paper provides the results of a survey of the grey literature concerning the use of artificial intelligence to improve test automation practices. We surveyed more than 1,200 sources of grey literature (e.g., blogs, white-papers, user manuals, StackOverflow posts) looking for highlights by professionals on how AI is adopted to aid the development and evolution of test code. Ultimately, we filtered 136 relevant documents from which we extracted a taxonomy of problems that AI aims to tackle, along with a taxonomy of AI-enabled solutions to such problems. Manual code development and automated test generation are the most cited problem and solution, respectively. The paper concludes by distilling the six most prevalent tools on the market, along with think-aloud reflections about the current and future status of artificial intelligence for test automation.","PeriodicalId":371680,"journal":{"name":"2021 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122945632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Practical Method for API Testing in the Context of Continuous Delivery and Behavior Driven Development
Pub Date: 2021-04-01 | DOI: 10.1109/ICSTW52544.2021.00020
Brian Elgaard Bennett
Enterprises are increasingly adopting an API-first approach to connect and expose software services. Saxo Bank is no exception. Crafting test suites for such APIs can seem straightforward due to their headless nature, but our experience shows that test suites often have two problems. The first is that tests tend to fail and pass in seemingly nondeterministic ways (tests are flaky). The second is that functional coverage is not clearly documented. We have found that both problems stem from a lack of explicit focus on initial context (IC), a concept from behavior-driven development. When a test is flaky, it is often because the actual IC in the test environment is not as required by the test. When functional coverage is not clear, it is most often because a systematic analysis involving IC was not performed. We propose a method for test analysis in which we include IC in the input space when analyzing functional coverage for an API, thereby including anything that can influence the outcome of test cases. Establishing IC is in general a hard problem. We have found that focusing on the bounded context, a concept from domain-driven design, of the system under test is a practical way to establish relevant IC. Experience with Saxo Bank's Open API shows that this method allows testers and developers to cooperate continuously, producing test plan documents that include the reasoning behind functional coverage. Explicit focus on IC in automated test case implementations turns flaky tests into tests that report on required IC in a test environment. The method generalizes easily to all levels of API tests.
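As a sketch of what explicit IC looks like in an automated test, consider the following Python example; the `require_ic` helper, the endpoint, and the account fixture are hypothetical, introduced only to illustrate failing fast with an IC report instead of a flaky downstream assertion:

```python
import requests  # assumption: a plain HTTP client is used against the API

BASE = "https://example.test/api"  # hypothetical endpoint

def require_ic(predicate, description):
    # make required initial context explicit; fail with a clear IC report
    if not predicate():
        raise AssertionError(f"required initial context not met: {description}")

def test_get_account_balance():
    require_ic(lambda: requests.get(f"{BASE}/accounts/42").status_code == 200,
               "account 42 exists in the test environment")
    resp = requests.get(f"{BASE}/accounts/42/balance")
    assert resp.status_code == 200
    assert "balance" in resp.json()
```

When the environment drifts (e.g., the fixture account is missing), the test now reports the unmet IC directly instead of failing nondeterministically on the balance assertion.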
{"title":"A Practical Method for API Testing in the Context of Continuous Delivery and Behavior Driven Development","authors":"Brian Elgaard Bennett","doi":"10.1109/ICSTW52544.2021.00020","DOIUrl":"https://doi.org/10.1109/ICSTW52544.2021.00020","url":null,"abstract":"Enterprises are increasingly adopting an API-first approach to connect and expose software services. Saxo Bank is no exception to this.Crafting test suites for such APIs can seem straight forward due to the headless nature, but our experience shows that test suites often have two problems. The first problem is that execution of tests tends to fail and pass in seemingly nondeterministic ways (tests are flaky). The second problem is that functional coverage is not clearly documented.We have found that both problems stem from a lack of explicit focus on initial context (IC), a concept from behavior driven development. When a test is flaky it is often because actual IC in the test environment is not as required by the test. When functional coverage is not clear, it is most often because a systematic analysis involving IC was not performed.We propose a method for test analysis in which we include IC in the input space when analyzing functional coverage for an API, thereby including anything which can influence the outcome of test cases.Establishing IC is in general a hard problem. We have found that focus on the bounded context, a concept from domain driven design, of the system under test is a practical way to establish relevant IC.Experience with Saxo Bank's Open API shows that this method allows testers and developers to cooperate continuously, producing test plan documents which include the reasoning behind functional coverage. Explicit focus on IC in automated test case implementations turns flaky tests into tests which report on required IC in a test environment. The method easily generalizes to all levels of API tests.","PeriodicalId":371680,"journal":{"name":"2021 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)","volume":"259 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123087320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Test Automation with Grad-CAM Heatmaps - A Future Pipe Segment in MLOps for Vision AI?
Pub Date: 2021-03-02 | DOI: 10.1109/ICSTW52544.2021.00039
Markus Borg, Ronald Jabangwe, Simon Åberg, Arvid Ekblom, Ludwig Hedlund, August Lidfeldt
Machine Learning (ML) is a fundamental part of modern perception systems. In the last decade, the performance of computer vision using trained deep neural networks has surpassed previous approaches based on careful feature engineering. However, the opaqueness of large ML models is a substantial impediment for critical applications such as the automotive context. As a remedy, Gradient-weighted Class Activation Mapping (Grad-CAM) has been proposed to provide visual explanations of model internals. In this paper, we demonstrate how Grad-CAM heatmaps can be used to increase the explainability of an image recognition model trained for a pedestrian underpass. We argue how the heatmaps support compliance with the EU's seven key requirements for Trustworthy AI. Finally, we propose adding automated heatmap analysis as a pipe segment in an MLOps pipeline. We believe that such a building block can be used to automatically detect whether a trained ML model is activated based on invalid pixels in test images, suggesting a biased model.
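A minimal Python sketch of such a pipe segment, assuming a Grad-CAM heatmap has already been computed (e.g., with a library such as pytorch-grad-cam) and that a mask of valid pixels is known for the scene; the 0.8 threshold is an illustrative assumption:

```python
import numpy as np

def heatmap_check(heatmap, valid_mask, threshold=0.8):
    # flag the model if too much Grad-CAM activation falls on invalid pixels
    total = heatmap.sum()
    if total == 0:
        return False
    return heatmap[valid_mask].sum() / total >= threshold

# toy example: activation concentrated inside the valid (e.g., walkway) region
hm = np.zeros((4, 4)); hm[1:3, 1:3] = 1.0
mask = np.zeros((4, 4), dtype=bool); mask[1:3, :] = True
print(heatmap_check(hm, mask))  # True: the model attends to valid pixels
```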
{"title":"Test Automation with Grad-CAM Heatmaps - A Future Pipe Segment in MLOps for Vision AI?","authors":"Markus Borg, Ronald Jabangwe, Simon Åberg, Arvid Ekblom, Ludwig Hedlund, August Lidfeldt","doi":"10.1109/ICSTW52544.2021.00039","DOIUrl":"https://doi.org/10.1109/ICSTW52544.2021.00039","url":null,"abstract":"Machine Learning (ML) is a fundamental part of modern perception systems. In the last decade, the performance of computer vision using trained deep neural networks has outperformed previous approaches based on careful feature engineering. However, the opaqueness of large ML models is a substantial impediment for critical applications such as in the automotive context. As a remedy, Gradient-weighted Class Activation Mapping (Grad-CAM) has been proposed to provide visual explanations of model internals. In this paper, we demonstrate how Grad-CAM heatmaps can be used to increase the explainability of an image recognition model trained for a pedestrian underpass. We argue how the heatmaps support compliance to the EU’s seven key requirements for Trustworthy AI. Finally, we propose adding automated heatmap analysis as a pipe segment in an MLOps pipeline. We believe that such a building block can be used to automatically detect if a trained ML-model is activated based on invalid pixels in test images, suggesting biased models.","PeriodicalId":371680,"journal":{"name":"2021 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126511173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}