{"title":"Fault tolerance and metamorphic relation prediction","authors":"Yves Le Traon, Tao Xie","doi":"10.1002/stvr.1896","DOIUrl":"https://doi.org/10.1002/stvr.1896","url":null,"abstract":"","PeriodicalId":501413,"journal":{"name":"Software Testing, Verification and Reliability","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Validity Matters: Uncertainty‐Guided Testing of Deep Neural Networks","authors":"Zhouxian Jiang, Honghui Li, Rui Wang, Xuetao Tian, Ci Liang, Fei Yan, Junwen Zhang, Zhen Liu","doi":"10.1002/stvr.1894","DOIUrl":"https://doi.org/10.1002/stvr.1894","url":null,"abstract":"Despite numerous applications of deep learning technologies on critical tasks in various domains, advanced deep neural networks (DNNs) face persistent safety and security challenges, such as the overconfidence in predicting out‐of‐distribution samples and susceptibility to adversarial examples. Thorough testing by exploring the input space serves as a key strategy to ensure their robustness and trustworthiness of these networks. However, existing testing methods focus on disclosing more erroneous model behaviours, overlooking the validity of the generated test inputs. To mitigate this issue, we investigate devising valid test input generation method for DNNs from a predictive uncertainty perspective. Through a large‐scale empirical study across 11 predictive uncertainty metrics for DNNs, we explore the correlation between validity and uncertainty of test inputs. Our findings reveal that the predictive entropy‐based and ensemble‐based uncertainty metrics effectively characterize the input validity demonstration. Building on these insights, we introduce UCTest, an uncertainty‐guided deep learning testing approach, to efficiently generate valid and authentic test inputs. We formulate a joint optimization objective: to uncover the model's misbehaviours by maximizing the loss function and concurrently generate valid test input by minimizing uncertainty. Extensive experiments demonstrate that our approach outperforms the current testing methods in generating valid test inputs. Furthermore, incorporating natural variation through data augmentation techniques into UCTest effectively boosts the diversity of generated test inputs.","PeriodicalId":501413,"journal":{"name":"Software Testing, Verification and Reliability","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
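The UCTest abstract describes a joint objective that maximizes the model's loss while minimizing predictive uncertainty. The sketch below shows one possible form of that objective. It assumes a softmax classifier, cross-entropy as the loss and predictive entropy as the uncertainty metric. The function names and the weighting factor `lam` are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predictive_entropy(probs):
    # H(p) = -sum_k p_k * log(p_k); higher values mean a more uncertain prediction
    return -sum(p * math.log(p) for p in probs if p > 0)

def cross_entropy(probs, label):
    # Loss of the prediction with respect to the original (correct) label
    return -math.log(max(probs[label], 1e-12))

def joint_objective(logits, label, lam=0.5):
    # Hypothetical objective to MAXIMIZE: reward misbehaviour (high loss)
    # while penalizing predictive uncertainty (used here as a validity proxy)
    probs = softmax(logits)
    return cross_entropy(probs, label) - lam * predictive_entropy(probs)

if __name__ == "__main__":
    # True label is 0 in both cases
    print(joint_objective([0.1, 4.0, 0.2], 0))  # confidently wrong: high score
    print(joint_objective([1.0, 1.0, 1.0], 0))  # maximally uncertain: low score
```

A search procedure, such as gradient ascent or fuzzing (not shown), would prefer candidate inputs with a high score. These are inputs the model gets wrong while staying confident, which the study's findings associate with valid inputs.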
{"title":"Improving Web Element Localization by Using a Large Language Model","authors":"Michel Nass, Emil Alégroth, Robert Feldt","doi":"10.1002/stvr.1893","DOIUrl":"https://doi.org/10.1002/stvr.1893","url":null,"abstract":"Web‐based test automation heavily relies on accurately finding web elements. Traditional methods compare attributes but do not grasp the context and meaning of elements and words. The emergence of large language models (LLMs) like GPT‐4, which can show human‐like reasoning abilities on some tasks, offers new opportunities for software engineering and web element localization. This paper introduces and evaluates VON Similo LLM, an enhanced web element localization approach. Using an LLM, it selects the most likely web element from the top‐ranked ones identified by the existing VON Similo method, ideally aiming to get closer to human‐like selection accuracy. An experimental study was conducted using 804 web element pairs from 48 real‐world web applications. We measured the number of correctly identified elements as well as the execution times, comparing the effectiveness and efficiency of VON Similo LLM against the baseline algorithm. In addition, motivations from the LLM were recorded and analysed for 140 instances. VON Similo LLM demonstrated improved performance, reducing failed localizations from 70 to 40 (out of 804), a 43% reduction. Despite its slower execution time and additional costs of using the GPT‐4 model, the LLM's human‐like reasoning showed promise in enhancing web element localization. LLM technology can enhance web element localization in GUI test automation, reducing false positives and potentially lowering maintenance costs. However, further research is necessary to fully understand LLMs' capabilities, limitations and practical use in GUI testing.","PeriodicalId":501413,"journal":{"name":"Software Testing, Verification and Reliability","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scenario‐Driven Metamorphic Testing for Autonomous Driving Simulators","authors":"Yifan Zhang, Dave Towey, Matthew Pike, Jia Cheng Han, Zhi Quan Zhou, Chenghao Yin, Qian Wang, Chen Xie","doi":"10.1002/stvr.1892","DOIUrl":"https://doi.org/10.1002/stvr.1892","url":null,"abstract":"The proliferation of driver‐assistance features in vehicles has resulted in a growing interest among the public in fully autonomous driving systems (ADSs). However, the integration of software and hardware in these complex systems presents significant testing challenges, particularly with respect to ensuring passenger safety. To address these challenges, simulation has emerged as a crucial step in the testing of ADSs. This paper presents a solution to the challenges faced in testing ADSs, with a focus on the validation of ADS simulators. The proposed approach involves using simulations and metamorphic testing (MT) to generate multiple concrete metamorphic relations (MRs) for testing ADS simulators. In order to accomplish this goal, we introduce three metamorphic relation patterns (MRPs). Each MRP is accompanied by a metamorphic relation input pattern (MRIP) that aids in generating detailed MRs. These MRs are designed to identify potential issues within the ADS simulator. To simplify the testing process and facilitate MT for testers, a self‐evolving scenario‐testing framework is also presented. The framework allows testers to improve test cases and MRs iteratively until issues detected are confirmed. The benefits and limitations of the framework are demonstrated using an industry case study. Overall, this study offers a practical solution to the challenges in testing ADSs and provides useful insights into improving testing efficiency for researchers and practitioners in the field.","PeriodicalId":501413,"journal":{"name":"Software Testing, Verification and Reliability","volume":"50 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141781078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boosting Multimode Ruling in DHR Architecture With Metamorphic Relations","authors":"Ruosi Li, Xianglong Kong, Wei Guo, Jingdong Guo, Hongfa Li, Fan Zhang","doi":"10.1002/stvr.1890","DOIUrl":"https://doi.org/10.1002/stvr.1890","url":null,"abstract":"The DHR architecture provides a revolutionary security defense structure for cyberspace. The multimode ruling in DHR is expected to alleviate the oracle problem, which still suffers from the existence of common model vulnerability. In this work, we design a test segmentation method to transform multimode ruling to a metamorphic testing problem. The text test input that causes inconsistency of heterogeneous executors is converted to a condition set, and we extract subsets of conditions based on its syntax tree. The original test can exploit a specific vulnerability, the follow‐up tests are composed by different subsets of conditions within the original test. We collect the execution matrix for the follow‐up tests to analyse the impact of each subset of conditions on ruling decision. Metamorphic relations are extracted based on the localization of independent condition, that is, the subsets of conditions that can impact ruling decision independently. The executors in an inconsistent ruling should be examined with metamorphic testing methods, rather than traditional majority voting mechanism. The proposed test segmentation and improved multimode ruling methods are evaluated on two DHR‐based cases, SQL injection in cyber‐range system and deserialization attack in ‐ project. The experimental results show that our test segmentation can help to locate malicious expressions and the metamorphic testing‐based multimode ruling can generate more correct results than majority voting mechanism with an average 15.8% performance loss.","PeriodicalId":501413,"journal":{"name":"Software Testing, Verification and Reliability","volume":"55 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141781077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
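The DHR abstract argues against the traditional majority-voting ruling, which can be sketched in a few lines for context. This is a hypothetical illustration of that baseline only, not the authors' metamorphic-testing-based ruling. Executor outputs are modeled as plain strings, and the example shows how a vulnerability shared by a majority of heterogeneous executors wins the vote.

```python
from collections import Counter

def majority_ruling(outputs):
    """Return the output shared by a strict majority of executors, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None

def is_inconsistent(outputs):
    """A ruling is inconsistent when heterogeneous executors disagree."""
    return len(set(outputs)) > 1

# Two of three executors share a (hypothetical) common vulnerability and
# return leaked data; the third correctly rejects the injected query.
outputs = ["leaked_rows", "leaked_rows", "rejected"]
print(majority_ruling(outputs))  # -> leaked_rows (the vulnerable output wins)
print(is_inconsistent(outputs))  # -> True
```

In the paper's approach, inputs that cause such inconsistent rulings are instead segmented into condition subsets and examined with metamorphic relations.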
{"title":"Boosting Metamorphic Relation Prediction via Code Representation Learning: An Empirical Study","authors":"Xuedan Zheng, Mingyue Jiang, Zhi Quan Zhou","doi":"10.1002/stvr.1889","DOIUrl":"https://doi.org/10.1002/stvr.1889","url":null,"abstract":"Metamorphic testing (MT) is an effective testing technique having a broad range of applications. One key task for MT is the identification of metamorphic relations (MRs), which is a fundamental mechanism in MT and is critical to the automation of MT. Prior studies have proposed approaches for predicting MRs (PMR). One major idea behind these PMR approaches is to represent program source code information via manually designed code features and then to apply machine‐learning–based classifiers to automatically predict whether a specific MR can be applied on the target program. Nevertheless, the human‐involved procedure of selecting and extracting code features is costly, and it may not be easy to obtain sufficiently comprehensive features for representing source code. To overcome this limitation, in this study, we explore and evaluate the effectiveness of code representation learning techniques for PMR. By applying neural code representation models for automatically mapping program source code to code vectors, the PMR procedure can be boosted with learned code representations. We develop 32 PMR instances by, respectively, combining 8 code representation models with 4 typical classification models and conduct an extensive empirical study to investigate the effectiveness of code representation learning techniques in the context of MR prediction. Our findings reveal that code representation learning can positively contribute to the prediction of MRs and provide insights into the practical usage of code representation models in the context of MR prediction. Our findings could help researchers and practitioners to gain a deeper understanding of the strength of code representation learning for PMR and, hence, pave the way for future research in deriving or extracting MRs from program source code.","PeriodicalId":501413,"journal":{"name":"Software Testing, Verification and Reliability","volume":"25 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141570060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsafe code detection in Rust and metamorphic testing of autonomous driving systems","authors":"Yves Le Traon, Tao Xie","doi":"10.1002/stvr.1891","DOIUrl":"https://doi.org/10.1002/stvr.1891","url":null,"abstract":"","PeriodicalId":501413,"journal":{"name":"Software Testing, Verification and Reliability","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141570059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Perception simplex: Verifiable collision avoidance in autonomous vehicles amidst obstacle detection faults","authors":"Ayoosh Bansal, Hunmin Kim, Simon Yu, Bo Li, Naira Hovakimyan, Marco Caccamo, Lui Sha","doi":"10.1002/stvr.1879","DOIUrl":"https://doi.org/10.1002/stvr.1879","url":null,"abstract":"Advances in deep learning have revolutionized cyber‐physical applications, including the development of autonomous vehicles. However, real‐world collisions involving autonomous control of vehicles have raised significant safety concerns regarding the use of deep neural networks (DNNs) in safety‐critical tasks, particularly perception. The inherent unverifiability of DNNs poses a key challenge in ensuring their safe and reliable operation. In this work, we propose perception simplex, a fault‐tolerant application architecture designed for obstacle detection and collision avoidance. We analyse an existing LiDAR‐based classical obstacle detection algorithm to establish strict bounds on its capabilities and limitations. Such analysis and verification have not been possible for deep learning‐based perception systems yet. By employing verifiable obstacle detection algorithms, perception simplex identifies obstacle existence detection faults in the output of unverifiable DNN‐based object detectors. When faults with potential collision risks are detected, appropriate corrective actions are initiated. Through extensive analysis and software‐in‐the‐loop simulations, we demonstrate that perception simplex provides deterministic fault tolerance against obstacle existence detection faults, establishing a robust safety guarantee.","PeriodicalId":501413,"journal":{"name":"Software Testing, Verification and Reliability","volume":"90 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141191522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
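The perception simplex abstract pairs an unverifiable DNN detector with a verifiable one and intervenes only when a detection fault poses a collision risk. The sketch below illustrates that decision logic and is hypothetical throughout. The `Obstacle` type, the kinematic stopping-distance model and every threshold (deceleration, reaction time, matching tolerance, safety margin) are illustrative assumptions, not values from the paper.

```python
from dataclasses import dataclass

@dataclass
class Obstacle:
    distance_m: float  # longitudinal distance ahead of the ego vehicle, in metres

def stopping_distance(speed_mps, decel_mps2=6.0, reaction_s=0.2):
    # Assumed kinematics: distance travelled during reaction + v^2 / (2a)
    return speed_mps * reaction_s + speed_mps ** 2 / (2.0 * decel_mps2)

def missed_obstacle(dnn_obstacles, verified_obstacles, tolerance_m=1.0):
    # Existence fault: the verifiable detector reports an obstacle that no
    # DNN detection matches within the tolerance; return the nearest such one
    missed = [v for v in verified_obstacles
              if not any(abs(d.distance_m - v.distance_m) <= tolerance_m
                         for d in dnn_obstacles)]
    return min(missed, key=lambda o: o.distance_m) if missed else None

def safety_decision(speed_mps, dnn_obstacles, verified_obstacles, margin_m=5.0):
    # Take corrective action only when a missed obstacle is inside the safety envelope
    fault = missed_obstacle(dnn_obstacles, verified_obstacles)
    if fault is not None and fault.distance_m <= stopping_distance(speed_mps) + margin_m:
        return "emergency_brake"
    return "follow_dnn"

print(safety_decision(10.0, [], [Obstacle(12.0)]))                # emergency_brake
print(safety_decision(10.0, [Obstacle(12.5)], [Obstacle(12.0)]))  # follow_dnn
```

Matching detections by longitudinal distance alone is a deliberate simplification. A real system would compare positions in the vehicle's planned path.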
{"title":"Investigating fault injection techniques in hardware‐based deep neural networks and mutation‐based fault localization","authors":"Yves Le Traon, Tao Xie","doi":"10.1002/stvr.1880","DOIUrl":"https://doi.org/10.1002/stvr.1880","url":null,"abstract":"","PeriodicalId":501413,"journal":{"name":"Software Testing, Verification and Reliability","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140929720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MetaSem: metamorphic testing based on semantic information of autonomous driving scenes","authors":"Zhen Yang, Song Huang, Tongtong Bai, Yongming Yao, Yang Wang, Changyou Zheng, Chunyan Xia","doi":"10.1002/stvr.1878","DOIUrl":"https://doi.org/10.1002/stvr.1878","url":null,"abstract":"The development of artificial intelligence and information communication technology has significantly propelled advancements in autonomous driving. The advent of autonomous driving has a profound impact on societal development and transportation methods. However, as intelligent systems, autonomous driving systems (ADSs) often make wrong judgements in specific scenarios, resulting in accidents. There is an urgent need for comprehensive testing and validation of ADSs. Metamorphic testing (MT) techniques have demonstrated effectiveness in testing ADSs. Nevertheless, existing testing methods primarily encompass relatively simple metamorphic relations (MRs) that only verify ADSs from a single perspective. To ensure the safety of ADSs, it is essential to consider the various elements of driving scenarios during the testing process. Therefore, this paper proposes MetaSem, a novel metamorphic testing method based on semantic information of autonomous driving scenes. Based on semantic information of the autonomous driving scenes and traffic regulations, we design 11 MRs targeting different scenario elements. Three transformation modules are developed to execute addition, deletion and replacement operations on various scene elements within the images. Finally, corresponding evaluation metrics are defined based on MRs. MetaSem automatically discovers inconsistent behaviours according to the evaluation metrics. Our empirical study on three advanced and popular autonomous driving models demonstrates that MetaSem not only efficiently generates visually natural and realistic scene images but also detects 11,787 inconsistent behaviours on three driving models.","PeriodicalId":501413,"journal":{"name":"Software Testing, Verification and Reliability","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140830550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
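MetaSem checks driving models with metamorphic relations derived from scene semantics and traffic rules, using addition, deletion and replacement of scene elements. The sketch below checks a single relation of that kind. It assumes a hypothetical relation (adding a pedestrian ahead must not increase the predicted speed) and stands in a dictionary of scene elements for the images. None of these names come from the MetaSem paper, and the two driving models are toys.

```python
def add_pedestrian(scene):
    # Addition transformation: return a follow-up scene with one more pedestrian
    return {**scene, "pedestrians": scene.get("pedestrians", 0) + 1}

def check_no_speedup(model, scene, transform, tolerance=0.0):
    """Hypothetical MR: the follow-up speed must not exceed the source speed."""
    source_speed = model(scene)
    follow_up_speed = model(transform(scene))
    return follow_up_speed <= source_speed + tolerance

# Toy driving models mapping a scene to a predicted speed (m/s)
def cautious_model(scene):
    return 5.0 if scene.get("pedestrians", 0) > 0 else 15.0

def faulty_model(scene):
    return 15.0 + 2.0 * scene.get("pedestrians", 0)

source = {"pedestrians": 0, "weather": "clear"}
print(check_no_speedup(cautious_model, source, add_pedestrian))  # True
print(check_no_speedup(faulty_model, source, add_pedestrian))    # False: MR violated
```

A violation counts as one inconsistent behaviour. The paper defines its own relations and evaluation metrics over transformed images and real driving models.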