Automated Testing of Basic Recognition Capability for Speech Recognition Systems
Futoshi Iwama, Takashi Fukuda. ICST 2019. DOI: 10.1109/ICST.2019.00012

Automatic speech recognition systems transform speech audio data into text data, i.e., word sequences, as the recognition results. These word sequences are generally defined by the language model of the speech recognition system. The capability of a speech recognition system to translate audio obtained by typical pronunciations of word sequences accepted by its language model back into word sequences equivalent to the originals can therefore be regarded as a basic capability of the system. This work describes a testing method that checks whether speech recognition systems have this basic recognition capability. The method verifies the basic capability separately from recognition robustness testing, and it can be fully automated. We constructed a test automation system and evaluated through several experiments whether it could detect defects in speech recognition systems. The results demonstrate that the test automation system can effectively detect basic defects at an early phase of speech recognition development or refinement.
Program Repair at Arbitrary Fault Depth
Besma Khaireddine, Matias Martinez, A. Mili. ICST 2019. DOI: 10.1109/ICST.2019.00056

Program repair has been an active research area for over a decade and has made great strides in terms of scalable automated repair tools. In this paper we argue that existing program repair tools lack an important ingredient, which limits their scope and their efficiency: a formal definition of a fault, and a formal characterization of fault removal. To support our conjecture, we consider GenProg, an archetypal program repair tool, and modify it according to our definitions of fault and fault removal; we then show, by means of empirical experiments, the impact this has on the effectiveness and efficiency of the tool.
VCIPR: Vulnerable Code is Identifiable When a Patch is Released (Hacker's Perspective)
Junaid Akram, Liang Qi, Ping Luo. ICST 2019. DOI: 10.1109/ICST.2019.00049

Vulnerable source code fragments remain unfixed for many years and often propagate to other systems. Unfortunately, this frequently happens when patch files are not propagated to all vulnerable code clones. An unpatched bug is a critical security problem that should be detected and repaired as early as possible. In this paper, we present VCIPR, a scalable system for detecting vulnerabilities in unpatched source code. It uses a fast, token-based approach to detect vulnerabilities at function-level granularity. The approach is language-independent and supports multiple programming languages, including Java, C/C++, and JavaScript. VCIPR detects the most common repair patterns in patch files to evaluate vulnerable code. We build a fingerprint index of the source code of top-critical CVEs, retrieved from a reliable source, and then detect unpatched (vulnerable/non-vulnerable) code fragments in common open-source software with high accuracy. A comparison with state-of-the-art tools demonstrates the effectiveness, efficiency, and scalability of our approach. Furthermore, this paper shows how hackers can easily identify vulnerable software whenever a patch file is released.
Coverage-Driven Test Generation for Thread-Safe Classes via Parallel and Conflict Dependencies
Valerio Terragni, M. Pezzè, F. A. Bianchi. ICST 2019. DOI: 10.1109/ICST.2019.00034

Thread-safe classes are common in concurrent object-oriented programs. Testing such classes is important to ensure the reliability of the concurrent programs that rely on them. Recently, researchers have proposed the automated generation of concurrent (multi-threaded) tests to expose concurrency faults in thread-safe classes (thread-safety violations). However, generating fault-revealing concurrent tests within an affordable time budget is difficult due to the huge search space of possible concurrent tests. In this paper, we present DepCon, an approach to effectively reduce the search space of concurrent tests by means of both parallel and conflict dependency analyses. DepCon is based on the intuition that only methods that can both interleave (parallel dependent) and access the same shared memory locations (conflict dependent) can lead to thread-safety violations when concurrently executed. DepCon implements an efficient static analysis to compute the parallel and conflict dependencies among the methods of a class and uses the computed dependencies to steer the generation of tests towards concurrent tests that exhibit the computed dependencies. We evaluated DepCon by experimenting with a prototype implementation for Java programs on a set of thread-safe classes with known concurrency faults. The experimental results show that DepCon is more effective in exposing concurrency faults than state-of-the-art techniques.
Uniform Sampling of SAT Solutions for Configurable Systems: Are We There Yet?
Quentin Plazar, M. Acher, Gilles Perrouin, Xavier Devroey, Maxime Cordy. ICST 2019. DOI: 10.1109/ICST.2019.00032

Uniform or near-uniform generation of solutions for large satisfiability formulas is a problem of theoretical and practical interest for the testing community. Recent works proposed two algorithms (namely UniGen and QuickSampler) for reaching a good compromise between execution time and uniformity guarantees, with empirical evidence on SAT benchmarks. In the context of highly-configurable software systems (e.g., Linux), it is unclear whether UniGen and QuickSampler can scale and sample uniform software configurations. In this paper, we perform a thorough experiment on 128 real-world feature models. We find that UniGen is unable to produce SAT solutions out of such feature models. Furthermore, we show that QuickSampler does not generate uniform samples and that some features are either never part of the sample or too frequently present. Finally, using a case study, we characterize the impacts of these results on the ability to find bugs in a configurable system. Overall, our results suggest that we are not there: more research is needed to explore the cost-effectiveness of uniform sampling when testing large configurable systems.
Directing a Search Towards Execution Properties with a Learned Fitness Function
Leonid Joffe, D. Clark. ICST 2019. DOI: 10.1109/ICST.2019.00029

Search-based software testing is a popular and successful approach in both academia and industry. SBST methods typically aim to increase coverage, whereas searching for executions with specific properties is largely unexplored. Fitness functions for execution properties often possess search landscapes that are difficult or intractable. We demonstrate how machine learning techniques can convert a property that is not searchable, in this case crashes, into one that is. Through experimentation on 6000 C programs drawn from the Codeflaws repository, we demonstrate a strong, program-independent correlation, discovered by a neural net, between crashing executions and the library function call patterns within those executions. We then exploit the correlation to produce a searchable fitness landscape and use it to modify American Fuzzy Lop, a widely used fuzz testing tool. On a test set of previously unseen programs drawn from Codeflaws, a search strategy based on a crash-targeting fitness function outperformed a baseline in 80.1% of cases. The experiments were then repeated on three real-world programs: the VLC media player and the libjpeg and mpg321 libraries. The correlation between library call traces and crashes generalizes, as indicated by ROC AUC scores of 0.91, 0.88, and 0.61. The resulting search landscape, however, is inconvenient due to plateaus, likely because these programs do not use standard C libraries as often as those in Codeflaws do. This limitation could be overcome in future work by considering a more powerful observation domain and a broader training corpus. Despite the limited generalizability of the experimental setup, this research opens new possibilities at the intersection of machine learning, fitness functions, and search-based testing in general.
An Extensive Study on Cross-Project Predictive Mutation Testing
Dongyu Mao, Lingchao Chen, Lingming Zhang. ICST 2019. DOI: 10.1109/ICST.2019.00025

Mutation testing is a powerful technique for evaluating the quality of a test suite, which plays a key role in ensuring software quality. The concept of mutation testing has also been widely used in other software engineering studies, e.g., test generation, fault localization, and program repair. During mutation testing, a large number of mutants may be generated and then executed against the test suite to examine whether they can be killed, making the process extremely computationally expensive. Several techniques have been proposed to speed up this process, including selective, weakened, and predictive mutation testing. Among these, Predictive Mutation Testing (PMT) builds a classification model from a corpus of mutant execution records to predict whether new mutants would be killed or survive, without executing them, and can achieve significant reductions in mutation cost. In PMT, each mutant is represented as a list of features related to the mutant itself and the test suite, transforming the mutation testing problem into a binary classification problem. In this paper, we perform an extensive study on the effectiveness and efficiency of the promising PMT technique under the cross-project setting, using a total of 654 real-world projects with more than 4 million mutants. Our work also complements the original PMT work by considering more features and powerful deep learning models. The experimental results show an average prediction accuracy of over 0.85 on the 654 projects using cross-validation, demonstrating the effectiveness of PMT. Meanwhile, a clear speedup is also observed, averaging 28.7X compared to traditional mutation testing with 5 threads. In addition, we analyze the importance of different groups of features in the classification model, which provides important implications for future research.
Do Pseudo Test Suites Lead to Inflated Correlation in Measuring Test Effectiveness?
Jie M. Zhang, Lingming Zhang, Dan Hao, Meng Wang, Lu Zhang. ICST 2019. DOI: 10.1109/ICST.2019.00033

Code coverage is the most widely adopted criterion for measuring test effectiveness in software quality assurance. The performance of coverage criteria (in indicating test suites' effectiveness) has been widely studied in prior work. Most of these studies use randomly constructed pseudo test suites to facilitate data collection for correlation analysis, yet no previous work has systematically studied whether pseudo test suites lead to inflated correlation results. This paper focuses on this potentially widespread threat with a study of 123 real-world Java projects. Following the typical experimental process of studying coverage criteria, we investigate the correlation between statement/assertion coverage and mutation score using both pseudo and original test suites. In addition to direct correlation analysis, we control the number of assertions and the test suite size to conduct partial correlation analysis. The results reveal that 1) the correlation (between coverage criteria and mutation score) derived from pseudo test suites is much higher than that from original test suites (from 0.21 to 0.39 higher in Kendall τ); and 2) contrary to previous reports, statement coverage has a stronger correlation with mutation score than assertion coverage.
Extension-Aware Automated Testing Based on Imperative Predicates
Nima Dini, Cagdas Yelen, Miloš Gligorić, S. Khurshid. ICST 2019. DOI: 10.1109/ICST.2019.00013

Bounded exhaustive testing (BET) techniques have been shown to be effective at detecting faults in software. BET techniques based on imperative predicates enumerate all test inputs up to given bounds such that each test input satisfies the properties encoded by the predicate. The search space is bounded by the user, who specifies the number of objects of each type and the list of values for each field of each type. To optimize the search, existing techniques detect isomorphic instances and record the fields accessed during the execution of a predicate. However, these optimizations are extension-unaware, i.e., they do not speed up the search when the predicate is modified, say due to a fix or additional properties. We present a technique, named iGen, that speeds up test generation when imperative predicates are extended. iGen memoizes intermediate results of a test generation run and reuses them in future searches, even when the new search space differs from the old one. We integrated our technique into two BET tools (one for Java and one for Python) and evaluated these implementations on several data structure pairs, including two pairs from the standard Java library. Our results show that iGen speeds up test generation by up to 46.59x for the Java tool and up to 49.47x for the Python tool. Additionally, we show that the speedup obtained by iGen increases for larger test instances.
A Model-Based Approach to Generate Dynamic Synthetic Test Data
Chao Tan. ICST 2019. DOI: 10.1109/ICST.2019.00063

Having access to high-quality test data is an important requirement for ensuring effective cross-organizational integration testing. The common practice for addressing this need is to generate synthetic data. However, existing approaches cannot generate representative datasets that can evolve to simulate the dynamics of the systems under test. In this PhD project, in collaboration with an industrial partner, we investigate the use of machine learning techniques to develop novel solutions that can generate synthetic, dynamic, and representative test data.