Choosing the fitness function for the job: Automated generation of test suites that detect real faults
Alireza Salahirad, H. Almulla, Gregory Gay
DOI: 10.1002/stvr.1758 (https://doi.org/10.1002/stvr.1758) · Published 1 November 2020

This article from the special issue was previously published in Software Testing, Verification and Reliability, Volume 29, Issue 4–5, 2019. For completeness, the title page of the article is included below. The full text can be read in Issue 29:4–5 on Wiley Online Library: https://onlinelibrary.wiley.com/doi/10.1002/stvr.1701
Comparing the effectiveness of capture and replay against automatic input generation for Android graphical user interface testing
S. Di Martino, A. R. Fasolino, L. L. L. Starace, Porfirio Tramontana
DOI: 10.1002/stvr.1754 (https://doi.org/10.1002/stvr.1754) · Published 16 October 2020
Exploratory testing and fully automated testing tools represent two viable and cheap alternatives to traditional test‐case‐based approaches for graphical user interface (GUI) testing of Android apps. The former can be performed with capture and replay tools that directly translate execution scenarios recorded by testers into test cases, without requiring preliminary test‐case design or advanced programming/testing skills. The latter tools can test Android GUIs without any tester intervention. Although both strategies are widely employed, to the best of our knowledge, no empirical investigation has compared their performance to give project managers useful insights for establishing an effective testing strategy. In this paper, we present two experiments carried out to compare the effectiveness of exploratory testing approaches using a capture and replay tool (Robotium Recorder) against three freely available automatic testing tools (AndroidRipper, Sapienz, and Google Robo). The first experiment involved 20 computer engineering students who were asked to record testing executions under strict time limits and without access to the source code. Their results were slightly better than those of the fully automated tools, but not conclusively so. In the second experiment, the same students were asked to improve the achieved coverage by exploiting the source code and the coverage obtained in the previous tests, without strict time constraints. The results of this second experiment showed that the students outperformed the automated tools, especially on long or complex execution scenarios. These findings provide useful indications for devising testing strategies that combine manual exploratory testing and automated testing.
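To make the capture-and-replay side of this comparison concrete, the sketch below shows the kind of test a tool like Robotium Recorder emits from a recorded session: the tester's interactions are replayed through Robotium's Solo API, and an assertion captures the outcome observed at recording time. The app, activity, and widget labels here are hypothetical, not taken from the study.

```java
import android.test.ActivityInstrumentationTestCase2;
import com.robotium.solo.Solo;

// Sketch of a test emitted by a capture-and-replay session. NotesActivity
// and the widget labels are hypothetical; a real recording captures them
// from the tester's interaction with the app under test.
public class RecordedScenarioTest extends ActivityInstrumentationTestCase2<NotesActivity> {

    private Solo solo;

    public RecordedScenarioTest() {
        super(NotesActivity.class);
    }

    @Override
    protected void setUp() throws Exception {
        super.setUp();
        // Solo replays GUI interactions exactly as the tester performed them.
        solo = new Solo(getInstrumentation(), getActivity());
    }

    public void testAddNoteScenario() {
        // Recorded interaction sequence.
        solo.clickOnButton("Add note");
        solo.enterText(0, "Buy milk");
        solo.clickOnButton("Save");
        // Assertion derived from the state observed during recording.
        assertTrue("new note should be listed", solo.waitForText("Buy milk"));
    }

    @Override
    protected void tearDown() throws Exception {
        solo.finishOpenedActivities();
        super.tearDown();
    }
}
```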
{"title":"Comparing the effectiveness of capture and replay against automatic input generation for Android graphical user interface testing","authors":"S. Martino, A. R. Fasolino, L. L. L. Starace, Porfirio Tramontana","doi":"10.1002/stvr.1754","DOIUrl":"https://doi.org/10.1002/stvr.1754","url":null,"abstract":"Exploratory testing and fully automated testing tools represent two viable and cheap alternatives to traditional test‐case‐based approaches for graphical user interface (GUI) testing of Android apps. The former can be executed by capture and replay tools that directly translate execution scenarios registered by testers in test cases, without requiring preliminary test‐case design and advanced programming/testing skills. The latter tools are able to test Android GUIs without tester intervention. Even if these two strategies are widely employed, to the best of our knowledge, no empirical investigation has been performed to compare their performance and obtain useful insights for a project manager to establish an effective testing strategy. In this paper, we present two experiments we carried out to compare the effectiveness of exploratory testing approaches using a capture and replay tool (Robotium Recorder) against three freely available automatic testing tools (AndroidRipper, Sapienz, and Google Robo). The first experiment involved 20 computer engineering students who were asked to record testing executions, under strict temporal limits and no access to the source code. Results were slightly better than those of fully automated tools, but not in a conclusive way. In the second experiment, the same students were asked to improve the achieved testing coverage by exploiting the source code and the coverage obtained in the previous tests, without strict temporal constraints. The results of this second experiment showed that students outperformed the automated tools especially for long/complex execution scenarios. The obtained findings provide useful indications for deciding testing strategies that combine manual exploratory testing and automated testing.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"51 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91386826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Functional test generation from UI test scenarios using reinforcement learning for Android applications
Yavuz Köroglu, A. Sen
DOI: 10.1002/stvr.1752 (https://doi.org/10.1002/stvr.1752) · Published 5 October 2020

With the ever‐growing Android graphical user interface (GUI) application market, there have been many studies on automated test generation for Android GUI applications. These studies successfully demonstrate how to detect fatal exceptions and achieve high coverage with fully automated test generation engines. However, it is unclear how many GUI functions these engines manage to test. The current best practice for the functional testing of Android GUI applications is to design user interface (UI) test scenarios in a non‐technical, human‐readable language such as Gherkin and to implement Java/Kotlin methods for every statement of each UI test scenario. Writing tests for UI test scenarios is hard, especially when some scenario statements are high‐level and declarative, so it is not clear what actions the generated test should perform. We propose the Fully Automated Reinforcement LEArning‐Driven specification‐based test generator for Android (FARLEAD‐Android). FARLEAD‐Android first translates the UI test scenario into a GUI‐level formal specification as a linear‐time temporal logic (LTL) formula. The LTL formula guides the test generation and acts as a specified test oracle. By dynamically executing the application under test (AUT) and monitoring the LTL formula, FARLEAD‐Android uses reinforcement learning (RL) to learn how to produce a witness for the UI test scenario. Our evaluation shows that FARLEAD‐Android is more effective and achieves higher performance in generating tests for UI test scenarios than three known engines: Random, Monkey, and QBEa. To the best of our knowledge, FARLEAD‐Android is the first fully automated mobile GUI testing engine that uses formal specifications.
Effective automated repair of internationalization presentation failures in web applications using style similarity clustering and search‐based techniques
Sonal Mahajan, Abdulmajeed Alameer, Phil McMinn, William G. J. Halfond
DOI: 10.1002/stvr.1746 (https://doi.org/10.1002/stvr.1746) · Published 6 September 2020
Companies often employ internationalization (i18n) frameworks to provide translated text and localized media content on their websites in order to communicate effectively with a global audience. However, the varying lengths of text in different languages can cause undesired distortions in the layout of a web page. Such distortions, called Internationalization Presentation Failures (IPFs), can negatively affect the aesthetics or usability of the website. Most existing automated techniques for assisting with the repair of IPFs either produce fixes that are likely to significantly reduce the legibility and attractiveness of the pages or are limited to detecting IPFs, with the actual repair remaining a labour‐intensive manual task. To address this problem, we propose a search‐based technique for automatically repairing IPFs in web applications while ensuring a legible and attractive page. An empirical evaluation on 46 real‐world web pages showed that our approach successfully resolved 94% of the detected IPFs. In a user study, participants rated the visual quality of our fixes significantly higher than the unfixed versions and also considered the repairs generated by our approach to be notably more legible and visually appealing than those generated by existing techniques.
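As a minimal sketch of the search-based idea, the code below explores candidate values for a single style property of a faulty element and keeps the candidate with the best fitness, where fitness trades off layout distortion against drift from the original style. The fitness weights and the single-property neighbourhood are illustrative assumptions; the paper's technique evaluates real rendered pages and uses style similarity clustering so that stylistically related elements are adjusted together.

```java
import java.util.function.ToDoubleFunction;

// Hypothetical search-based repair of one IPF by adjusting font size.
public class IpfRepairSketch {

    /** A candidate fix: a new value for one CSS property. */
    record Candidate(String property, double value) {}

    // Fitness = rendered layout distortion (e.g., overflow in pixels of the
    // translated text) + a penalty for straying from the original style,
    // so legibility and attractiveness are preserved. Weights are assumed.
    static double fitness(Candidate c, double originalValue,
                          ToDoubleFunction<Candidate> renderedOverflowPx) {
        double distortion = renderedOverflowPx.applyAsDouble(c);
        double styleDrift = Math.abs(c.value() - originalValue) / originalValue;
        return distortion + 10.0 * styleDrift;
    }

    /** Neighbourhood search over font sizes around the original value. */
    static Candidate repair(double originalFontPx,
                            ToDoubleFunction<Candidate> renderedOverflowPx) {
        Candidate best = new Candidate("font-size", originalFontPx);
        double bestFit = fitness(best, originalFontPx, renderedOverflowPx);
        for (double px = originalFontPx * 0.5; px <= originalFontPx * 1.5; px += 0.5) {
            Candidate c = new Candidate("font-size", px);
            double f = fitness(c, originalFontPx, renderedOverflowPx);
            if (f < bestFit) { bestFit = f; best = c; }
        }
        return best;
    }
}
```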
{"title":"Effective automated repair of internationalization presentation failures in web applications using style similarity clustering and search‐based techniques","authors":"Sonal Mahajan, Abdulmajeed Alameer, Phil McMinn, William G. J. Halfond","doi":"10.1002/stvr.1746","DOIUrl":"https://doi.org/10.1002/stvr.1746","url":null,"abstract":"Companies often employ (i18n) frameworks to provide translated text and localized media content on their websites in order to effectively communicate with a global audience. However, the varying lengths of text from different languages can cause undesired distortions in the layout of a web page. Such distortions, called Internationalization Presentation Failures (IPFs), can negatively affect the aesthetics or usability of the website. Most of the existing automated techniques developed for assisting repair of IPFs either produce fixes that are likely to significantly reduce the legibility and attractiveness of the pages or are limited to only detecting IPFs, with the actual repair itself remaining a labour intensive manual task. To address this problem, we propose a search‐based technique for automatically repairing IPFs in web applications, while ensuring a legible and attractive page. The empirical evaluation of our approach reported that our approach was able to successfully resolve 94% of the detected IPFs for 46 real‐world web pages. In a user study, participants rated the visual quality of our fixes significantly higher than the unfixed versions and also considered the repairs generated by our approach to be notably more legible and visually appealing than the repairs generated by existing techniques.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"236 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2020-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77563069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic monitoring of service reliability for web applications: a simulation-based approach","authors":"Sundeuk Kim, Ilhyun Suh, Y. Chung","doi":"10.1002/stvr.1747","DOIUrl":"https://doi.org/10.1002/stvr.1747","url":null,"abstract":"","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"29 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81022291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Localizing software performance regressions in web applications by comparing execution timelines
Frolin S. Ocariza, Boyang Zhao
DOI: 10.1002/stvr.1750 (https://doi.org/10.1002/stvr.1750) · Published 11 August 2020

A performance regression in software is an increase in an application step's response time as a result of code changes. Such regressions can be detected with profiling tools; however, investigating their root cause is a mostly manual and time‐consuming task. This is especially true when comparing execution timelines – dynamic function call trees augmented with response time data – which are compared to find the performance regression‐causes: the lowest‐level function calls that regressed during execution. When done manually, these comparisons often require the investigator to analyze thousands of function call nodes. Further, performing these comparisons on web applications is challenging because JavaScript's asynchronous and event‐driven model introduces noise into the timelines. In response, we propose a design – Zam – that automatically compares execution timelines collected from web applications to identify performance regression‐causes. Our approach uses a hybrid node matching algorithm that recursively attempts to find the longest common subsequence at each call tree level, then aggregates the results of multiple comparisons to eliminate noise. Our evaluation of Zam on 10 web applications indicates that it can identify performance regression‐causes with a path recall of 100% and a path precision of 96%, while performing comparisons in under a minute on average. We also demonstrate the real‐world applicability of Zam, which the performance and reliability team at SAP has used to successfully complete performance investigations.
An empirical study of Linespots: A novel past‐fault algorithm
Maximilian Scholz, Richard Torkar
DOI: 10.1002/stvr.1787 (https://doi.org/10.1002/stvr.1787) · Published 18 July 2020

This paper proposes Linespots, a novel fault prediction algorithm based on past faults and derived from the Bugspots algorithm. We analyse the predictive performance and runtime of Linespots compared with Bugspots in an empirical study using the largest self‐built dataset to date, including high‐quality samples for validation. As a novelty in fault prediction, we use Bayesian data analysis and directed acyclic graphs to model the effects. We found consistent improvements in the predictive performance of Linespots over Bugspots across all seven evaluation metrics. We conclude that Linespots should be preferred over Bugspots whenever real‐time performance is not required.