Choosing the fitness function for the job: Automated generation of test suites that detect real faults
Alireza Salahirad, H. Almulla, Gregory Gay
DOI: 10.1002/stvr.1758 (https://doi.org/10.1002/stvr.1758) · Published 1 November 2020

This article from the special issue was previously published in Software Testing, Verification and Reliability, Volume 29, Issue 4–5, 2019. For completeness, the title page of the article is included below. The full text can be read in Issue 29:4–5 on Wiley Online Library: https://onlinelibrary.wiley.com/doi/10.1002/stvr.1701
Comparing the effectiveness of capture and replay against automatic input generation for Android graphical user interface testing
S. Di Martino, A. R. Fasolino, L. L. L. Starace, Porfirio Tramontana
DOI: 10.1002/stvr.1754 (https://doi.org/10.1002/stvr.1754) · Published 16 October 2020
Exploratory testing and fully automated testing tools represent two viable and cheap alternatives to traditional test‐case‐based approaches for graphical user interface (GUI) testing of Android apps. The former can be performed with capture and replay tools that directly translate execution scenarios recorded by testers into test cases, without requiring preliminary test‐case design or advanced programming/testing skills. The latter tools can test Android GUIs without any tester intervention. Although both strategies are widely employed, to the best of our knowledge, no empirical investigation has compared their performance to give project managers useful insights for establishing an effective testing strategy. In this paper, we present two experiments carried out to compare the effectiveness of exploratory testing approaches using a capture and replay tool (Robotium Recorder) against three freely available automatic testing tools (AndroidRipper, Sapienz, and Google Robo). The first experiment involved 20 computer engineering students who were asked to record testing executions under strict time limits and without access to the source code. Their results were slightly better than those of the fully automated tools, but not conclusively so. In the second experiment, the same students were asked to improve the achieved coverage by exploiting the source code and the coverage obtained in the previous tests, without strict time constraints. The results of this second experiment showed that the students outperformed the automated tools, especially on long or complex execution scenarios. These findings provide useful indications for devising testing strategies that combine manual exploratory testing and automated testing.
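To make the capture-and-replay side of this comparison concrete, the sketch below shows the kind of test a tool like Robotium Recorder emits from a recorded session: the tester's interactions are replayed through Robotium's Solo API, and an assertion captures the outcome observed at recording time. The app, activity, and widget labels here are hypothetical, not taken from the study.

```java
import android.test.ActivityInstrumentationTestCase2;
import com.robotium.solo.Solo;

// Sketch of a test emitted by a capture-and-replay session. NotesActivity
// and the widget labels are hypothetical; a real recording captures them
// from the tester's interaction with the app under test.
public class RecordedScenarioTest extends ActivityInstrumentationTestCase2<NotesActivity> {

    private Solo solo;

    public RecordedScenarioTest() {
        super(NotesActivity.class);
    }

    @Override
    protected void setUp() throws Exception {
        super.setUp();
        // Solo replays GUI interactions exactly as the tester performed them.
        solo = new Solo(getInstrumentation(), getActivity());
    }

    public void testAddNoteScenario() {
        // Recorded interaction sequence.
        solo.clickOnButton("Add note");
        solo.enterText(0, "Buy milk");
        solo.clickOnButton("Save");
        // Assertion derived from the state observed during recording.
        assertTrue("new note should be listed", solo.waitForText("Buy milk"));
    }

    @Override
    protected void tearDown() throws Exception {
        solo.finishOpenedActivities();
        super.tearDown();
    }
}
```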
{"title":"Comparing the effectiveness of capture and replay against automatic input generation for Android graphical user interface testing","authors":"S. Martino, A. R. Fasolino, L. L. L. Starace, Porfirio Tramontana","doi":"10.1002/stvr.1754","DOIUrl":"https://doi.org/10.1002/stvr.1754","url":null,"abstract":"Exploratory testing and fully automated testing tools represent two viable and cheap alternatives to traditional test‐case‐based approaches for graphical user interface (GUI) testing of Android apps. The former can be executed by capture and replay tools that directly translate execution scenarios registered by testers in test cases, without requiring preliminary test‐case design and advanced programming/testing skills. The latter tools are able to test Android GUIs without tester intervention. Even if these two strategies are widely employed, to the best of our knowledge, no empirical investigation has been performed to compare their performance and obtain useful insights for a project manager to establish an effective testing strategy. In this paper, we present two experiments we carried out to compare the effectiveness of exploratory testing approaches using a capture and replay tool (Robotium Recorder) against three freely available automatic testing tools (AndroidRipper, Sapienz, and Google Robo). The first experiment involved 20 computer engineering students who were asked to record testing executions, under strict temporal limits and no access to the source code. Results were slightly better than those of fully automated tools, but not in a conclusive way. In the second experiment, the same students were asked to improve the achieved testing coverage by exploiting the source code and the coverage obtained in the previous tests, without strict temporal constraints. The results of this second experiment showed that students outperformed the automated tools especially for long/complex execution scenarios. The obtained findings provide useful indications for deciding testing strategies that combine manual exploratory testing and automated testing.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"51 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2020-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91386826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Functional test generation from UI test scenarios using reinforcement learning for Android applications
Yavuz Köroglu, A. Sen
DOI: 10.1002/stvr.1752 (https://doi.org/10.1002/stvr.1752) · Published 5 October 2020

With the ever‐growing Android graphical user interface (GUI) application market, there have been many studies on automated test generation for Android GUI applications. These studies successfully demonstrate how to detect fatal exceptions and achieve high coverage with fully automated test generation engines. However, it is unclear how many GUI functions these engines manage to test. The current best practice for the functional testing of Android GUI applications is to design user interface (UI) test scenarios in a non‐technical, human‐readable language such as Gherkin and to implement Java/Kotlin methods for every statement of each UI test scenario. Writing tests for UI test scenarios is hard, especially when some scenario statements are high‐level and declarative, so it is not clear what actions the generated test should perform. We propose the Fully Automated Reinforcement LEArning‐Driven specification‐based test generator for Android (FARLEAD‐Android). FARLEAD‐Android first translates the UI test scenario into a GUI‐level formal specification as a linear‐time temporal logic (LTL) formula. The LTL formula guides the test generation and acts as a specified test oracle. By dynamically executing the application under test (AUT) and monitoring the LTL formula, FARLEAD‐Android uses reinforcement learning (RL) to learn how to produce a witness for the UI test scenario. Our evaluation shows that FARLEAD‐Android is more effective and achieves higher performance in generating tests for UI test scenarios than three known engines: Random, Monkey, and QBEa. To the best of our knowledge, FARLEAD‐Android is the first fully automated mobile GUI testing engine that uses formal specifications.
Effective automated repair of internationalization presentation failures in web applications using style similarity clustering and search‐based techniques
Sonal Mahajan, Abdulmajeed Alameer, Phil McMinn, William G. J. Halfond
DOI: 10.1002/stvr.1746 (https://doi.org/10.1002/stvr.1746) · Published 6 September 2020
Companies often employ internationalization (i18n) frameworks to provide translated text and localized media content on their websites in order to communicate effectively with a global audience. However, the varying lengths of text in different languages can cause undesired distortions in the layout of a web page. Such distortions, called Internationalization Presentation Failures (IPFs), can negatively affect the aesthetics or usability of the website. Most existing automated techniques for assisting with the repair of IPFs either produce fixes that are likely to significantly reduce the legibility and attractiveness of the pages or are limited to detecting IPFs, with the actual repair remaining a labour‐intensive manual task. To address this problem, we propose a search‐based technique for automatically repairing IPFs in web applications while ensuring a legible and attractive page. An empirical evaluation on 46 real‐world web pages showed that our approach successfully resolved 94% of the detected IPFs. In a user study, participants rated the visual quality of our fixes significantly higher than the unfixed versions and also considered the repairs generated by our approach to be notably more legible and visually appealing than those generated by existing techniques.
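As a minimal sketch of the search-based idea, the code below explores candidate values for a single style property of a faulty element and keeps the candidate with the best fitness, where fitness trades off layout distortion against drift from the original style. The fitness weights and the single-property neighbourhood are illustrative assumptions; the paper's technique evaluates real rendered pages and uses style similarity clustering so that stylistically related elements are adjusted together.

```java
import java.util.function.ToDoubleFunction;

// Hypothetical search-based repair of one IPF by adjusting font size.
public class IpfRepairSketch {

    /** A candidate fix: a new value for one CSS property. */
    record Candidate(String property, double value) {}

    // Fitness = rendered layout distortion (e.g., overflow in pixels of the
    // translated text) + a penalty for straying from the original style,
    // so legibility and attractiveness are preserved. Weights are assumed.
    static double fitness(Candidate c, double originalValue,
                          ToDoubleFunction<Candidate> renderedOverflowPx) {
        double distortion = renderedOverflowPx.applyAsDouble(c);
        double styleDrift = Math.abs(c.value() - originalValue) / originalValue;
        return distortion + 10.0 * styleDrift;
    }

    /** Neighbourhood search over font sizes around the original value. */
    static Candidate repair(double originalFontPx,
                            ToDoubleFunction<Candidate> renderedOverflowPx) {
        Candidate best = new Candidate("font-size", originalFontPx);
        double bestFit = fitness(best, originalFontPx, renderedOverflowPx);
        for (double px = originalFontPx * 0.5; px <= originalFontPx * 1.5; px += 0.5) {
            Candidate c = new Candidate("font-size", px);
            double f = fitness(c, originalFontPx, renderedOverflowPx);
            if (f < bestFit) { bestFit = f; best = c; }
        }
        return best;
    }
}
```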
{"title":"Effective automated repair of internationalization presentation failures in web applications using style similarity clustering and search‐based techniques","authors":"Sonal Mahajan, Abdulmajeed Alameer, Phil McMinn, William G. J. Halfond","doi":"10.1002/stvr.1746","DOIUrl":"https://doi.org/10.1002/stvr.1746","url":null,"abstract":"Companies often employ (i18n) frameworks to provide translated text and localized media content on their websites in order to effectively communicate with a global audience. However, the varying lengths of text from different languages can cause undesired distortions in the layout of a web page. Such distortions, called Internationalization Presentation Failures (IPFs), can negatively affect the aesthetics or usability of the website. Most of the existing automated techniques developed for assisting repair of IPFs either produce fixes that are likely to significantly reduce the legibility and attractiveness of the pages or are limited to only detecting IPFs, with the actual repair itself remaining a labour intensive manual task. To address this problem, we propose a search‐based technique for automatically repairing IPFs in web applications, while ensuring a legible and attractive page. The empirical evaluation of our approach reported that our approach was able to successfully resolve 94% of the detected IPFs for 46 real‐world web pages. In a user study, participants rated the visual quality of our fixes significantly higher than the unfixed versions and also considered the repairs generated by our approach to be notably more legible and visually appealing than the repairs generated by existing techniques.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"236 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2020-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77563069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic monitoring of service reliability for web applications: a simulation-based approach","authors":"Sundeuk Kim, Ilhyun Suh, Y. Chung","doi":"10.1002/stvr.1747","DOIUrl":"https://doi.org/10.1002/stvr.1747","url":null,"abstract":"","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"29 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81022291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Localizing software performance regressions in web applications by comparing execution timelines
Frolin S. Ocariza, Boyang Zhao
DOI: 10.1002/stvr.1750 (https://doi.org/10.1002/stvr.1750) · Published 11 August 2020

A performance regression in software is an increase in an application step's response time as a result of code changes. Such regressions can be detected with profiling tools; however, investigating their root cause is a mostly manual and time‐consuming task. This is especially true when comparing execution timelines – dynamic function call trees augmented with response time data – which are compared to find the performance regression‐causes: the lowest‐level function calls that regressed during execution. When done manually, these comparisons often require the investigator to analyze thousands of function call nodes. Further, performing these comparisons on web applications is challenging because JavaScript's asynchronous and event‐driven model introduces noise into the timelines. In response, we propose a design – Zam – that automatically compares execution timelines collected from web applications to identify performance regression‐causes. Our approach uses a hybrid node matching algorithm that recursively attempts to find the longest common subsequence at each call tree level, then aggregates the results of multiple comparisons to eliminate noise. Our evaluation of Zam on 10 web applications indicates that it can identify performance regression‐causes with a path recall of 100% and a path precision of 96%, while performing comparisons in under a minute on average. We also demonstrate the real‐world applicability of Zam, which the performance and reliability team at SAP has used to successfully complete performance investigations.
An empirical study of Linespots: A novel past‐fault algorithm
Maximilian Scholz, Richard Torkar
DOI: 10.1002/stvr.1787 (https://doi.org/10.1002/stvr.1787) · Published 18 July 2020

This paper proposes Linespots, a novel fault prediction algorithm based on past faults and derived from the Bugspots algorithm. We analyse the predictive performance and runtime of Linespots compared with Bugspots in an empirical study using the largest self‐built dataset to date, including high‐quality samples for validation. As a novelty in fault prediction, we use Bayesian data analysis and directed acyclic graphs to model the effects. We found consistent improvements in the predictive performance of Linespots over Bugspots across all seven evaluation metrics. We conclude that Linespots should be preferred over Bugspots whenever real‐time performance is not required.