Statically driven generation of concurrent tests for thread-safe classes
Valerio Terragni, M. Pezzè
Concurrency testing is an important activity for exposing concurrency faults in thread-safe classes. A concurrent test for a thread-safe class is a set of method call sequences that exercise the public interface of the class from multiple threads. Automatically generating fault-revealing concurrent tests within an affordable time budget is difficult due to the huge search space of possible concurrent tests. In this paper, we present DepCon+, a novel approach that reduces the search space of concurrent tests by leveraging statically computed dependencies among public methods. DepCon+ exploits the intuition that concurrent tests can expose thread-safety violations that manifest as exceptions or deadlocks only if they exercise specific method dependencies. DepCon+ identifies such dependencies efficiently by statically analysing the code and relies on the computed dependencies to steer test generation towards those concurrent tests that exhibit them. We developed a prototype DepCon+ implementation for Java and evaluated the approach on 19 known concurrency faults in thread-safe classes that lead to thread-safety violations of either the exception or the deadlock type. The results presented in this paper show that DepCon+ is more effective than state-of-the-art approaches in exposing the concurrency faults, and its search-space pruning dramatically reduces the space of possible concurrent tests without missing any thread-safety violations.
{"title":"Statically driven generation of concurrent tests for thread‐safe classes","authors":"Valerio Terragni, M. Pezzè","doi":"10.1002/stvr.1774","DOIUrl":"https://doi.org/10.1002/stvr.1774","url":null,"abstract":"Concurrency testing is an important activity to expose concurrency faults in thread‐safe classes. A concurrent test for a thread‐safe class is a set of method call sequences that exercise the public interface of the class from multiple threads. Automatically generating fault‐revealing concurrent tests within an affordable time budget is difficult due to the huge search space of possible concurrent tests. In this paper, we present DepCon+, a novel approach that reduces the search space of concurrent tests by leveraging statically computed dependencies among public methods. DepCon+ exploits the intuition that concurrent tests can expose thread‐safety violations that manifest exceptions or deadlocks, only if they exercise some specific method dependencies. DepCon+ provides an efficient way to identify such dependencies by statically analysing the code and relies on the computed dependencies to steer the test generation towards those concurrent tests that exhibit the computed dependencies. We developed a prototype DepCon+ implementation for Java and evaluated the approach on 19 known concurrency faults of thread‐safe classes that lead to thread‐safety violations of either exception or deadlock type. The results presented in this paper show that DepCon+ is more effective than state‐of‐the‐art approaches in exposing the concurrency faults. The search space pruning of DepCon+ dramatically reduces the search space of possible concurrent tests, without missing any thread‐safety violations.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"59 1 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90408386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Testing of adaptive and context-aware systems: approaches and challenges
B. R. Siqueira, F. Ferrari, Kathiani E. Souza, V. V. D. Camargo, R. Lemos
Adaptive systems (ASs) and context-aware systems (CASs) are able to evaluate their own behaviour and to adapt it when the system fails to accomplish its goals or when better functionality or performance is possible. Ensuring the reliability of ASs and CASs is demanding because failures might have undesirable consequences, and testing these systems effectively is not trivial because of their inherent characteristics. The literature lacks a comprehensive review that provides a broad picture of the area; current reviews are outdated and incomplete. The objectives of this study are to characterize the state of the art in AS and CAS testing and to discuss approaches, challenges, observed trends, and research limitations and directions. We performed a systematic literature review (SLR) and a thematic analysis of studies, reporting up-to-date, refined and extended results compared with existing reviews. Based on 102 selected studies, we (i) characterized testing approaches by grouping techniques for ASs and CASs; (ii) updated and refined a characterization of testing challenges for ASs and CASs; and (iii) analysed and discussed research trends and their implications for AS and CAS testing. There are recurring research concerns regarding AS and CAS testing, such as the generation of test cases and built-in tests, and recurring testing challenges, such as context monitoring and runtime decisions. We also identified trends such as model-based testing and hybrid techniques, as well as little-investigated issues such as uncertainty and the prediction of changes. All in all, our results may provide guidance for developers and researchers with respect to both the practice of and future research on AS and CAS testing.
{"title":"Testing of adaptive and context‐aware systems: approaches and challenges","authors":"B. R. Siqueira, F. Ferrari, Kathiani E. Souza, V. V. D. Camargo, R. Lemos","doi":"10.1002/stvr.1772","DOIUrl":"https://doi.org/10.1002/stvr.1772","url":null,"abstract":"Adaptive systems (ASs) and context‐aware systems (CASs) are able to evaluate their own behaviour and to adapt it when the system fails to accomplish its goals or when better functionality or performance is possible. Ensuring the reliability of ASs and CASs is demanding because failures might have undesirable consequences. Testing ASs and CASs effectively is not trivial because of the inherent characteristics of these systems. The literature lacks a comprehensive review that provides a broad picture of the area; current reviews are outdated and incomplete. The objectives of this study are characterizing the state of the art in AS and CAS testing and discussing approaches, challenges, observed trends, and research limitations and directions. We performed a systematic literature review (SLR) and a thematic analysis of studies, reporting up‐to‐date, refined and extended results when compared with existing reviews. Based on 102 selected studies, we (i) characterized testing approaches by grouping techniques for ASs and CASs; (ii) updated and refined a characterization of testing challenges for ASs and CASs; and (iii) analysed and discussed research trends and implications for AS and CAS testing. There are recurring research concerns regarding AS and CAS testing. Examples are the generation of test cases and built‐in tests. Moreover, we also identified recurring testing challenges such as context monitoring and runtime decisions. Moreover, there are some trends such as model‐based testing and hybrid techniques and some little investigated issues like uncertainty and prediction of changes. All in all, our results may provide guidance for developers and researchers with respect to the practice and the future research on AS and CAS testing.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"42 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82179441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generating and selecting resilient and maintainable locators for Web automated testing
Vu-Loc Nguyen, T. To, Gia-Han Diep
Web user interface (UI) test automation strategies have been dominated by programmable and record–playback approaches. Of these, record–playback allows automation tests to be created easily and reduces the cost of test generation. However, it increases the cost of test maintenance because the locators it generates for identifying UI objects during playback are unstable. In this paper, we propose a new approach to generating and selecting resilient and maintainable locators. Our approach consists of two parts: a new XPath construction method and the selection of the best XPath to locate the target element. Our XPath construction method relies on the semantic structure of Web pages to locate the target element through its neighbors. We conducted an experiment on 15 popular websites. The results show that our approach outperforms the state-of-the-practice Selenium IDE and the state-of-the-art Robula+ in locating target elements by effectively avoiding wrong locators. It also produces more readable XPaths, and hence more maintainable tests, than these approaches do.
{"title":"Generating and selecting resilient and maintainable locators for Web automated testing","authors":"Vu-Loc Nguyen, T. To, Gia-Han Diep","doi":"10.1002/stvr.1760","DOIUrl":"https://doi.org/10.1002/stvr.1760","url":null,"abstract":"Web user interface (UI) test automation strategies have been dominated by programmable and record–playback approaches. Of these, record–playback allows creating automation tests easily and reduces the cost of test generation. However, this approach increases the cost of test maintenance due to its unstable generated locators for identifying UI objects during playback. In this paper, we propose a new approach to generating and selecting resilient and maintainable locators. Our approach consists of two parts, a new XPath construction method and selecting the best XPath to locate the target element. Our XPath construction method relies on semantic structures of Web pages to locate the target element using its neighbors. We conducted an experiment on 15 popular websites. The results show that our approach outperforms the state‐of‐the‐practice/art Selenium IDE and Robula+ in locating target elements by effectively avoiding wrong locators. It also produces more readable XPaths (hence more maintainable tests) than do these approaches.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"2 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87354836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
testar – scriptless testing through graphical user interface
T. Vos, Pekka Aho, Fernando Pastor Ricós, Olivia Rodriguez Valdés, Adolf Mulders
Covering all possible paths through a graphical user interface (GUI) with test scripts would take too much effort and would result in serious maintenance issues. We propose complementing scripted testing with scriptless test automation using the open-source testar tool. This paper gives a comprehensive overview of testar and its latest extensions, together with ongoing and future research. With this paper, we hope to help and encourage other researchers to use testar for their GUI-testing research and to pave the way for an international research agenda in GUI testing built upon a stable, open-source infrastructure.
{"title":"testar – scriptless testing through graphical user interface","authors":"T. Vos, Pekka Aho, Fernando Pastor Ricós, Olivia Rodriguez Valdés, Adolf Mulders","doi":"10.1002/stvr.1771","DOIUrl":"https://doi.org/10.1002/stvr.1771","url":null,"abstract":"Covering all the possible paths of the graphical user interface (GUI) with test scripts would take too much effort and result in serious maintenance issues. We propose complementing scripted testing with scriptless test automation using the open‐source testar tool. This paper gives a comprehensive overview of testar and its latest extensions together with the ongoing and future research. With this paper, we hope we can help and encourage other researchers to use testar for their GUI testing‐related research and pave the way for an international research agenda in GUI testing built upon stable and open‐source infrastructure.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"127 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89066435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sidereal: Statistical adaptive generation of robust locators for web testing
Maurizio Leotta, F. Ricca, P. Tonella
By ensuring adequate functional coverage, End-to-End (E2E) testing is a key enabling factor of continuous integration. This is even more true for web applications, where automated E2E testing is the only way to exercise the full stack used to create a modern application. The test code used for web testing usually relies on DOM locators, often expressed as XPath expressions, to identify web elements and to extract the data checked in assertions. When applications evolve, the dominant cost of test-code evolution is due to broken locators, which fail to locate the target element in the new versions and must be repaired. In this paper, we formulate the robust XPath locator generation problem as a graph exploration problem, instead of relying on ad-hoc heuristics such as the one implemented by the state-of-the-art tool robula+. Our approach is based on a statistical adaptive algorithm, implemented in the tool sidereal, which outperforms robula+'s heuristics in terms of robustness by learning the potential fragility of HTML properties from previous versions of the application under test. We applied sidereal to six applications and a total of 611 locators and compared it against two baseline algorithms, robula+ and Montoto. Adopting sidereal significantly reduces the number of broken locators with respect to robula+ and Montoto (−55% and −70%, respectively). The time for generating such robust locators, on the order of hundredths of a second, was deemed acceptable.
{"title":"Sidereal: Statistical adaptive generation of robust locators for web testing","authors":"Maurizio Leotta, F. Ricca, P. Tonella","doi":"10.1002/stvr.1767","DOIUrl":"https://doi.org/10.1002/stvr.1767","url":null,"abstract":"By ensuring adequate functional coverage, End‐to‐End (E2E) testing is a key enabling factor of continuous integration. This is even more true for web applications, where automated E2E testing is the only way to exercise the full stack used to create a modern application. The test code used for web testing usually relies on DOM locators, often expressed as XPath expressions, to identify the web elements and to extract the data checked in assertions. When applications evolve, the most dominant cost for the evolution of test code is due to broken locators, which fail to locate the target element in the novel versions and must be repaired. In this paper, we formulate the robust XPath locator generation problem as a graph exploration problem, instead of relying on ad‐hoc heuristics as the one implemented by the state of the art tool robula+. Our approach is based on a statistical adaptive algorithm implemented by the tool sidereal, which outperforms robula+'s heuristics in terms of robustness by learning the potential fragility of HTML properties from previous versions of the application under test. sidereal was applied to six applications and to a total of 611 locators and was compared against two baseline algorithms, robula+ and Montoto. The adoption of sidereal results in a significant reduction of the number of broken locators (respectively ‐55% and ‐70%). The time for generating such robust locators was deemed acceptable being in the order of hundredths of second.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"9 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89570485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Special issue on new generations of UI testing
Emil Alégroth, Luca Ardito, Riccardo Coppola, R. Feldt
Market demands for faster delivery and higher software quality are becoming progressively more stringent. A key hindrance for software companies in meeting those demands is how to test the software, given the intrinsic costs of development, maintenance and evolution of testware, especially since testware should be defined and aligned with all layers of the system under test (SUT), including all user interface (UI) abstraction levels.

UI-based test approaches are forms of end-to-end testing: the interaction with the system is carried out by mimicking the operations that a human user would perform. For graphical user interfaces (GUIs), different GUI-based test approaches exist according to the layer of abstraction of the GUI that is considered for creating test locators and oracles: first-generation, or coordinate-based, tests use the exact position on the screen to identify the elements to interact with; second-generation, or layout-based, tests leverage GUI properties as locators; and third-generation, or visual, tests make use of image recognition. The three approaches provide various benefits and drawbacks, yet they are seldom used together because of the costs mentioned above, despite growing academic evidence of their complementary benefits. User interfaces are, however, not limited to GUIs, especially with the recent diffusion of innovative typologies of user interfaces (e.g., conversational, voice-recognition, gesture-based and textual UIs) that are still rarely tested by developers. Testing techniques can also be distinguished by the way test scripts are produced: written inside JUnit-like test scripts, obtained by capturing interactions with the SUT, or generated automatically by traversing a model of the user interface, as modern model-based testing tools do.

Test automation is a well-rooted practice in the industrial environment. However, there are software development domains, e.g., web and mobile apps, where UI testing is still not adopted on a systematic basis. Many investigations in the literature have highlighted reasons for this lack of penetration of the most evolved UI testing techniques among developers:

1. scarce documentation of the available testing tools;
2. significant maintenance effort to keep the test scripts aligned with the evolution of the AUT, e.g., for performing regression testing;
3. limited perception of the benefits that advanced UI testing techniques yield compared with traditional manual testing.
{"title":"Special issue on new generations of UI testing","authors":"Emil Alégroth, Luca Ardito, Riccardo Coppola, R. Feldt","doi":"10.1002/stvr.1770","DOIUrl":"https://doi.org/10.1002/stvr.1770","url":null,"abstract":"Market demands for faster delivery and higher software quality are progressively becoming more stringent. A key hindrance for software companies to meet those demands is how to test the software due to the intrinsic costs of development, maintenance and evolution of testware, especially since testware should be defined and aligned, with all layers of the system under test (SUT), including all user interface (UI) abstraction levels. UI-based test approaches are forms of end-to-end testing. The interaction with the system is carried out by mimicking the operations that a human user would perform. Regarding graphical user interfaces (i.e., GUIs), different GUI-based test approaches exist according to the layer of abstraction of the GUI that is considered for creating test locators and oracles: specifically, first generation, or coordinate-based, tests use the exact position on the screen to identify the elements to interact with; second generation, or layout-based, tests leverage GUI properties as locators; and third generation, or visual, tests make use of image recognition. The three approaches provide various benefits and drawbacks. They are seldom used together because of the costs mentioned above, despite growing academic evidence of the complimentary benefits. User interfaces are, however, not limited to GUIs, especially with the recent diffusion of innovative typologies of user interfaces (e.g., conversational, voice-recognition, gesture-based and textual UIs) that are still rarely tested by developers; testing techniques can also be distinguished based on the way the test scripts are generated, i.e., if they are written inside JUnit-like test scripts or obtained through the capture of interactions with the SUT, or automatically obtained traversing a model of the user interface, as modern model-based testing tools do it. Test automation is a well-rooted practice in the industrial environment. However, there are software development domains, e.g., web and mobile apps, where UI testing is still not adopted on a systematic basis. The results of many investigations in literature highlighted many reasons for this lack of penetration of the most evolved UI testing techniques among developers: 1 Scarce documentation of the available testing tools; 2 Significant maintenance effort when keeping the test scripts aligned with the evolution of the AUT, e.g., for performing regression testing; 3 Limited perception of the benefits that advanced UI testing techniques yield when confronted with traditional manual testing.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"1 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89731450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Remote embedded devices test framework on the cloud
Il-Seok Choi, C. Jeong
Embedded systems have high coupling and dependency among different hardware and software components in heterogeneous layers, which makes fault location and issue tracking difficult during testing. Despite these poor verification conditions, even reliability, the most important quality characteristic of embedded systems, is verified with insufficient sample sizes, typical test cases and general test strategies, owing to limitations such as development cost and scheduling. As a result, shipped products are highly likely to exhibit various reliability problems, because they have not been verified with reliability quality characteristics in mind. To address this gap, this study developed the remote embedded device test framework on the cloud (RED-TFC), which has an innovative reliability test manager component that can automatically perform various tests to evaluate the reliability and performance of distributed shared devices by utilizing the cloud concept. RED-TFC offers two key enhancements over existing testing services: (i) the adaptive sample scale for reliability test (ASRT), a feature that identifies the most appropriate sample size for performing functionality and reliability tests of remote verification targets connected to the RED-TFC server; and (ii) the mass sample reliability test (MSRT), which uses reliability-specific test cases, with the sample size obtained by ASRT, to perform verification following a Markov prediction process. This paper analyses two Android smartphone models, considered the most generic examples because they include many embedded components, and presents a method for detecting a high number of reliability problems in smartphones using the proposed RED-TFC, along with its implications.
{"title":"Remote embedded devices test framework on the cloud","authors":"Il-Seok Choi, C. Jeong","doi":"10.1002/stvr.1768","DOIUrl":"https://doi.org/10.1002/stvr.1768","url":null,"abstract":"Embedded systems have high coupling and dependency among different hardware and software components in heterogeneous layers, which makes location and issue tracking in their testing difficult. Despite these poor verification conditions, even the most important reliability quality verification among embedded system characteristics is verified with insufficient sample size, typical test cases, and general test strategies, following limitations such as development costs and scheduling. As a result, shipments are highly likely to lead to various reliability quality problems because items have not been verified considering reliability quality characteristics. Hence, to address this gap, this study developed remote embedded device test framework on the cloud (RED‐TFC), which has an innovative reliability test manager component that can automatically perform various tests for the evaluation of reliability and performance of distributed shared devices by utilizing the cloud concept. RED‐TFC offers two key enhancements over existing testing services: (i) the adaptive sample scale for reliability test (ASRT), a feature that identifies the most appropriate sample size for performing functionality and reliability tests of remote verification targets connected to the RED‐TFC server; and (ii) the mass sample reliability test (MSRT), which uses a test case that is specific to reliability, with the sample size obtained by ASRT, to perform verification following the Markov prediction process. This paper analyses two Android smartphone models considered the most generic examples, including many embedded components, and presents a method of detecting a high number of reliability problems in smartphones using the proposed RED‐TFC and its implications.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"14 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73449662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance assessment based on stochastic differential equation and effort data for edge computing
Y. Tamura, S. Yamada
Much open-source software is included in commercial software, and several open-source systems are used in cloud services such as OpenStack and Eucalyptus from the standpoint of unified management, cost reduction and maintainability. The operation phase of a cloud service, in particular, has unique features with uncertainty, such as big data and network connectivity, because it changes depending on many external factors. On the other hand, only a few effective methods for the performance assessment of cloud services have been presented. Recently, edge computing has attracted attention because of the connection and processing delays of cloud computing: cloud computing typically processes big data, whereas edge computing operates on instant data. We focus on performance assessment based on the relationship between cloud and edge services operated using several open-source systems. We propose a two-dimensional stochastic differential equation model that considers the unique features with uncertainty arising from big data under the operation of cloud and edge services. We also analyse actual data to show numerical examples of performance assessment that consider network connectivity as a characteristic of cloud and edge services, and we compare the noise terms of the proposed model on the actual data.
{"title":"Performance assessment based on stochastic differential equation and effort data for edge computing","authors":"Y. Tamura, S. Yamada","doi":"10.1002/stvr.1766","DOIUrl":"https://doi.org/10.1002/stvr.1766","url":null,"abstract":"Many open‐source software are included in commercial software. Also, several open‐source software are used in the cloud service such as OpenStack and Eucalyptus from standpoint of the unified management, cost reduction and maintainability. In particular, the operation phase of cloud service has a unique feature with uncertainty such as big data and network connectivity, because the operation phase of cloud service changes depending on many external factors. On the other hand, the effective methods of performance assessments for cloud service have only a few presented. Recently, edge computing is the focus of attention because of the problems of connection and processing delay in case of cloud computing. It is known as that cloud computing treats big data. On the other hand, edge computing operates on instant data. We focus on the performance assessments based on the relationship between the cloud and edge services operated by using several open‐source software. Then we propose a two‐dimensional stochastic differential equation model considering the unique features with uncertainty from big data under the operation of cloud and edge services. Also, we analyse actual data to show numerical examples of performance assessments considering the network connectivity as characteristics of cloud and edge services. Moreover, we compare the noise terms of the proposed model for actual data.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"150 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77402708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automated visual classification of DOM-based presentation failure reports for responsive web pages
Ibrahim Althomali, G. M. Kapfhammer, Phil McMinn
Since it is common for the users of a web page to access it through a wide variety of devices, including desktops, laptops, tablets and phones, web developers rely on responsive web design (RWD) principles and frameworks to create sites that are useful on all devices. A correctly implemented responsive web page adjusts its layout according to the viewport width of the device in use, thereby ensuring that its design suitably features the content. Because the use of complex RWD frameworks often leads to web pages with hard-to-detect responsive layout failures (RLFs), developers employ testing tools that generate reports of potential RLFs. Testing tools for responsive web pages, like ReDeCheck, analyse a web page representation called the Document Object Model (DOM), so they may inadvertently flag concerns that are not human-visible, requiring developers to manually confirm and classify each potential RLF as a true positive (TP), a false positive (FP) or a non-observable issue (NOI), a process that is time-consuming and error-prone. The conference version of this paper presented Viser, a tool that automatically classified three types of RLFs reported by ReDeCheck. Since Viser was not designed to automatically confirm and classify two further types of RLFs that ReDeCheck's DOM-based analysis can surface, this paper introduces Verve, a tool that automatically classifies all RLF types reported by ReDeCheck. Along with manipulating the opacity of HTML elements in a web page, as Viser does, Verve also uses histogram-based image comparison to classify RLFs in web pages. Drawing on both the 25 web pages used in prior experiments and 20 new pages not previously considered, this paper's empirical study reveals that Verve's classification of all five types of RLFs frequently agrees with classifications produced manually by humans. The experiments also reveal that Verve took on average about 4 s to classify any of the 469 RLFs reported by ReDeCheck. Since this paper demonstrates that classifying an RLF as a TP, FP or NOI with Verve, a publicly available tool, is less subjective and error-prone than the same manual process performed by a human web developer, we argue that it is well suited to supporting the testing of complex responsive web pages.
{"title":"Automated visual classification of DOM‐based presentation failure reports for responsive web pages","authors":"Ibrahim Althomali, G. M. Kapfhammer, Phil McMinn","doi":"10.1002/stvr.1756","DOIUrl":"https://doi.org/10.1002/stvr.1756","url":null,"abstract":"Since it is common for the users of a web page to access it through a wide variety of devices—including desktops, laptops, tablets and phones—web developers rely on responsive web design (RWD) principles and frameworks to create sites that are useful on all devices. A correctly implemented responsive web page adjusts its layout according to the viewport width of the device in use, thereby ensuring that its design suitably features the content. Since the use of complex RWD frameworks often leads to web pages with hard‐to‐detect responsive layout failures (RLFs), developers employ testing tools that generate reports of potential RLFs. Since testing tools for responsive web pages, like ReDeCheck, analyse a web page representation called the Document Object Model (DOM), they may inadvertently flag concerns that are not human visible, thereby requiring developers to manually confirm and classify each potential RLF as a true positive (TP), false positive (FP), or non‐observable issue (NOI)—a process that is time consuming and error prone. The conference version of this paper presented Viser, a tool that automatically classified three types of RLFs reported by ReDeCheck. Since Viser was not designed to automatically confirm and classify two types of RLFs that ReDeCheck's DOM‐based analysis could surface, this paper introduces Verve, a tool that automatically classifies all RLF types reported by ReDeCheck. Along with manipulating the opacity of HTML elements in a web page, as does Viser, the Verve tool also uses histogram‐based image comparison to classify RLFs in web pages. Incorporating both the 25 web pages used in prior experiments and 20 new pages not previously considered, this paper's empirical study reveals that Verve's classification of all five types of RLFs frequently agrees with classifications produced manually by humans. The experiments also reveal that Verve took on average about 4 s to classify any of the RLFs among the 469 reported by ReDeCheck. Since this paper demonstrates that classifying an RLF as a TP, FP, or NOI with Verve, a publicly available tool, is less subjective and error prone than the same manual process done by a human web developer, we argue that it is well‐suited for supporting the testing of complex responsive web pages.","PeriodicalId":49506,"journal":{"name":"Software Testing Verification & Reliability","volume":"2 1","pages":""},"PeriodicalIF":1.5,"publicationDate":"2021-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80906269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}