2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE): Latest Publications

FILO: FIx-LOcus Recommendation for Problems Caused by Android Framework Upgrade

2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)

Pub Date : 2019-10-01 DOI: 10.1109/ISSRE.2019.00043

M. Mobilio, O. Riganelli, D. Micucci, L. Mariani

Dealing with the evolution of operating systems is challenging for developers of mobile apps, who face frequent upgrades that often include backward incompatible changes of the underlying API framework. As a consequence of framework upgrades, apps may show misbehaviours and unexpected crashes once executed within an evolved environment. Identifying the portion of the app that must be modified to correctly execute on a newly released operating system can be challenging. Although incompatibilities are visible at the level of the interactions between the app and its execution environment, the actual methods to be changed are often located in classes that do not directly interact with any external element. To facilitate debugging activities for problems introduced by backward incompatible upgrades of the operating system, this paper presents FILO, a technique that can recommend the method that must be changed to implement the fix from the analysis of a single failing execution. FILO can also select key symptomatic anomalous events that can help the developer understand the reason for the failure and facilitate the implementation of the fix. Our evaluation with multiple known compatibility problems introduced by Android upgrades shows that FILO can effectively and efficiently identify the faulty methods in the apps.

Citations: 2

Estimating Return on Investment for GUI Test Automation Frameworks

2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)

Pub Date : 2019-10-01 DOI: 10.1109/ISSRE.2019.00035

Felix Dobslaw, R. Feldt, David Michaëlsson, Patrick Haar, F. D. O. Neto, Richard Torkar

Automated graphical user interface (GUI) tests can reduce manual testing activities and increase test frequency. This motivates the conversion of manual test cases into automated GUI tests. However, it is not clear whether such automation is cost-effective given that GUI automation scripts add to the code base and demand maintenance as a system evolves. In this paper, we introduce a method for estimating maintenance cost and Return on Investment (ROI) for Automated GUI Testing (AGT). The method utilizes the existing source code change history and has the potential to be used for the evaluation of other testing or quality assurance automation technologies. We evaluate the method for a real-world, industrial software system and compare two fundamentally different AGT frameworks, namely Selenium and EyeAutomate, to estimate and compare their ROI. We also report on their defect-finding capabilities and usability. The quantitative data is complemented by interviews with employees at the company where the study was conducted. The method was successfully applied, and estimated maintenance cost and ROI for both frameworks are reported. Overall, the study supports earlier results showing that implementation time is the leading cost for introducing AGT. The findings further suggest that, while EyeAutomate tests are significantly faster to implement, Selenium tests require more of a programming background but less maintenance.
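
The abstract does not state the formula the authors use, so the following is only a minimal sketch of how an ROI estimate of this kind could be computed, using the textbook definition ROI = (savings - investment) / investment. The class `AutomationCosts`, its field names, and the idea of counting breaking changes from the change history are assumptions made for illustration, not the paper's model.

```python
from dataclasses import dataclass


@dataclass
class AutomationCosts:
    """Hypothetical inputs for one GUI test automation framework (all in hours)."""
    implementation_hours: float          # one-off effort to script the tests
    maintenance_hours_per_change: float  # assumed average repair effort per breaking change
    breaking_changes: int                # e.g. counted from the source code change history
    manual_hours_per_run: float          # effort of one manual execution of the same tests
    runs: int                            # number of executions over the observed period


def maintenance_cost(costs: AutomationCosts) -> float:
    """Estimated total maintenance effort over the observed period."""
    return costs.maintenance_hours_per_change * costs.breaking_changes


def roi(costs: AutomationCosts) -> float:
    """Textbook ROI: (manual effort saved - effort invested) / effort invested."""
    invested = costs.implementation_hours + maintenance_cost(costs)
    saved = costs.manual_hours_per_run * costs.runs
    return (saved - invested) / invested
```

With these assumed inputs, a framework that is quick to implement but breaks often can be compared against one that is slower to implement but needs fewer repairs by evaluating `roi` on two `AutomationCosts` instances.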

Citations: 13

Back to Basics - Redefining Quality Measurement for Hybrid Software Development Organizations

2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)

Pub Date : 2019-10-01 DOI: 10.1109/ISSRE.2019.00047

Satyabrata Pradhan, Venky Nanniyur

As the software industry transitions from a license-based model to a subscription-based Software-as-a-Service (SaaS) model, many software development groups are using a hybrid development model that incorporates Agile and Waterfall methodologies in different parts of the organization. The traditional metrics used for measuring software quality in Waterfall or Agile paradigms do not apply to this new hybrid methodology. In addition, to respond to higher quality demands from customers and to gain a competitive advantage in the market, many companies are starting to prioritize quality as a strategic differentiator. As a result, quality metrics are included in the decision-making activities all the way up to the executive level, including Board of Director reviews. This paper presents key challenges associated with measuring software quality in organizations using the hybrid development model. We developed a framework called PIER (Prevention-Inspection-Evaluation-Removal) to provide a comprehensive metric definition for hybrid organizations. The framework includes quality measurements, quality enforcement, and quality decision points at different organizational levels and project milestones during the software development life cycle (SDLC). The metrics framework defined in this paper is being used for all Cisco Systems products used in customer premises. Preliminary field metrics data for one of the product groups show quality improvement after implementation of the proposed measurement system.

Citations: 1

Textout: Detecting Text-Layout Bugs in Mobile Apps via Visualization-Oriented Learning

2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)

Pub Date : 2019-10-01 DOI: 10.1109/ISSRE.2019.00032

Yaohui Wang, Hui Xu, Yangfan Zhou, Michael R. Lyu, Xin Wang

Layout bugs commonly exist in mobile apps. Due to the fragmentation issues of smartphones, a layout bug may occur only on particular versions of smartphones. It is quite challenging to detect such bugs for state-of-the-art commercial automated testing platforms, although they can test an app with thousands of different smartphones in parallel. The main reason is that typical layout bugs neither crash an app nor generate any error messages. In this paper, we present our work for detecting text-layout bugs, which account for a large portion of layout bugs. We model text-layout bug detection as a classification problem. This then allows us to address it with sophisticated image processing and machine learning techniques. To this end, we propose an approach which we call Textout. Textout takes screenshots as its input and adopts a specifically tailored text detection method and a convolutional neural network (CNN) classifier to perform automatic text-layout bug detection. We collect 33,102 text-region images as our training dataset and verify the effectiveness of our tool with 1,481 text-region images collected from real-world apps. Textout achieves an AUC (area under the curve) of 0.956 on the test dataset and shows an acceptable overhead. The dataset is released as open source for follow-up research.
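
For readers unfamiliar with the AUC metric reported above, the sketch below shows one standard way to compute the area under the ROC curve for a binary classifier: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function name `auc` and the toy labels and scores are illustrative assumptions; this is not the authors' evaluation code.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney formulation).

    labels: iterable of 0/1 ground-truth values (1 = buggy text region, by assumption)
    scores: iterable of classifier scores, higher meaning "more likely buggy"
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported 0.956 means a buggy text region is ranked above a correct one in roughly 96 of 100 such pairs.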

Citations: 3

Test Case Generation Based on Client-Server of Web Applications by Memetic Algorithm

2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)

Pub Date : 2019-10-01 DOI: 10.1109/ISSRE.2019.00029

Wen Wang, Xiaohong Guo, Zheng Li, Ruilian Zhao

Currently, more than 90% of web applications are potentially vulnerable to attacks from both the client side and the server side. Test case generation plays a crucial role in testing web applications, where most existing studies focus on test case generation either from the client side or from the server side to detect vulnerabilities, regardless of the interactions between client and server. Consequently, it is difficult for those test cases to discover certain faults which involve both client and server. In this paper, the server-side sensitive paths are considered as vulnerable code paths due to insufficient or erroneous filtering mechanisms. An evolutionary testing approach based on the memetic algorithm is proposed to connect the server side and the client side, in which test cases are generated from the client-side behavior model, while guided by the coverage of sensitive paths from the server side. The experiments are conducted on four open source web applications, and the results demonstrate that our approach can generate test cases from the client-side behavior model that can cover the server-side sensitive paths, on which the vulnerabilities can be detected more effectively.
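
A memetic algorithm combines a population-based evolutionary search with a local refinement step applied to individuals. The sketch below illustrates that general structure on a toy problem. The event ids, the set of "sensitive" events, the fitness function, and all operator choices are hypothetical stand-ins: the paper derives test cases from a client-side behavior model and measures real server-side path coverage, neither of which is reproduced here.

```python
import random

EVENTS = range(10)      # hypothetical client-side event ids
SENSITIVE = {2, 5, 7}   # hypothetical events that reach server-side sensitive paths
LENGTH = 6              # events per test case


def fitness(test_case):
    """Fraction of (toy) sensitive paths covered by the test case."""
    return len(SENSITIVE & set(test_case)) / len(SENSITIVE)


def random_test_case(rng):
    return [rng.choice(EVENTS) for _ in range(LENGTH)]


def mutate(test_case, rng):
    """Replace one randomly chosen event."""
    mutant = list(test_case)
    mutant[rng.randrange(LENGTH)] = rng.choice(EVENTS)
    return mutant


def crossover(a, b, rng):
    """One-point crossover of two parent test cases."""
    cut = rng.randrange(1, LENGTH)
    return a[:cut] + b[cut:]


def local_search(test_case, rng, steps=5):
    """Hill climbing: accept a mutated neighbour only if it is not worse."""
    best = test_case
    for _ in range(steps):
        candidate = mutate(best, rng)
        if fitness(candidate) >= fitness(best):
            best = candidate
    return best


def memetic_search(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    population = [random_test_case(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Memetic step: refine every individual locally before selection.
        population = [local_search(tc, rng) for tc in population]
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=fitness)
```

The local search is what distinguishes this from a plain genetic algorithm: each candidate is improved greedily before it competes for selection, which tends to speed up convergence when the fitness signal (here, coverage) is informative.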

Citations: 3

Criteria to Systematically Evaluate (Safety) Assurance Cases

2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)

Pub Date : 2019-10-01 DOI: 10.1109/ISSRE.2019.00045

T. Chowdhury, Alan Wassyng, R. Paige, M. Lawford

An assurance case (AC) captures explicit reasoning associated with assuring critical properties, such as safety. A vital attribute of an AC is that it facilitates the identification of fallacies in the validity of any claim. There is considerable published research related to confidence in ACs, which primarily relates to a measure of soundness of reasoning. Evaluation of an AC is more general than measuring confidence and considers multiple aspects of the quality of an AC. Evaluation criteria thus play a significant role in making the evaluation process more systematic. This paper contributes to the identification of effective evaluation criteria for ACs, the rationale for their use, and initial tests of the criteria on existing ACs. We classify these criteria as to whether they apply to the structure of the AC, or to the content of the AC. This paper focuses on safety as the critical property to be assured, but only a very small number of the criteria are specific to safety, and can serve as placeholders for evaluation criteria specific to other critical properties. All of the other evaluation criteria are generic. This separation is useful when evaluating ACs developed using different notations, and when evaluating ACs against safety standards. We explore the rationale for these criteria as well as the way they are used by the developers of the AC and also when they are used by a third-party evaluator.

Citations: 5

On the Density and Diversity of Degradation Symptoms in Refactored Classes: A Multi-case Study

2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)

Pub Date : 2019-10-01 DOI: 10.1109/ISSRE.2019.00042

W. Oizumi, L. Sousa, Anderson Oliveira, L. Carvalho, Alessandro F. Garcia, T. Colanzi, R. Oliveira

Root canal refactoring is a software development activity that is intended to improve dependability-related attributes such as modifiability and reusability. Although it contributes to these attributes, deciding when to apply root canal refactoring is far from trivial. In fact, finding which elements should be refactored is not a cut-and-dried task. One of the main reasons is the lack of consensus on which characteristics indicate the presence of structural degradation. Thus, we evaluated whether the density and diversity of multiple automatically detected symptoms can be used as consistent indicators of the need for root canal refactoring. To achieve our goal, we conducted a multi-case exploratory study involving 6 open source systems and 2 systems from our industry partners. For each system, we identified the classes that were changed through one or more root canal refactorings. After that, we compared refactored and non-refactored classes with respect to the density and diversity of degradation symptoms. We also investigated if the most recurrent combinations of symptoms in refactored classes can be used as strong indicators of structural degradation. Our results show that refactored classes usually present higher density and diversity of symptoms than non-refactored classes. However, root canal refactorings that are performed by developers in practice may not be enough for reducing degradation, since the vast majority had little to no impact on the density and diversity of symptoms. Finally, we observed that symptom combinations in refactored classes are similar to the combinations in non-refactored classes. Based on our findings, we elicited an initial set of requirements for automatically recommending root canal refactorings.
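
The abstract does not define density and diversity formally. A minimal sketch, assuming density means the number of symptom instances detected in a class and diversity means the number of distinct symptom types among them, could look as follows. The function name and the symptom labels are illustrative assumptions.

```python
from collections import Counter


def density_and_diversity(symptoms):
    """Return (density, diversity) for the symptoms detected in one class.

    symptoms: list of symptom-type names, one entry per detected instance.
    density   = total number of symptom instances (assumed definition)
    diversity = number of distinct symptom types (assumed definition)
    """
    counts = Counter(symptoms)
    return sum(counts.values()), len(counts)
```

Under these assumed definitions, a class with two instances of the same symptom has density 2 but diversity 1, which is why the two measures can diverge between refactored and non-refactored classes.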

Citations: 18

Inferring Performance Bug Patterns from Developer Commits

2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)

Pub Date : 2019-10-01 DOI: 10.1109/ISSRE.2019.00017

Yiqun Chen, Stefan Winter, N. Suri

Performance bugs, i.e., program source code that is unnecessarily inefficient, have received significant attention from the research community in recent years. A number of empirical studies have investigated how these bugs differ from "ordinary" bugs that cause functional deviations, and several approaches to aid their detection, localization, and removal have been proposed. Many of these approaches focus on certain subclasses of performance bugs, e.g., those resulting from redundant computations or unnecessary synchronization, and the evaluation of their effectiveness is usually limited to a small number of known instances of these bugs. To provide researchers working on performance bug detection and localization techniques with a larger corpus of performance bugs to evaluate against, we conduct a study of more than 700 performance bug fixing commits across 13 popular open source projects written in C and C++ and investigate the relative frequency of bug types as well as their complexity. Our results show that many of these fixes follow a small set of bug patterns, that they are contributed by experienced developers, and that the number of lines needed to fix performance bugs is highly project-dependent.
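
As an illustration of the "redundant computation" subclass mentioned above, the sketch below contrasts a function that rebuilds a loop-invariant data structure on every iteration with a fixed version that hoists it out of the loop. The example is hypothetical and not taken from the study's corpus, which covers C and C++ projects; Python is used here only to keep the sketch short.

```python
def count_allowed_slow(items, allowed):
    """Performance bug: the set is rebuilt on every loop iteration."""
    total = 0
    for item in items:
        if item in set(allowed):  # redundant work, same result each time
            total += 1
    return total


def count_allowed_fast(items, allowed):
    """Fix: hoist the loop-invariant set construction out of the loop."""
    allowed_set = set(allowed)
    total = 0
    for item in items:
        if item in allowed_set:
            total += 1
    return total
```

Both functions return the same result, which is what separates a performance bug from a functional one: only the cost changes, here from roughly len(items) * len(allowed) operations to len(items) + len(allowed).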



Citations: 9

Machine Learning and Constraint Solving for Automated Form Testing

2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)

Pub Date : 2019-10-01 DOI: 10.1109/ISSRE.2019.00030

D. Santiago, Justin Phillips, Patrick Alt, Brian R. Muras, Tariq M. King, Peter J. Clarke

In recent years there has been a focus on the automatic generation of test cases using white-box testing techniques; however, the same cannot be said for the generation of system-level test cases from natural language system requirements. White-box techniques include the use of constraint solvers for the automatic generation of test inputs, the use of control flow graphs generated from code, and the use of path generation and symbolic execution to generate test inputs and test for path feasibility. Techniques such as boundary value analysis (BVA) may also be used for generating stronger test suites. For black-box testing, however, we rely on specifications or implicit requirements and spend considerable time and effort designing and executing test cases. This paper presents an approach that leverages natural language processing and machine learning techniques to capture black-box system behavior in the form of constraints. Constraint solvers are then used to generate test cases using BVA and equivalence class partitioning. We also conduct a proof of concept that applies this approach to a simplified task management application and an enterprise job recruiting application.
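
As a rough illustration of BVA and equivalence class partitioning, the sketch below derives test inputs for a single form field constrained to an inclusive integer range. It is an assumption-laden simplification, not the paper's method: no constraint solver or learned model is involved, the constraint is supplied by hand as `lo` and `hi` (with `lo <= hi`), and all function names are hypothetical.

```python
def boundary_values(lo, hi):
    """Classic BVA points for the inclusive integer range [lo, hi]."""
    candidates = [lo - 1, lo, lo + 1, hi - 1, hi, hi + 1]
    # Deduplicate while keeping order (matters for very narrow ranges).
    return list(dict.fromkeys(candidates))


def equivalence_classes(lo, hi):
    """One representative per partition: below, inside, and above the range."""
    return {"below": lo - 1, "valid": (lo + hi) // 2, "above": hi + 1}


def generate_tests(lo, hi):
    """Pair each candidate input with the expected accept/reject verdict."""
    inputs = boundary_values(lo, hi) + list(equivalence_classes(lo, hi).values())
    return [(x, lo <= x <= hi) for x in dict.fromkeys(inputs)]
```

For a field that accepts values from 1 to 100, `generate_tests(1, 100)` yields seven inputs: the six boundary points plus the mid-range representative 50, each paired with whether the form should accept it.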



Citations: 4

Evaluation of Anomaly Detection Algorithms Made Easy with RELOAD

2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)

Pub Date : 2019-10-01 DOI: 10.1109/ISSRE.2019.00051

T. Zoppi, A. Ceccarelli, A. Bondavalli

Anomaly detection aims at identifying patterns in data that do not conform to the expected behavior. Although anomaly detection has emerged as one of the most powerful techniques for detecting suspected attacks or failures, dedicated support for its experimental evaluation is scarce. Existing frameworks are mostly intended for the broad purposes of data mining and machine learning. Tools tailored for evaluating anomaly detection algorithms for failure and attack detection, with intuitive support for sliding windows, are currently missing. This paper presents RELOAD, a flexible and intuitive tool for the Rapid EvaLuation Of Anomaly Detection algorithms. RELOAD is able to automatically i) fetch data from an existing data set, ii) identify the most informative features of the data set, iii) run anomaly detection algorithms, including those based on sliding windows, iv) apply multiple strategies to features and decide on anomalies, and v) provide conclusive results following an extensive set of metrics, along with plots of algorithm scores. Finally, RELOAD includes a simple GUI to set up the experiments and examine results. After describing the structure of the tool and detailing inputs and outputs of RELOAD, we exercise RELOAD to analyze an intrusion detection dataset available on a public platform, showing its setup, metric scores and plots.
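
For readers unfamiliar with sliding-window detection, the following is a minimal sketch of the general idea using only the Python standard library. It is not RELOAD's implementation or API; the function name, the `window` and `threshold` parameters, and the choice of a z-score rule are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean, pstdev


def sliding_window_anomalies(series, window=5, threshold=3.0):
    """Return indices whose value deviates from the mean of the preceding
    `window` points by more than `threshold` population standard deviations."""
    recent = deque(maxlen=window)  # trailing window of past observations
    flagged = []
    for i, x in enumerate(series):
        if len(recent) == window:
            mu, sigma = mean(recent), pstdev(recent)
            # A perfectly flat window (sigma == 0) is skipped in this
            # simple sketch, so nothing is flagged against it.
            if sigma > 0 and abs(x - mu) > threshold * sigma:
                flagged.append(i)
        recent.append(x)
    return flagged
```

The first `window` points are never flagged because no full window precedes them. Note also that an anomalous point enters the window after it is seen and inflates the deviation for the next few points, which is one reason evaluation tools compare several window sizes and decision strategies.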



Citations: 15