2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP): Latest Papers


Surveying the Developer Experience of Flaky Tests

2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)

Pub Date : 2022-05-01 DOI: 10.1145/3510457.3513037

Michael C Hilton

Test cases that pass and fail without changes to the code under test are known as flaky. The past decade has seen increasing research interest in flaky tests, though little attention has been afforded to the views and experiences of software developers. In this study, we utilized a multi-source approach to obtain insights into how developers define flaky tests, their experiences of the impacts and causes of flaky tests, and the actions they take in response to them. To that end, we designed a literature-guided developer survey that we deployed on social media, receiving 170 total responses. We also searched on StackOverflow and analyzed 38 threads relevant to flaky tests, offering a distinct perspective free of any self-reporting bias. Through a mixture of numerical and thematic analyses, this study reveals a number of findings, including (1) developers strongly agree that flaky tests hinder continuous integration; (2) developers who experience flaky tests more often may be more likely to ignore potentially genuine test failures; and (3) developers rate issues in setup and teardown to be the most common causes of flaky tests.

Citations: 16

Looking for Lacunae in Bitcoin Core's Fuzzing Efforts

2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)

Pub Date : 2022-05-01 DOI: 10.1145/3510457.3513072

Alex Groce

Bitcoin is one of the most prominent distributed software systems in the world. This paper describes an effort to investigate and enhance the effectiveness of the Bitcoin Core fuzzing effort. The effort initially began as a query about how to escape saturation in the fuzzing effort, but developed into a more general exploration. This paper summarizes the outcomes of a two-week focused effort. While the effort found no smoking guns indicating major test/fuzz weaknesses, it produced a large number of additional fuzz corpus entries, increased the set of fuzzers used for Bitcoin Core, and ran mutation analysis of Bitcoin Core fuzz targets, with a comparison to Bitcoin functional tests and other cryptocurrencies’ tests. Our conclusion is that for high quality fuzzing efforts, improvements to the oracle may be the best way to get more out of fuzzing.

Citations: 3

An Empirical Study on Quality Issues of eBay's Big Data SQL Analytics Platform

2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)

Pub Date : 2022-05-01 DOI: 10.1145/3510457.3513034

Feng Zhu, Lijie Xu, Gang Ma, Shuping Ji, Jie Wang, Gang Wang, Hongyi Zhang, K. Wan, Ming-ming Wang, Xingchao Zhang, Yuming Wang, Jingpin Li

Big data SQL analytics platforms have evolved into key infrastructure for business data analysis. Compared with traditional, costly commercial RDBMSs, scalable solutions built on open-source projects, such as SQL-on-Hadoop, are more popular and attractive to enterprises. At eBay, we built Carmel, a company-wide interactive SQL analytics platform based on Apache Spark. Carmel has been serving thousands of customers from hundreds of teams globally for more than 3 years. Despite the popularity of open-source-based big data SQL analytics platforms, few empirical studies have examined their service quality issues (e.g., job failures). Yet a deep understanding of these issues, and of the right mitigations for them, is important for reducing manual maintenance effort. To fill this gap, we conduct a comprehensive empirical study of 1,884 real-world service quality issues from Carmel. We summarize the common symptoms and identify the root causes with typical cases. Stakeholders including system developers, researchers, and platform maintainers can benefit from our findings and implications. Furthermore, we present lessons learned from critical cases in our daily practice, as well as insights to motivate automatic tool support and future research directions.

Citations: 2

Automatic Anti-Pattern Detection in Microservice Architectures Based on Distributed Tracing

2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)

Pub Date : 2022-05-01 DOI: 10.1145/3510457.3513066

Tim Hübener, M. Chaudron, Yaping Luo, Pieter Vallen, Jonck van der Kogel, Tom Liefheid

The successful use of microservice-based applications by large companies has popularized this architectural style. One problem with the microservice architecture is that current techniques for visualising and detecting anti-patterns are inadequate. This study contributes a method and tool for detecting anti-patterns in microservice architectures based on distributed execution traces. We demonstrate this in an industrial case study.

Citations: 2

Bug Tracking Process Smells In Practice

2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)

Pub Date : 2022-05-01 DOI: 10.1145/3510457.3513080

Erdem Tuna, V. Kovalenko, Eray Tüzün

Software teams use bug tracking (BT) tools to report and manage bugs. Each record in a bug tracking system (BTS) is a reporting entity consisting of several information fields. The contents of the reports are similar across different tracking tools, though not the same. The variation in the workflow between teams prevents defining an ideal process of running BTS. Nevertheless, there are best practices reported both in white and gray literature. Developer teams may not adopt the best practices in their BT process. This study investigates the non-compliance of developers with best practices, so-called smells, in the BT process. We mine bug reports of four projects in the BTS of JetBrains, a software company, to observe the prevalence of BT smells in an industrial setting. Also, we survey developers to see (1) if they recognize the smells, (2) their perception of the severity of the smells, and (3) the potential benefits of a BT process smell detection tool. We found that (1) smells occur, and their detection requires a solid understanding of the BT practices of the projects, (2) smell severity perception varies across smell types, and (3) developers considered that a smell detection tool would be useful for six out of the 12 smell categories.

Citations: 3

AI for Automated Code Updates

2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)

Pub Date : 2022-05-01 DOI: 10.1145/3510457.3513073

Salwa Alamir, Petr Babkin, N. Navarro, Sameena Shah

Most modern code bases extensively rely on external libraries to provide robust functionality out of the box. When these libraries are updated they can sometimes introduce breaking changes in the process, which require extensive developer maintenance. To mitigate this we propose to use artificial intelligence to parse the text of release notes to capture code deprecations in structured form. This, in turn, enables us to develop an IDE plugin that can automatically detect deprecated library usages in live code bases and even suggest recommended fixes. We evaluated our system on over 30 internal projects within J.P. Morgan.
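The detection step described above can be sketched as follows, assuming the deprecations have already been extracted into a structured table. The `DEPRECATIONS` mapping, the `oldlib` library, and both function names are hypothetical, and the sketch does not cover the release-note parsing itself or claim to reflect how the authors' plugin works.

```python
import ast

# Hypothetical structured deprecations, as might be extracted from release
# notes: deprecated dotted name -> suggested replacement.
DEPRECATIONS = {
    "oldlib.fetch": "oldlib.fetch_v2",
}


def dotted_name(node):
    """Rebuild a dotted name such as 'oldlib.fetch' from an AST node, or None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_deprecated_calls(source):
    """Return sorted (line, old_name, suggested_name) tuples for deprecated calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in DEPRECATIONS:
                findings.append((node.lineno, name, DEPRECATIONS[name]))
    return sorted(findings)


# The sample is only parsed, never executed, so 'oldlib' need not exist.
SAMPLE = "import oldlib\nresult = oldlib.fetch('https://example.org')\n"
```

Calling `find_deprecated_calls(SAMPLE)` reports the call on line 2 together with the suggested replacement name. A real tool would also need to resolve import aliases, which this sketch ignores.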

Citations: 2

Record and Replay of Online Traffic for Microservices with Automatic Mocking Point Identification

2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)

Pub Date : 2022-05-01 DOI: 10.1145/3510457.3513029

Jiangchao Liu, Jierui Liu, Peng Di, A. Liu, Zexin Zhong

Using recorded online traffic for the regression testing of web applications has become a common practice in industry. However, this “record and replay” is challenging for microservices: simply recorded online traffic (i.e., values for variables or input/output for function calls) often cannot be replayed successfully, because microservices have various dependencies on the complicated online environment. These dependencies include the states of underlying systems, internal states (e.g., caches), and external states (e.g., interaction with other microservices/middleware). Considering the large size and complexity of industrial microservices, an automatic, scalable, and precise identification of such dependencies is needed, as manual identification is time-consuming. In this paper, we propose an industrial-grade solution for identifying all dependencies and generating mocking points automatically using static program analysis techniques. Our solution has been deployed in a large Internet company (i.e., Ant Group) to handle hundreds of microservices, which consist of hundreds of millions of lines of code, with a high success rate in replay (99% on average). Moreover, our framework can boost the efficiency of the testing system by filtering out dependencies that cannot affect the behavior of a microservice. Our experimental results show that our approach can filter out 73.1% of system state dependencies and 71.4% of internal state dependencies, which have no effect on the behavior of the microservice.
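The record-and-replay idea at a single mocking point can be sketched as below. This is an illustration under assumptions, not the authors' system: the mocking point is chosen by hand here, whereas the paper identifies such points with static program analysis, and the names `Recorder`, `Replayer`, `price_with_tax`, and `remote_price` are hypothetical.

```python
class Recorder:
    """Wraps a dependency call and stores each arguments-to-result pair."""

    def __init__(self, real_call):
        self.real_call = real_call
        self.log = {}

    def __call__(self, *args):
        result = self.real_call(*args)
        self.log[args] = result
        return result


class Replayer:
    """Serves recorded results instead of calling the real dependency."""

    def __init__(self, log):
        self.log = log

    def __call__(self, *args):
        return self.log[args]


def price_with_tax(item_id, get_price):
    """Service logic under test; get_price is the mocked dependency."""
    return round(get_price(item_id) * 1.2, 2)


# Hypothetical external state that changes between recording and replay.
_prices = {"sku-1": 10.0}


def remote_price(item_id):
    return _prices[item_id]


recorder = Recorder(remote_price)
recorded_output = price_with_tax("sku-1", recorder)

_prices["sku-1"] = 99.0  # the online environment has drifted since recording
replayed_output = price_with_tax("sku-1", Replayer(recorder.log))
```

Because the replay reads from the recorded log rather than from `_prices`, the replayed output matches the recorded one even though the external state has changed in between.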

Citations: 1

A Software Impact Analysis Tool based on Change History Learning and its Evaluation

2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)

Pub Date : 2022-05-01 DOI: 10.1145/3510457.3519017

H. Iwasaki, Tsuyoshi Nakajima, Ryota Tsukamoto, Kazuko Takahashi, Shuichi Tokumoto

Software change impact analysis plays an important role in controlling software evolution in the maintenance of continuous software development. We developed a tool for change impact analysis, which machine-learns change histories and directly outputs candidates of the components to be modified for a change request. We applied the tool to real project data to evaluate it with two metrics: coverage range ratio and accuracy in the coverage range. The results show that it works well for software projects having many change histories for one source code base.
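One simple way to derive modification candidates from change history is co-change counting, sketched below. This is a swapped-in illustrative technique, not the learning method of the tool described above, which the abstract does not specify. The file names, the `min_support` parameter, and both function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cochange_counts(history):
    """Count how often each pair of components changed in the same commit."""
    counts = Counter()
    for changed in history:
        for a, b in combinations(sorted(set(changed)), 2):
            counts[(a, b)] += 1
    return counts


def impact_candidates(component, counts, min_support=2):
    """Rank components that co-changed with component at least min_support times."""
    scores = Counter()
    for (a, b), n in counts.items():
        if n >= min_support:
            if a == component:
                scores[b] = n
            elif b == component:
                scores[a] = n
    # Highest co-change count first; ties broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked]


# Hypothetical change history: each entry lists the files touched by one change.
HISTORY = [
    ["parser.c", "lexer.c"],
    ["parser.c", "lexer.c", "ast.c"],
    ["parser.c", "ast.c"],
    ["ui.c"],
]
```

With this history, `impact_candidates("parser.c", build_cochange_counts(HISTORY))` proposes `ast.c` and `lexer.c`. A component with little history yields no candidates, which is consistent with the observation that the approach suits projects with many recorded changes.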

Citations: 1

Automatically Identifying Shared Root Causes of Test Breakages in SAP HANA

2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)

Pub Date : 2022-05-01 DOI: 10.1145/3510457.3513051

Gabin An, Juyeon Yoon, Jeongju Sohn, Jingun Hong, Dongwon Hwang, Shin Yoo

Continuous Integration (CI) of a large-scale software system such as SAP HANA can produce a non-trivial number of test breakages. Each breakage that newly occurs from daily runs needs to be manually inspected, triaged, and eventually assigned to developers for debugging. However, not all new breakages are unique, as some test breakages share the same root cause; in addition, human errors can produce duplicate bug tickets for the same root cause. Automated identification of breakages with shared root causes can significantly reduce the cost of the (typically manual) post-breakage steps. This paper investigates multiple similarity functions between test breakages to assist and automate the identification of test breakages that have the same root cause. We consider multiple information sources for this automation: static (i.e., the code itself), historical (i.e., whether the test results have changed in a similar way in the past), and dynamic (i.e., whether the coverage of the test cases is similar). We evaluate a total of 27 individual similarity functions, using real-world CI data of SAP HANA from a six-month period. Further, using these individual similarity functions as input features, we construct a classification model that can predict whether two test breakages share the same root cause. When trained using ground truth labels extracted from the issue tracker of SAP HANA, our model achieves an F1 score of 0.743 when evaluated on a set of unseen test breakages collected over three months. Our results show that a classification model based on test similarity functions can successfully support the bug triage stage of a CI pipeline.
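As an illustration of a dynamic similarity function, the sketch below compares the coverage of two failing tests with Jaccard similarity. The abstract does not say which measures the 27 functions use, so Jaccard, the 0.5 threshold, and the coverage data are assumptions. The paper feeds such scores into a classifier rather than applying a single threshold.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 1.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def likely_same_root_cause(cov_a, cov_b, threshold=0.5):
    """Flag two breakages as candidates for a shared root cause."""
    return jaccard(cov_a, cov_b) >= threshold


# Hypothetical sets of functions covered by three failing tests.
COVERAGE = {
    "test_join": {"parse", "plan", "hash_join", "emit"},
    "test_group_by": {"parse", "plan", "hash_join", "aggregate"},
    "test_backup": {"open_file", "compress"},
}
```

Here `test_join` and `test_group_by` share three of five covered functions (similarity 0.6) and are flagged as a candidate pair, while `test_join` and `test_backup` share none.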

Citations: 5

Unreliable Test Infrastructures in Automotive Testing Setups

2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)

Pub Date : 2022-05-01 DOI: 10.1145/3510457.3513069

Claudius V. Jordan, P. Foth, A. Pretschner, Matthias Fruth

During system testing of automotive electrical control units, various causes can lead to invalid test failures, e.g., non-responding components, faulty simulation models, faulty test case implementations, or hardware or software misconfigurations. To determine whether a test failure is invalid and what the underlying cause was, the test executions have to be analyzed manually, which is tedious and therefore costly. In this work, we report the magnitude of the problem of invalid test failures in four system testing projects from the automotive domain. We find that up to 91% of failed test executions are considered invalid. An often overlooked challenge is unreliable test infrastructure, which undermines the validity of the test runs. In the studied projects, between 27% and 53% of failed test executions are linked to unreliable test infrastructures.

Citations: 1
