Automatic Unit Test Generation for Machine Learning Libraries: How Far Are We?
Song Wang, Nishtha Shrestha, Abarna Kucheri Subburaman, Junjie Wang, Moshi Wei, Nachiappan Nagappan
Pub Date: 2021-05-01. DOI: 10.1109/ICSE43902.2021.00138
Automatic unit test generation, which explores the input space and produces effective test cases for given programs, has been studied for decades, and many tools that generate unit test cases with high structural coverage over a program have been examined. However, the fact that existing test generation tools are mainly evaluated on general software programs calls into question their practical effectiveness and usefulness for machine learning libraries, which are statistically oriented and fundamentally different in nature and construction from general software projects. In this paper, we set out to investigate the effectiveness of existing unit test generation techniques on machine learning libraries. We conducted an empirical study on five widely used machine learning libraries with two popular unit test case generation tools, EvoSuite and Randoop. We find that (1) most of the machine learning libraries do not maintain a high-quality unit test suite with respect to commonly applied quality metrics such as code coverage (34.1% on average) and mutation score (21.3% on average), (2) EvoSuite and Randoop lead to clear but limited improvements in code coverage and mutation score, and (3) there exist common patterns in the uncovered code across the five machine learning libraries that can be used to improve unit test case generation.
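For intuition about why generated tests gain only limited ground on statistically oriented code, the following sketch contrasts a Randoop-style call sequence with a hand-written statistical property check. The OnlineMean class and both tests are invented for this illustration and are not taken from the paper:

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class OnlineMeanTest {

    // Hypothetical ML-library class under test: a streaming mean.
    static class OnlineMean {
        private double sum;
        private long n;
        void add(double x) { sum += x; n++; }
        double mean() { return sum / n; }  // NaN before the first add()
    }

    // Randoop/EvoSuite-style test: a short random call sequence with a
    // regression assertion. It raises coverage but encodes no statistical
    // expectation about the computation.
    @Test
    public void generatedStyle() {
        OnlineMean m = new OnlineMean();
        m.add(-1.0);
        m.add(0.0);
        assertEquals(-0.5, m.mean(), 1e-9);
    }

    // Hand-written test: checks a property a random generator is unlikely
    // to target, namely that the mean of many equal values is that value.
    @Test
    public void handWrittenProperty() {
        OnlineMean m = new OnlineMean();
        for (int i = 0; i < 1_000_000; i++) m.add(3.14);
        assertEquals(3.14, m.mean(), 1e-9);
    }
}
```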
{"title":"Automatic Unit Test Generation for Machine Learning Libraries: How Far Are We?","authors":"Song Wang, Nishtha Shrestha, Abarna Kucheri Subburaman, Junjie Wang, Moshi Wei, Nachiappan Nagappan","doi":"10.1109/ICSE43902.2021.00138","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00138","url":null,"abstract":"Automatic unit test generation that explores the input space and produces effective test cases for given programs have been studied for decades. Many unit test generation tools that can help generate unit test cases with high structural coverage over a program have been examined. However, the fact that existing test generation tools are mainly evaluated on general software programs calls into question about its practical effectiveness and usefulness for machine learning libraries, which are statistically orientated and have fundamentally different nature and construction from general software projects. In this paper, we set out to investigate the effectiveness of existing unit test generation techniques on machine learning libraries. To investigate this issue, we conducted an empirical study on five widely used machine learning libraries with two popular unit testcase generation tools, i.e., EVOSUITE and Randoop. We find that (1) most of the machine learning libraries do not maintain a high-quality unit test suite regarding commonly applied quality metrics such as code coverage (on average is 34.1%) and mutation score (on average is 21.3%), (2) unit test case generation tools, i.e., EVOSUITE and Randoop, lead to clear improvements in code coverage and mutation score, however, the improvement is limited, and (3) there exist common patterns in the uncovered code across the five machine learning libraries that can be used to improve unit test case generation tasks.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134524495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Domain-Specific Fixes for Flaky Tests with Wrong Assumptions on Underdetermined Specifications
Pub Date: 2021-05-01. DOI: 10.1109/ICSE43902.2021.00018
Peilun Zhang, Yanjie Jiang, Anjiang Wei, V. Stodden, D. Marinov, A. Shi
Library developers can provide classes and methods with underdetermined specifications that allow flexibility in future implementations. Library users may write code that relies on a specific implementation rather than on the specification, e.g., mistakenly assuming that the order of elements cannot change in the future. Prior work proposed the NonDex approach for detecting such wrong assumptions. We present a novel approach, called DexFix, to repair wrong assumptions on underdetermined specifications in an automated way. We run the NonDex tool on 200 open-source Java projects and detect 275 tests that fail due to wrong assumptions. The majority of failures come from iterating over HashMap/HashSet collections and from the getDeclaredFields method. We provide several new repair strategies that can fix these violations in both the test code and the main code. DexFix proposes fixes for 119 of the 275 detected tests. We have already reported fixes for 102 tests as GitHub pull requests: 74 have been merged, only 5 rejected, and the remainder are pending.
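To make the defect class concrete, here is a minimal sketch of the kind of wrong assumption NonDex flags and the order-imposing style of repair DexFix can propose; the class and tests are invented for illustration:

```java
import static org.junit.Assert.assertEquals;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import org.junit.Test;

public class IterationOrderTest {

    private Set<String> tags() {
        return new HashSet<>(Arrays.asList("alpha", "beta", "gamma"));
    }

    // Wrong assumption: HashSet's specification leaves iteration order
    // undetermined. This may pass today by accident of the current hash
    // implementation, but NonDex's order shuffling makes it fail.
    @Test
    public void assumesHashSetOrder() {
        List<String> list = new ArrayList<>(tags());
        assertEquals(Arrays.asList("alpha", "beta", "gamma"), list);
    }

    // Repaired: impose a deterministic order before asserting on it.
    @Test
    public void orderIndependent() {
        List<String> list = new ArrayList<>(new TreeSet<>(tags()));
        assertEquals(Arrays.asList("alpha", "beta", "gamma"), list);
    }
}
```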
{"title":"Domain-Specific Fixes for Flaky Tests with Wrong Assumptions on Underdetermined Specifications","authors":"Peilun Zhang, Yanjie Jiang, Anjiang Wei, V. Stodden, D. Marinov, A. Shi","doi":"10.1109/ICSE43902.2021.00018","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00018","url":null,"abstract":"Library developers can provide classes and methods with underdetermined specifications that allow flexibility in future implementations. Library users may write code that relies on a specific implementation rather than on the specification, e.g., assuming mistakenly that the order of elements cannot change in the future. Prior work proposed the NonDex approach that detects such wrong assumptions. We present a novel approach, called DexFix, to repair wrong assumptions on underdetermined specifications in an automated way. We run the NonDex tool on 200 open-source Java projects and detect 275 tests that fail due to wrong assumptions. The majority of failures are from iterating over HashMap/HashSet collections and the getDeclaredFields method. We provide several new repair strategies that can fix these violations in both the test code and the main code. DexFix proposes fixes for 119 tests from the detected 275 tests. We have already reported fixes for 102 tests as GitHub pull requests: 74 have been merged, with only 5 rejected, and the remaining pending.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115242013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Studying Test Annotation Maintenance in the Wild
Pub Date: 2021-05-01. DOI: 10.1109/ICSE43902.2021.00019
Dong Jae Kim, Nikolaos Tsantalis, T. Chen, Jinqiu Yang
Since the introduction of annotations in Java 5, the majority of testing frameworks, such as JUnit, TestNG, and Mockito, have adopted annotations in their core design. This adoption affected testing practices in every step of the test life-cycle, from fixture setup and test execution to fixture teardown. Despite the importance of test annotations, research on test maintenance has so far focused mainly on test code quality and test assertions, so there is little empirical evidence on the evolution and maintenance of test annotations. To fill this gap, we perform the first fine-grained empirical study of annotation changes. We developed a tool to mine 82,810 commits and detected 23,936 instances of test annotation changes in 12 open-source Java projects. Our main findings are: (1) Test annotation changes are more frequent than rename and type change refactorings. (2) We recover various migration efforts within the same testing framework or between different frameworks by analyzing common annotation replacement patterns. (3) We create a taxonomy by manually inspecting and classifying a sample of 368 test annotation changes and documenting the motivations driving these changes. Finally, we present a list of actionable implications for developers, researchers, and framework designers.
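As an example of the annotation replacement patterns behind such migrations, the sketch below shows a JUnit 4 test fixture and its JUnit 5 rewrite, one of the common framework migrations; the Account class is invented for illustration:

```java
// Before: JUnit 4 annotations drive the fixture life-cycle.
import org.junit.Before;
import org.junit.Test;

// Hypothetical class under test.
class Account {
    void deposit(int amount) {
        if (amount < 0) throw new IllegalArgumentException("negative deposit");
    }
}

public class AccountTest {
    private Account account;

    @Before  // fixture setup before every test
    public void setUp() { account = new Account(); }

    @Test(expected = IllegalArgumentException.class)
    public void rejectsNegativeDeposit() { account.deposit(-1); }
}
```

```java
// After: @Before becomes @BeforeEach, and the expected exception moves
// out of the annotation into an explicit assertThrows call.
import static org.junit.jupiter.api.Assertions.assertThrows;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

class AccountTest {
    private Account account;

    @BeforeEach
    void setUp() { account = new Account(); }

    @Test
    void rejectsNegativeDeposit() {
        assertThrows(IllegalArgumentException.class, () -> account.deposit(-1));
    }
}
```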
{"title":"Studying Test Annotation Maintenance in the Wild","authors":"Dong Jae Kim, Nikolaos Tsantalis, T. Chen, Jinqiu Yang","doi":"10.1109/ICSE43902.2021.00019","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00019","url":null,"abstract":"Since the introduction of annotations in Java 5, the majority of testing frameworks, such as JUnit, TestNG, and Mockito, have adopted annotations in their core design. This adoption affected the testing practices in every step of the test life-cycle, from fixture setup and test execution to fixture teardown. Despite the importance of test annotations, most research on test maintenance has mainly focused on test code quality and test assertions. As a result, there is little empirical evidence on the evolution and maintenance of test annotations. To fill this gap, we perform the first fine-grained empirical study on annotation changes. We developed a tool to mine 82,810 commits and detect 23,936 instances of test annotation changes from 12 open-source Java projects. Our main findings are: (1) Test annotation changes are more frequent than rename and type change refactorings. (2) We recover various migration efforts within the same testing framework or between different frameworks by analyzing common annotation replacement patterns. (3) We create a taxonomy by manually inspecting and classifying a sample of 368 test annotation changes and documenting the motivations driving these changes. Finally, we present a list of actionable implications for developers, researchers, and framework designers.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"49 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115861014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Are Machine Learning Cloud APIs Used Correctly?
Pub Date: 2021-05-01. DOI: 10.1109/ICSE43902.2021.00024
Chengcheng Wan, Shicheng Liu, H. Hoffmann, M. Maire, Shan Lu
Machine learning (ML) cloud APIs enable developers to easily incorporate learning solutions into software systems. Unfortunately, ML APIs are challenging to use correctly and efficiently, given their unique semantics, data requirements, and accuracy-performance tradeoffs. Much prior work has studied how to develop ML APIs or ML cloud services, but not how open-source applications use ML APIs. In this paper, we manually studied 360 representative open-source applications that use Google or AWS cloud-based ML APIs, and found that 70% of these applications contain API misuses in their latest versions that degrade the functional, performance, or economic quality of the software. We generalized 8 anti-patterns based on our manual study and developed automated checkers that identified hundreds more applications containing ML API misuses.
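A minimal sketch of one anti-pattern in the performance/cost flavor the paper describes: paying for repeated calls on unchanged inputs. The VisionClient interface below is an invented stand-in, not a real Google or AWS SDK type:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class LabelCache {

    // Invented stand-in for a cloud vision client; a real SDK call is
    // billed per request and adds network latency.
    interface VisionClient {
        String labelImage(byte[] image);
    }

    private final VisionClient client;
    private final Map<String, String> cache = new HashMap<>();

    LabelCache(VisionClient client) { this.client = client; }

    // Misuse: re-sending an identical image on every refresh pays for the
    // same answer over and over.
    String labelEveryTime(byte[] image) {
        return client.labelImage(image);
    }

    // Better: key by image content so unchanged inputs hit a local cache.
    String labelCached(byte[] image) {
        String key = Arrays.hashCode(image) + ":" + image.length;
        return cache.computeIfAbsent(key, k -> client.labelImage(image));
    }
}
```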
{"title":"Are Machine Learning Cloud APIs Used Correctly?","authors":"Chengcheng Wan, Shicheng Liu, H. Hoffmann, M. Maire, Shan Lu","doi":"10.1109/ICSE43902.2021.00024","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00024","url":null,"abstract":"Machine learning (ML) cloud APIs enable developers to easily incorporate learning solutions into software systems. Unfortunately, ML APIs are challenging to use correctly and efficiently, given their unique semantics, data requirements, and accuracy-performance tradeoffs. Much prior work has studied how to develop ML APIs or ML cloud services, but not how open-source applications are using ML APIs. In this paper, we manually studied 360 representative open-source applications that use Google or AWS cloud-based ML APIs, and found 70% of these applications contain API misuses in their latest versions that degrade functional, performance, or economical quality of the software. We have generalized 8 anti-patterns based on our manual study and developed automated checkers that identify hundreds of more applications that contain ML API misuses.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130449565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
IMGDroid: Detecting Image Loading Defects in Android Applications
Pub Date: 2021-05-01. DOI: 10.1109/ICSE43902.2021.00080
Wei Song, Mengqi Han, Jeff Huang
Images are essential to many Android applications, or apps. Although images play a critical role in app functionality and user experience, inefficient or improper image loading and displaying operations may severely impact app performance and quality. Additionally, since these image loading defects may not manifest as immediate failures, e.g., app crashes, existing GUI testing approaches cannot detect them effectively. In this paper, we identify five anti-patterns of such image loading defects: image passing by intent, image decoding without resizing, local image loading without permission, repeated decoding without caching, and image decoding in the UI thread. Based on these anti-patterns, we propose a static analysis technique, IMGDroid, to automatically and effectively detect such defects. We have applied IMGDroid to a benchmark of 21 open-source Android apps and found that it not only successfully detects the 45 previously known image loading defects but also finds 15 new ones. Our empirical study on 1,000 commercial Android apps demonstrates that image loading defects are prevalent.
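"Image decoding without resizing" is the easiest of the five anti-patterns to show in code. The sketch below uses the standard Android BitmapFactory API; the target dimensions and the subsampling heuristic are arbitrary choices for the example:

```java
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;

public final class ImageLoading {

    // Anti-pattern: decodes the full-resolution image into memory even
    // when the view displaying it is only, say, 128x128 pixels.
    static Bitmap decodeWithoutResizing(String path) {
        return BitmapFactory.decodeFile(path);
    }

    // Fix: read only the image bounds first, then decode at a
    // subsampled size that still covers the requested dimensions.
    static Bitmap decodeResized(String path, int reqWidth, int reqHeight) {
        BitmapFactory.Options opts = new BitmapFactory.Options();
        opts.inJustDecodeBounds = true;   // no pixel allocation
        BitmapFactory.decodeFile(path, opts);

        int sample = 1;
        while (opts.outWidth / (sample * 2) >= reqWidth
                && opts.outHeight / (sample * 2) >= reqHeight) {
            sample *= 2;                  // power-of-two subsampling
        }
        opts.inSampleSize = sample;
        opts.inJustDecodeBounds = false;
        return BitmapFactory.decodeFile(path, opts);
    }
}
```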
{"title":"IMGDroid: Detecting Image Loading Defects in Android Applications","authors":"Wei Song, Mengqi Han, Jeff Huang","doi":"10.1109/ICSE43902.2021.00080","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00080","url":null,"abstract":"Images are essential for many Android applications or apps. Although images play a critical role in app functionalities and user experience, inefficient or improper image loading and displaying operations may severely impact the app performance and quality. Additionally, since these image loading defects may not be manifested by immediate failures, e.g., app crashes, existing GUI testing approaches cannot detect them effectively. In this paper, we identify five anti-patterns of such image loading defects, including image passing by intent, image decoding without resizing, local image loading without permission, repeated decoding without caching, and image decoding in UI thread. Based on these anti-patterns, we propose a static analysis technique, IMGDroid, to automatically and effectively detect such defects. We have applied IMGDroid to a benchmark of 21 open-source Android apps, and found that it not only successfully detects the 45 previously-known image loading defects but also finds 15 new such defects. Our empirical study on 1,000 commercial Android apps demonstrates that the image loading defects are prevalent.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115374625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Input Algebras
Pub Date: 2021-05-01. DOI: 10.1109/ICSE43902.2021.00070
Rahul Gopinath, Hamed Nemati, A. Zeller
Grammar-based test generators are highly efficient in producing syntactically valid test inputs, and give their users precise control over which test inputs should be generated. Adapting a grammar or a test generator towards a particular testing goal can be tedious, though. We introduce the concept of a grammar transformer, specializing a grammar towards inclusion or exclusion of specific patterns: "The phone number must not start with 011 or +1". To the best of our knowledge, ours is the first approach to allow arbitrary Boolean combinations of patterns, giving testers unprecedented flexibility in creating targeted software tests. The resulting specialized grammars can be used with any grammar-based fuzzer for targeted test generation, but also as validators to check whether a given input meets the specialization, opening up additional usage scenarios. In our evaluation on real-world bugs, we show that specialized grammars are accurate both in producing and in validating targeted inputs.
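As a toy version of the phone-number example, the sketch below hand-specializes a tiny grammar so the forbidden prefixes cannot be produced, and reuses the specialized grammar as a validator. The real approach derives such specializations automatically for arbitrary Boolean combinations of patterns; everything here is invented for illustration:

```java
import java.util.List;
import java.util.Random;

public class PhoneGrammar {

    // <phone> ::= <prefix> <digit>{7}. The original grammar's prefix
    // alternatives include the two forbidden ones.
    static final List<String> ORIGINAL_PREFIXES =
            List.of("011", "+1", "+44", "020", "030");

    // Specialized grammar: alternatives matching the unwanted patterns
    // ("starts with 011 or +1") are removed from the rule.
    static final List<String> SPECIALIZED_PREFIXES =
            List.of("+44", "020", "030");

    static String generate(List<String> prefixes, Random rnd) {
        StringBuilder sb =
                new StringBuilder(prefixes.get(rnd.nextInt(prefixes.size())));
        for (int i = 0; i < 7; i++) sb.append(rnd.nextInt(10));
        return sb.toString();
    }

    // The specialized grammar doubles as a validator for targeted inputs.
    static boolean valid(String phone) {
        return SPECIALIZED_PREFIXES.stream().anyMatch(p ->
                phone.startsWith(p)
                        && phone.substring(p.length()).matches("\\d{7}"));
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        String s = generate(SPECIALIZED_PREFIXES, rnd);
        System.out.println(s + " valid=" + valid(s));  // always valid
    }
}
```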
{"title":"Input Algebras","authors":"Rahul Gopinath, Hamed Nemati, A. Zeller","doi":"10.1109/ICSE43902.2021.00070","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00070","url":null,"abstract":"Grammar-based test generators are highly efficient in producing syntactically valid test inputs, and give their user precise control over which test inputs should be generated. Adapting a grammar or a test generator towards a particular testing goal can be tedious, though. We introduce the concept of a grammar transformer, specializing a grammar towards inclusion or exclusion of specific patterns: \"The phone number must not start with 011 or +1\". To the best of our knowledge, ours is the first approach to allow for arbitrary Boolean combinations of patterns, giving testers unprecedented flexibility in creating targeted software tests. The resulting specialized grammars can be used with any grammar-based fuzzer for targeted test generation, but also as validators to check whether the given specialization is met or not, opening up additional usage scenarios. In our evaluation on real-world bugs, we show that specialized grammars are accurate both in producing and validating targeted inputs.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"133 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122800279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abacus: Precise Side-Channel Analysis
Pub Date: 2021-05-01. DOI: 10.1109/ICSE43902.2021.00078
Qinkun Bao, Zihao Wang, Xiaoting Li, J. Larus, Dinghao Wu
Side-channel attacks allow adversaries to infer sensitive information from non-functional characteristics. Prior side-channel detection work is able to identify numerous potential vulnerabilities. However, in practice, many such vulnerabilities leak a negligible amount of sensitive information, and thus developers are often reluctant to address them. Existing tools do not provide information to evaluate a leak's severity, such as the number of leaked bits. To address this issue, we propose a new program analysis method to precisely quantify the leaked information in a single-trace attack through side-channels. It can identify covert information flows in programs that expose confidential information and can reason about security flaws that would otherwise be difficult, if not impossible, for a developer to find. We model an attacker's observation of each leakage site as a constraint. We use symbolic execution to generate these constraints and then run Monte Carlo sampling to estimate the number of leaked bits for each leakage site. By applying the Central Limit Theorem, we provide an error bound for these estimations. We have implemented the technique in a tool called Abacus, which not only finds very fine-grained side-channel vulnerabilities but also estimates how many bits are leaked. Abacus outperforms existing dynamic side-channel detection tools in performance and accuracy. We evaluate Abacus on OpenSSL, mbedTLS, Libgcrypt, and Monocypher. Our results demonstrate that most reported vulnerabilities are difficult to exploit in practice and should be de-prioritized by developers. We also find several sensitive vulnerabilities that are missed by the existing tools. We confirm those vulnerabilities with manual checks and by contacting the developers.
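For intuition on what "number of leaked bits" means, consider a toy deterministic leak where the attacker observes only secret % 16. Because the observation is a function of the secret, the leakage equals the entropy of the observation distribution, and a Monte Carlo estimate recovers it (about 4 bits here). This is a deliberately simplified stand-in for the paper's constraint-based, symbolic-execution formulation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class LeakEstimate {
    public static void main(String[] args) {
        Random rnd = new Random(1);
        int samples = 1_000_000;
        Map<Integer, Integer> counts = new HashMap<>();

        // Sample secrets and record what the attacker would observe.
        for (int i = 0; i < samples; i++) {
            int secret = rnd.nextInt(1 << 20);  // uniform 20-bit secret
            int observation = secret % 16;      // toy side-channel model
            counts.merge(observation, 1, Integer::sum);
        }

        // Leaked bits = Shannon entropy of the observation distribution.
        double bits = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / samples;
            bits -= p * (Math.log(p) / Math.log(2));
        }
        System.out.printf("estimated leakage: %.3f bits%n", bits);  // ~4.000
    }
}
```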
{"title":"Abacus: Precise Side-Channel Analysis","authors":"Qinkun Bao, Zihao Wang, Xiaoting Li, J. Larus, Dinghao Wu","doi":"10.1109/ICSE43902.2021.00078","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00078","url":null,"abstract":"Side-channel attacks allow adversaries to infer sensitive information from non-functional characteristics. Prior side-channel detection work is able to identify numerous potential vulnerabilities. However, in practice, many such vulnerabilities leak a negligible amount of sensitive information, and thus developers are often reluctant to address them. Existing tools do not provide information to evaluate a leak's severity, such as the number of leaked bits. To address this issue, we propose a new program analysis method to precisely quantify the leaked information in a single-trace attack through side-channels. It can identify covert information flows in programs that expose confidential information and can reason about security flaws that would otherwise be difficult, if not impossible, for a developer to find. We model an attacker's observation of each leakage site as a constraint. We use symbolic execution to generate these constraints and then run Monte Carlo sampling to estimate the number of leaked bits for each leakage site. By applying the Central Limit Theorem, we provide an error bound for these estimations. We have implemented the technique in a tool called Abacus, which not only finds very fine-grained side-channel vulnerabilities but also estimates how many bits are leaked. Abacus outperforms existing dynamic side-channel detection tools in performance and accuracy. We evaluate Abacus on OpenSSL, mbedTLS, Libgcrypt, and Monocypher. Our results demonstrate that most reported vulnerabilities are difficult to exploit in practice and should be de-prioritized by developers. We also find several sensitive vulnerabilities that are missed by the existing tools. We confirm those vulnerabilities with manual checks and by contacting the developers.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131117761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Synthesizing Object State Transformers for Dynamic Software Updates
Pub Date: 2021-05-01. DOI: 10.1109/ICSE43902.2021.00103
Ze-Yi Zhao, Yanyan Jiang, Chang Xu, Tianxiao Gu, Xiaoxing Ma
There is an increasing demand for evolving software systems to deliver continuous service without restarts. Dynamic software update (DSU) aims to achieve this goal by patching the system state on the fly, but it is currently hindered in practice by non-trivial cross-version object state transformations. This paper revisits the problem through an in-depth empirical study of over 190 class changes from Tomcat 8. The study produced an important finding: most non-trivial object state transformers can be constructed by reassembling existing old/new-version code snippets. Based on this, the paper presents a domain-specific language and an efficient algorithm for synthesizing non-trivial object transformers through code reuse. We experimentally evaluated our tool implementation, PASTA, on real-world software systems; PASTA succeeds in 7.5x as many non-trivial object transformation tasks as the best existing DSU techniques.
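To make "object state transformer" concrete, here is a minimal hand-written example of the kind of artifact PASTA synthesizes, for an invented cache class whose internal representation changed between versions:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class Transformers {

    // Old version: remembers recently used keys in a plain list.
    static class CacheV1 {
        List<String> recentKeys;
        CacheV1(List<String> keys) { this.recentKeys = keys; }
    }

    // New version: a bounded deque plus an explicit capacity field.
    static class CacheV2 {
        Deque<String> recentKeys = new ArrayDeque<>();
        int capacity;
    }

    // State transformer applied at update time to every live CacheV1
    // object; the new field receives a default consistent with the old
    // behavior. Non-trivial transformers like this are what make DSU hard.
    static CacheV2 transform(CacheV1 old) {
        CacheV2 fresh = new CacheV2();
        fresh.recentKeys.addAll(old.recentKeys);
        fresh.capacity = Math.max(16, old.recentKeys.size());
        return fresh;
    }
}
```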
{"title":"Synthesizing Object State Transformers for Dynamic Software Updates","authors":"Ze-Yi Zhao, Yanyan Jiang, Chang Xu, Tianxiao Gu, Xiaoxing Ma","doi":"10.1109/ICSE43902.2021.00103","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00103","url":null,"abstract":"There is an increasing demand for evolving software systems to deliver continuous services of no restart. Dynamic software update (DSU) aims to achieve this goal by patching the system state on the fly but is currently hindered from practice due to non-trivial cross-version object state transformations. This paper revisits this problem through an in-depth empirical study of over 190 class changes from Tomcat 8. The study produced an important finding that most non-trivial object state transformers can be constructed by reassembling existing old/new version code snippets. This paper presents a domain-specific language and an efficient algorithm for synthesizing non-trivial object transformers over code reuse. We experimentally evaluated our tool implementation PASTA with real-world software systems, reporting PASTA's effectiveness in succeeding in 7.5X non-trivial object transformation tasks compared with the best existing DSU techniques.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128861869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semi-Supervised Log-Based Anomaly Detection via Probabilistic Label Estimation
Lin Yang, Junjie Chen, Zan Wang, Weijing Wang, Jiajun Jiang, Xuyuan Dong, Wenbin Zhang
Pub Date: 2021-05-01. DOI: 10.1109/ICSE43902.2021.00130
With the growth of software systems, logs have become an important source of data for aiding system maintenance. Log-based anomaly detection, which aims to detect system anomalies automatically via log analysis, is one of the most important methods for this purpose. However, existing log-based anomaly detection approaches still suffer from practical issues: they either depend on a large amount of manually labeled training data (supervised approaches) or perform unsatisfactorily without learning from historical anomalies (unsupervised and semi-supervised approaches). In this paper, we propose a novel, practical log-based anomaly detection approach, PLELog, which is semi-supervised to avoid time-consuming manual labeling and incorporates knowledge of historical anomalies via probabilistic label estimation to bring the superiority of supervised approaches into play. In addition, PLELog stays robust to unstable log data via semantic embedding and detects anomalies efficiently and effectively with an attention-based GRU neural network. We evaluated PLELog on the two most widely used public datasets, and the results demonstrate its effectiveness, significantly outperforming the compared approaches with an average improvement of 181.6% in terms of F1-score. In particular, PLELog has been applied to two real-world systems from our university and a large corporation, further demonstrating its practicability.
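A heavily simplified sketch of the probabilistic-label-estimation idea: unlabeled log sequences receive a soft anomaly label based on how far their embedding lies from known-normal data, and those soft labels then supervise the downstream classifier. The embedding, the distance-to-probability mapping, and all constants are invented here and differ from the paper's actual clustering pipeline:

```java
import java.util.List;

public class SoftLabels {

    // Euclidean distance between two embedded log sequences.
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    static double[] centroid(List<double[]> vs) {
        double[] c = new double[vs.get(0).length];
        for (double[] v : vs)
            for (int i = 0; i < c.length; i++) c[i] += v[i] / vs.size();
        return c;
    }

    // p(anomalous) grows with distance from the normal centroid; these
    // soft labels would then train the attention-based GRU classifier.
    static double probAnomalous(double[] x, double[] normalCentroid, double scale) {
        return 1.0 - Math.exp(-dist(x, normalCentroid) / scale);
    }

    public static void main(String[] args) {
        List<double[]> knownNormal = List.of(
                new double[]{0.1, 0.0}, new double[]{0.0, 0.2});
        double[] c = centroid(knownNormal);
        System.out.println(probAnomalous(new double[]{3.0, 2.5}, c, 1.0)); // near 1
        System.out.println(probAnomalous(new double[]{0.05, 0.1}, c, 1.0)); // ~0
    }
}
```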
{"title":"Semi-Supervised Log-Based Anomaly Detection via Probabilistic Label Estimation","authors":"Lin Yang, Junjie Chen, Zan Wang, Weijing Wang, Jiajun Jiang, Xuyuan Dong, Wenbin Zhang","doi":"10.1109/ICSE43902.2021.00130","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00130","url":null,"abstract":"With the growth of software systems, logs have become an important data to aid system maintenance. Log-based anomaly detection is one of the most important methods for such purpose, which aims to automatically detect system anomalies via log analysis. However, existing log-based anomaly detection approaches still suffer from practical issues due to either depending on a large amount of manually labeled training data (supervised approaches) or unsatisfactory performance without learning the knowledge on historical anomalies (unsupervised and semi-supervised approaches). In this paper, we propose a novel practical log-based anomaly detection approach, PLELog, which is semi-supervised to get rid of time-consuming manual labeling and incorporates the knowledge on historical anomalies via probabilistic label estimation to bring supervised approaches' superiority into play. In addition, PLELog is able to stay immune to unstable log data via semantic embedding and detect anomalies efficiently and effectively by designing an attention-based GRU neural network. We evaluated PLELog on two most widely-used public datasets, and the results demonstrate the effectiveness of PLELog, significantly outperforming the compared approaches with an average of 181.6% improvement in terms of F1-score. In particular, PLELog has been applied to two real-world systems from our university and a large corporation, further demonstrating its practicability","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125196900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluating Unit Testing Practices in R Packages
Pub Date: 2021-05-01. DOI: 10.1109/ICSE43902.2021.00136
M. Vidoni
Testing Technical Debt (TTD) occurs due to shortcuts (non-optimal decisions) taken about testing; it is the test dimension of technical debt. R is a package-based programming ecosystem that provides an easy way to install third-party code, datasets, tests, documentation, and examples. This structure makes it especially vulnerable to TTD, because errors present in a package can transitively affect all packages and scripts that depend on it. Thus, TTD can effectively become a threat to the validity of all analyses written in R that rely on potentially faulty code. This two-part study provides the first analysis in this area. First, 177 systematically selected, open-source R packages were mined and analysed to assess the quality of testing and testing goals, and to identify potential TTD sources. Second, a survey addressed how R package developers perceive testing and face its challenges (response rate of 19.4%). Results show that testing in R packages is of low quality; the most common smells are inadequate and obscure unit testing, improper asserts, inexperienced testers, and improper test design. Furthermore, even skilled R developers face challenges such as time constraints, an emphasis on development rather than testing, poor tool documentation, and a steep learning curve.
{"title":"Evaluating Unit Testing Practices in R Packages","authors":"M. Vidoni","doi":"10.1109/ICSE43902.2021.00136","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00136","url":null,"abstract":"Testing Technical Debt (TTD) occurs due to shortcuts (non-optimal decisions) taken about testing; it is the test dimension of technical debt. R is a package-based programming ecosystem that provides an easy way to install third-party code, datasets, tests, documentation and examples. This structure makes it especially vulnerable to TTD because errors present in a package can transitively affect all packages and scripts that depend on it. Thus, TTD can effectively become a threat to the validity of all analysis written in R that rely on potentially faulty code. This two-part study provides the first analysis in this area. First, 177 systematically-selected, open-source R packages were mined and analysed to address quality of testing, testing goals, and identify potential TTD sources. Second, a survey addressed how R package developers perceive testing and face its challenges (response rate of 19.4%). Results show that testing in R packages is of low quality; the most common smells are inadequate and obscure unit testing, improper asserts, inexperienced testers and improper test design. Furthermore, skilled R developers still face challenges such as time constraints, emphasis on development rather than testing, poor tool documentation and a steep learning curve.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115264390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}