JEST: N+1-Version Differential Testing of Both JavaScript Engines and Specification
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00015
Jihyeok Park, Seungmin An, Dongjun Youn, Gyeongwon Kim, Sukyoung Ryu
Modern programming follows the continuous integration (CI) and continuous deployment (CD) approach rather than the traditional waterfall model. Even the development of modern programming languages uses the CI/CD approach to swiftly provide new language features and to adapt to new development environments. Unlike in the conventional approach, in the modern CI/CD approach a language specification is no longer the oracle of the language semantics, because both the specification and its implementations (interpreters or compilers) co-evolve. In this setting, both the specification and the implementations may have bugs, and guaranteeing their correctness is non-trivial. In this paper, we propose a novel N+1-version differential testing approach to resolve this problem. Unlike traditional differential testing, our approach consists of four steps: 1) automatically synthesizing programs guided by the syntax and semantics of a given language specification, 2) generating conformance tests by injecting assertions into the synthesized programs to check their final program states, 3) detecting bugs in the specification and implementations by executing the conformance tests on multiple implementations, and 4) localizing bugs in the specification using statistical information. We realize our approach for the JavaScript programming language via JEST, which performs N+1-version differential testing for modern JavaScript engines and ECMAScript, the language specification describing the syntax and semantics of JavaScript in natural language. We evaluated JEST with four JavaScript engines that support all modern JavaScript language features and the latest version of ECMAScript (ES11, 2020). JEST automatically synthesized 1,700 programs that covered 97.78% of the syntax and 87.70% of the semantics of ES11. Using the assertion-injected JavaScript programs, it detected 44 engine bugs in the four engines and 27 specification bugs in ES11.
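For illustration, here is a minimal sketch of the differential-testing step (step 3), assuming hypothetical local engine binaries and a corpus of assertion-injected test files; this is not JEST's actual implementation:

```python
# Sketch of N+1-version differential testing (engine paths are hypothetical).
import subprocess
from collections import Counter

ENGINES = {  # assumed local binaries for the N engines under test
    "v8": "/usr/local/bin/d8",
    "jsc": "/usr/local/bin/jsc",
    "spidermonkey": "/usr/local/bin/js",
    "graaljs": "/usr/local/bin/js-graal",
}

def run(engine_path: str, test_file: str) -> str:
    """Run one conformance test; injected assertions throw on a mismatch."""
    proc = subprocess.run([engine_path, test_file],
                          capture_output=True, text=True, timeout=30)
    return "pass" if proc.returncode == 0 else "fail"

def differential_test(test_file: str, spec_result: str = "pass") -> dict:
    """Compare every engine against the spec-derived reference (the '+1').

    Whichever versions disagree with the majority result are the prime
    suspects: a lone engine points to an engine bug, while the spec's
    reference interpreter disagreeing with all engines points to a
    specification bug.
    """
    results = {name: run(path, test_file) for name, path in ENGINES.items()}
    results["spec"] = spec_result  # result from the spec-derived interpreter
    majority, _ = Counter(results.values()).most_common(1)[0]
    suspects = [name for name, r in results.items() if r != majority]
    return {"results": results, "suspects": suspects}
```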
{"title":"JEST: N+1-Version Differential Testing of Both JavaScript Engines and Specification","authors":"Jihyeok Park, Seungmin An, Dongjun Youn, Gyeongwon Kim, Sukyoung Ryu","doi":"10.1109/ICSE43902.2021.00015","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00015","url":null,"abstract":"Modern programming follows the continuous integration (CI) and continuous deployment (CD) approach rather than the traditional waterfall model. Even the development of modern programming languages uses the CI/CD approach to swiftly provide new language features and to adapt to new development environments. Unlike in the conventional approach, in the modern CI/CD approach, a language specification is no more the oracle of the language semantics because both the specification and its implementations (interpreters or compilers) can co-evolve. In this setting, both the specification and implementations may have bugs, and guaranteeing their correctness is non-trivial. In this paper, we propose a novel N+1-version differential testing to resolve the problem. Unlike the traditional differential testing, our approach consists of three steps: 1) to automatically synthesize programs guided by the syntax and semantics from a given language specification, 2) to generate conformance tests by injecting assertions to the synthesized programs to check their final program states, 3) to detect bugs in the specification and implementations via executing the conformance tests on multiple implementations, and 4) to localize bugs on the specification using statistical information. We actualize our approach for the JavaScript programming language via JEST, which performs N+1-version differential testing for modern JavaScript engines and ECMAScript, the language specification describing the syntax and semantics of JavaScript in a natural language. We evaluated JEST with four JavaScript engines that support all modern JavaScript language features and the latest version of ECMAScript (ES11, 2020). JEST automatically synthesized 1,700 programs that covered 97.78% of syntax and 87.70% of semantics from ES11. Using the assertion-injected JavaScript programs, it detected 44 engine bugs in four different engines and 27 specification bugs in ES11.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115651035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
IoT Bugs and Development Challenges
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00051
Amir Makhshari, A. Mesbah
IoT systems are being rapidly adopted across domains, from embedded systems to smart homes. Despite their growing adoption and popularity, there has been no thorough study of IoT development challenges from the practitioners' point of view. We provide the first systematic study of the bugs and challenges that IoT developers face in practice, through a large-scale empirical investigation. We collected 5,565 bug reports from 91 representative IoT project repositories and categorized a random sample of 323 based on the observed failures, root causes, and locations of the faulty components. In addition, we conducted nine interviews with IoT experts to uncover more details about IoT bugs and to gain insight into IoT developers' challenges. Lastly, we surveyed 194 IoT developers to validate our findings and gain further insights. Based on our results, we propose the first bug taxonomy for IoT systems. We highlight frequent bug categories and their root causes, correlations between them, and common pitfalls and challenges that IoT developers face. We recommend future directions for IoT areas that require research and development attention.
{"title":"IoT Bugs and Development Challenges","authors":"Amir Makhshari, A. Mesbah","doi":"10.1109/ICSE43902.2021.00051","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00051","url":null,"abstract":"IoT systems are rapidly adopted in various domains, from embedded systems to smart homes. Despite their growing adoption and popularity, there has been no thorough study to understand IoT development challenges from the practitioners' point of view. We provide the first systematic study of bugs and challenges that IoT developers face in practice, through a large-scale empirical investigation. We collected 5,565 bug reports from 91 representative IoT project repositories and categorized a random sample of 323 based on the observed failures, root causes, and the locations of the faulty components. In addition, we conducted nine interviews with IoT experts to uncover more details about IoT bugs and to gain insight into IoT developers' challenges. Lastly, we surveyed 194 IoT developers to validate our findings and gain further insights. We propose the first bug taxonomy for IoT systems based on our results. We highlight frequent bug categories and their root causes, correlations between them, and common pitfalls and challenges that IoT developers face. We recommend future directions for IoT areas that require research and development attention.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114272929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Ignorance and Prejudice" in Software Fairness
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00129
J Zhang, M. Harman
Machine learning software can be unfair when making human-related decisions, exhibiting prejudice against certain groups of people. Existing work primarily focuses on proposing fairness metrics and presenting fairness-improvement approaches. It remains unclear how key aspects of any machine learning system, such as the feature set and the training data, affect fairness. This paper presents results from a comprehensive study that addresses this question. We find that enlarging the feature set plays a significant role in fairness (with an average effect rate of 38%). Importantly, and contrary to the widely-held belief that greater fairness often comes at the cost of lower accuracy, our findings reveal that an enlarged feature set yields both higher accuracy and higher fairness. Perhaps also surprisingly, we find that larger training data does not help to improve fairness. Our results suggest that a larger training data set exhibits more unfairness than a smaller one when the feature set is insufficient, an important cautionary finding for practising software engineers.
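As an illustration of the kind of experiment this entails, here is a hedged sketch contrasting a narrow and an enlarged feature set on accuracy and a simple group-fairness metric (statistical parity difference); the data frame, column names, and metric choice are assumptions, not the paper's exact setup:

```python
# Sketch: measure how feature-set size affects accuracy and group fairness.
# Assumes `df` is a tabular data set with a binary label and a binary
# protected attribute (column names below are illustrative).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def statistical_parity_difference(y_pred, protected) -> float:
    """|P(y=1 | protected=0) - P(y=1 | protected=1)|; 0 is perfectly fair."""
    p0 = y_pred[protected == 0].mean()
    p1 = y_pred[protected == 1].mean()
    return abs(p0 - p1)

def evaluate(df: pd.DataFrame, features: list[str], label="label",
             protected="sex") -> tuple[float, float]:
    X_tr, X_te, y_tr, y_te = train_test_split(
        df[features], df[label], test_size=0.3, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    pred = pd.Series(model.predict(X_te), index=X_te.index)
    acc = (pred == y_te).mean()
    spd = statistical_parity_difference(pred, df.loc[X_te.index, protected])
    return acc, spd

# Compare a narrow feature set against an enlarged one, e.g.:
# evaluate(df, ["age", "hours"]) vs evaluate(df, ["age", "hours", "edu", "job"])
```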
{"title":"\"Ignorance and Prejudice\" in Software Fairness","authors":"J Zhang, M. Harman","doi":"10.1109/ICSE43902.2021.00129","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00129","url":null,"abstract":"Machine learning software can be unfair when making human-related decisions, having prejudices over certain groups of people. Existing work primarily focuses on proposing fairness metrics and presenting fairness improvement approaches. It remains unclear how key aspect of any machine learning system, such as feature set and training data, affect fairness. This paper presents results from a comprehensive study that addresses this problem. We find that enlarging the feature set plays a significant role in fairness (with an average effect rate of 38%). Importantly, and contrary to widely-held beliefs that greater fairness often corresponds to lower accuracy, our findings reveal that an enlarged feature set has both higher accuracy and fairness. Perhaps also surprisingly, we find that a larger training data does not help to improve fairness. Our results suggest a larger training data set has more unfairness than a smaller one when feature sets are insufficient; an important cautionary finding for practising software engineers.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"141 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128983255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
If It’s Not Secure, It Should Not Compile: Preventing DOM-Based XSS in Large-Scale Web Development with API Hardening
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00123
Pei Wang, Julian Bangert, Christoph Kern
Despite the extensive effort spent on its mitigation, cross-site scripting (XSS) remains one of the most prevalent security threats on the internet. Decades of exploitation and remediation have demonstrated that code inspection and testing alone do not eliminate XSS vulnerabilities in complex web applications with a high degree of confidence. This paper introduces Google's secure-by-design engineering paradigm that effectively prevents DOM-based XSS vulnerabilities in large-scale web development. Our approach, named API hardening, enforces a series of company-wide secure coding practices. We provide a set of secure APIs to replace native DOM APIs that are prone to XSS vulnerabilities. Through a combination of type contracts and appropriate validation and escaping, the secure APIs ensure that applications built on them are free of XSS vulnerabilities. We deploy a simple yet capable compile-time checker to guarantee that developers exclusively use our hardened APIs to interact with the DOM. We have made various efforts to scale this approach to tens of thousands of engineers without a significant productivity impact. By offering rigorous tooling and consultant support, we help developers adopt the secure coding practices as seamlessly as possible. We present empirical results showing how API hardening has helped reduce the occurrences of XSS vulnerabilities in Google's enormous code base over the course of a two-year deployment.
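The type-contract idea can be sketched in a few lines; the following is a Python illustration of the pattern, not Google's actual (Java/TypeScript) hardened APIs:

```python
# Sketch of the type contract behind API hardening: untrusted strings can
# only reach a sink after escaping, and the sink accepts only the SafeHtml
# wrapper type, which a compile-time checker can enforce exclusively.
import html

_PRIVATE = object()  # capability token: only builders in this module hold it

class SafeHtml:
    """Value type whose instances are XSS-safe by construction."""
    def __init__(self, value: str, _token=None):
        if _token is not _PRIVATE:      # block direct construction
            raise TypeError("use a builder such as escape()")
        self._value = value

    def __str__(self) -> str:
        return self._value

def escape(untrusted: str) -> SafeHtml:
    """Builder: HTML-escapes arbitrary input, so the result is safe."""
    return SafeHtml(html.escape(untrusted), _token=_PRIVATE)

def set_inner_html(element_id: str, content: SafeHtml) -> str:
    """Hardened sink: the contract admits SafeHtml only, never raw strings."""
    if not isinstance(content, SafeHtml):
        raise TypeError("sink requires SafeHtml; escape() the input first")
    return f'<div id="{element_id}">{content}</div>'

# set_inner_html("greeting", escape('<img src=x onerror=alert(1)>')) is safe;
# passing the raw string instead is rejected before it can reach the DOM.
```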
{"title":"If It’s Not Secure, It Should Not Compile: Preventing DOM-Based XSS in Large-Scale Web Development with API Hardening","authors":"Pei Wang, Julian Bangert, Christoph Kern","doi":"10.1109/ICSE43902.2021.00123","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00123","url":null,"abstract":"With tons of efforts spent on its mitigation, Cross-site scripting (XSS) remains one of the most prevalent security threats on the internet. Decades of exploitation and remediation demonstrated that code inspection and testing alone does not eliminate XSS vulnerabilities in complex web applications with a high degree of confidence. This paper introduces Google's secure-by-design engineering paradigm that effectively prevents DOM-based XSS vulnerabilities in large-scale web development. Our approach, named API hardening, enforces a series of company-wide secure coding practices. We provide a set of secure APIs to replace native DOM APIs that are prone to XSS vulnerabilities. Through a combination of type contracts and appropriate validation and escaping, the secure APIs ensure that applications based thereon are free of XSS vulnerabilities. We deploy a simple yet capable compile-time checker to guarantee that developers exclusively use our hardened APIs to interact with the DOM. We make various of efforts to scale this approach to tens of thousands of engineers without significant productivity impact. By offering rigorous tooling and consultant support, we help developers adopt the secure coding practices as seamlessly as possible. We present empirical results showing how API hardening has helped reduce the occurrences of XSS vulnerabilities in Google's enormous code base over the course of two-year deployment.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129288579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Too Quiet in the Library: An Empirical Study of Security Updates in Android Apps' Native Code
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00122
Sumaya Almanee, Arda Ünal, Mathias Payer, Joshua Garcia
Android apps include third-party native libraries to increase performance and to reuse functionality. Native code is directly executed from apps through the Java Native Interface or the Android Native Development Kit. Android developers add precompiled native libraries to their projects, enabling their use. Unfortunately, developers often struggle, or simply neglect, to update these libraries in a timely manner. This results in the continued use of outdated native libraries with unpatched security vulnerabilities years after patches became available. To further understand this phenomenon, we study the security updates in the native libraries of the 200 most popular free apps on Google Play from Sept. 2013 to May 2020. A core difficulty we face in this study is the identification of libraries and their versions: developers often rename or modify libraries, making their identification challenging. We create an approach called LibRARIAN (LibRAry veRsion IdentificAtioN) that accurately identifies native libraries and their versions as found in Android apps, based on our novel similarity metric bin2sim. LibRARIAN leverages features extracted from libraries' metadata and identifying strings in their read-only sections. We discovered that 53 of the 200 popular apps (26.5%) shipped vulnerable library versions with known CVEs between Sept. 2013 and May 2020, with 14 of those apps remaining vulnerable. We find that app developers took, on average, 528.71±40.20 days to apply security patches, while library developers release a security patch after 54.59±8.12 days, a roughly ten times slower rate of update.
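A minimal sketch of the matching idea: represent each library as a set of extracted features and pick the known version with the highest set similarity. The real bin2sim metric uses richer binary features; the feature sets and version corpus here are assumptions:

```python
# Sketch of version identification via feature-set similarity.

def similarity(features_a: set[str], features_b: set[str]) -> float:
    """Jaccard-style similarity between two libraries' feature sets."""
    if not features_a and not features_b:
        return 0.0
    return len(features_a & features_b) / len(features_a | features_b)

def identify_version(unknown: set[str],
                     known: dict[str, set[str]]) -> tuple[str, float]:
    """Match an app's bundled .so against a corpus of known library versions."""
    best = max(known, key=lambda v: similarity(unknown, known[v]))
    return best, similarity(unknown, known[best])

# known = {"libpng-1.6.34": {...exported symbols, version strings...},
#          "libpng-1.6.37": {...}}
# identify_version(extracted_features, known) -> e.g. ("libpng-1.6.37", 0.93)
```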
Verifying Determinism in Sequential Programs
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00017
Rashmi Mudduluru, Jason Waataja, Suzanne Millstein, Michael D. Ernst
When a program is nondeterministic, it is difficult to test and debug. Nondeterminism occurs even in sequential programs: e.g., by iterating over the elements of a hash table. We have created a type system that expresses determinism specifications in a program. The key ideas in the type system are type qualifiers for nondeterminism, order-nondeterminism, and determinism; type well-formedness rules to restrict collection types; and enhancements to polymorphism that improve precision when analyzing collection operations. While state-of-the-art nondeterminism detection tools rely on observing output from specific runs, our approach soundly verifies determinism at compile time. We implemented our type system for Java. Our type checker, the Determinism Checker, warns if a program is nondeterministic or verifies that the program is deterministic. In case studies comprising 90,097 lines of code, the Determinism Checker found 87 previously-unknown nondeterminism errors, even in programs that had been heavily vetted by developers who were greatly concerned about nondeterminism errors. In experiments, the Determinism Checker found all of the non-concurrency-related nondeterminism that was found by state-of-the-art dynamic approaches for detecting flaky tests.
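The hash-iteration example translates directly into code; the sketch below illustrates, in Python, the order-nondeterminism the Java type qualifiers are designed to catch:

```python
# Order-nondeterminism the checker targets, illustrated in Python (the
# paper's type system is for Java, with qualifiers such as @Det,
# @OrderNonDet, and @NonDet).

def summarize_nondet(words: set[str]) -> str:
    # Iterating a hash-based set: order varies across runs because string
    # hashing is randomized, so this value would be typed @OrderNonDet and
    # flagged if it flows into program output.
    return ", ".join(words)

def summarize_det(words: set[str]) -> str:
    # Sorting removes the order-dependence; the result is @Det again.
    return ", ".join(sorted(words))

# Run the script twice: summarize_nondet({"b", "a", "c"}) can differ between
# runs, while summarize_det always yields "a, b, c".
```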
{"title":"Verifying Determinism in Sequential Programs","authors":"Rashmi Mudduluru, Jason Waataja, Suzanne Millstein, Michael D. Ernst","doi":"10.1109/ICSE43902.2021.00017","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00017","url":null,"abstract":"When a program is nondeterministic, it is difficult to test and debug. Nondeterminism occurs even in sequential programs: e.g., by iterating over the elements of a hash table. We have created a type system that expresses determinism specifications in a program. The key ideas in the type system are type qualifiers for nondeterminism, order-nondeterminism, and determinism; type well-formedness rules to restrict collection types; and enhancements to polymorphism that improve precision when analyzing collection operations. While state of-the-art nondeterminism detection tools rely on observing output from specific runs, our approach soundly verifies determinism at compile time. We implemented our type system for Java. Our type checker, the Determinism Checker, warns if a program is nondeterministic or verifies that the program is deterministic. In case studies of 90097 lines of code, the Determinism Checker found 87 previously-unknown nondeterminism errors, even in programs that had been heavily vetted by developers who were greatly concerned about nondeterminism errors. In experiments, the Determinism Checker found all of the non-concurrency-related nondeterminism that was found by state-of-the-art dynamic approaches for detecting flaky tests.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113975956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CodeShovel: Constructing Method-Level Source Code Histories
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00135
F. Grund, S. Chowdhury, N. Bradley, Braxton Hall, Reid Holmes
Source code histories are commonly used by developers and researchers to reason about how software evolves. Through a survey with 42 professional software developers, we learned that developers face significant mismatches between the output provided by their existing tools for examining source code histories and what they need to successfully complete their historical analysis tasks. To address these shortcomings, we propose CodeShovel, a tool for uncovering method histories that quickly produces complete and accurate change histories for 90% of methods (including 97% of all method changes), outperforming leading tools from both research (e.g., FinerGit) and practice (e.g., IntelliJ / git log). CodeShovel helps developers navigate the entire history of source code methods so they can better understand how a method evolved. A field study with 16 industrial developers on industrial code bases confirmed our empirical findings about CodeShovel's correctness and low runtime overhead, and additionally showed that the approach can be useful for a wide range of industrial development tasks.
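A hedged sketch of the underlying tracing idea: re-locate a method in each older revision by body similarity so the history survives renames; the commit representation and threshold are assumptions, not CodeShovel's actual algorithm:

```python
# Sketch of method-history tracing (hypothetical commit representation; the
# real tool works on Git repositories and handles renames, moves, and
# signature changes).
import difflib

def same_method(body_a: str, body_b: str, threshold: float = 0.75) -> bool:
    """Treat two bodies as the same method if they are textually similar,
    which lets the trace survive a rename that a plain `git log -L` loses."""
    ratio = difflib.SequenceMatcher(None, body_a, body_b).ratio()
    return ratio >= threshold

def trace_history(commits, method_body: str) -> list:
    """Walk commits newest-to-oldest, re-locating the method at each step.

    `commits` is assumed to yield (commit_id, {method_name: body}) pairs,
    newest first. Returns the commits in which the method's body changed.
    """
    history, current = [], method_body
    for commit_id, methods in commits:
        match = next((b for b in methods.values() if same_method(b, current)),
                     None)
        if match is None:
            break                      # method did not exist at this commit
        if match != current:
            history.append(commit_id)  # body differs: a change along the way
            current = match
    return history
```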
{"title":"CodeShovel: Constructing Method-Level Source Code Histories","authors":"F. Grund, S. Chowdhury, N. Bradley, Braxton Hall, Reid Holmes","doi":"10.1109/ICSE43902.2021.00135","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00135","url":null,"abstract":"Source code histories are commonly used by developers and researchers to reason about how software evolves. Through a survey with 42 professional software developers, we learned that developers face significant mismatches between the output provided by developers' existing tools for examining source code histories and what they need to successfully complete their historical analysis tasks. To address these shortcomings, we propose CodeShovel, a tool for uncovering method histories that quickly produces complete and accurate change histories for 90% methods (including 97% of all method changes) outperforming leading tools from both research (e.g, FinerGit) and practice (e.g., IntelliJ / git log). CodeShovel helps developers to navigate the entire history of source code methods so they can better understand how the method evolved. A field study on industrial code bases with 16 industrial developers confirmed our empirical findings of CodeShovel's correctness, low runtime overheads, and additionally showed that the approach can be useful for a wide range of industrial development tasks.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115843244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing Genetic Improvement of Software with Regression Test Selection
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00120
Giovani Guizzo, J. Petke, Federica Sarro, M. Harman
Genetic improvement uses artificial intelligence to automatically improve software with respect to non-functional properties (AI for SE). In this paper, we propose the use of existing software engineering best practice to enhance Genetic Improvement (SE for AI). We conjecture that existing Regression Test Selection (RTS) techniques, which have been proven to be efficient and effective, can and should be used as a core component of the GI search process to maximise its effectiveness. To assess this idea, we carried out a thorough empirical study of both dynamic and static RTS techniques used with GI to improve seven real-world software programs. The results of our empirical evaluation show that incorporating RTS into GI significantly speeds up the whole GI process, making it up to 78% faster on our benchmark set while still producing valid software improvements. Our findings are significant in that they can save hours to days of computational time, and can facilitate the uptake of GI in industrial settings by significantly reducing the time a developer waits for feedback from such an automated technique. Therefore, we recommend the use of RTS in future test-based automated software improvement work. Finally, we hope this successful application of SE for AI will encourage other researchers to investigate further applications in this area.
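A minimal sketch of how RTS can slot into the GI fitness loop, assuming a precomputed coverage map from tests to methods; the names and the fitness signal are illustrative, not the paper's implementation:

```python
# Sketch of RTS inside the GI loop: re-run only the tests whose coverage
# reaches the code the current patch mutated.

def select_tests(patched_methods: set[str],
                 coverage: dict[str, set[str]]) -> list[str]:
    """`coverage` maps each test name to the set of methods it executes."""
    return [t for t, covered in coverage.items()
            if covered & patched_methods]

def measure_runtime(patch) -> float:
    """Placeholder fitness signal, e.g. wall-clock time of a benchmark."""
    return 0.0

def fitness(patch, run_test, patched_methods, coverage) -> float:
    """Evaluate a GI variant on the selected subset instead of the full suite.

    A failing subset already invalidates the patch, so the full-suite run
    can be deferred to final validation of the best variant.
    """
    subset = select_tests(patched_methods, coverage)
    if not all(run_test(t) for t in subset):
        return float("-inf")           # invalid variant, discard
    return -measure_runtime(patch)     # e.g., optimise for speed
```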
Efficient Compiler Autotuning via Bayesian Optimization
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00110
Junjie Chen, Ningxin Xu, Peiqi Chen, Hongyu Zhang
A typical compiler such as GCC supports hundreds of optimizations controlled by compilation flags for improving the runtime performance of the compiled program. Due to the large number of compilation flags and the exponential number of flag combinations, it is impossible for compiler users to manually tune these optimization flags to achieve the required runtime performance of compiled programs. Over the years, many compiler autotuning approaches have been proposed to automatically tune optimization flags, but they still suffer from efficiency problems due to the huge search space. In this paper, we propose the first Bayesian optimization based approach, called BOCA, for efficient compiler autotuning. In BOCA, we leverage a tree-based model to approximate the objective function, making Bayesian optimization scalable to a large number of optimization flags. Moreover, we design a novel search strategy that improves the efficiency of Bayesian optimization by incorporating the impact of each optimization flag, as measured by the tree-based model, and a decay function to strike a balance between exploitation and exploration. We conduct extensive experiments to investigate the effectiveness of BOCA on the two most popular C compilers (i.e., GCC and LLVM) and two widely-used C benchmarks (i.e., cBench and PolyBench). The results show that BOCA significantly outperforms state-of-the-art compiler autotuning approaches and Bayesian optimization methods in terms of the time spent achieving specified speedups, demonstrating the effectiveness of BOCA.
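A hedged sketch of tree-surrogate Bayesian optimisation over binary flag vectors, using a random forest's per-tree spread as the uncertainty estimate and a simple UCB acquisition; BOCA's impact-weighted search strategy and decay function are more involved than this, and the objective here is a stand-in:

```python
# Sketch: Bayesian optimisation with a tree-based surrogate over flag vectors.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
N_FLAGS = 20                                # e.g., a subset of GCC -f flags

def measure_speedup(flags: np.ndarray) -> float:
    """Placeholder: would compile and benchmark with the given flag vector."""
    return float(flags.sum())               # stand-in objective

X = rng.integers(0, 2, size=(8, N_FLAGS))   # random initial designs
y = np.array([measure_speedup(x) for x in X])

for _ in range(30):
    # Tree-based surrogate: scales to many flags where a GP would struggle.
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    cand = rng.integers(0, 2, size=(512, N_FLAGS))      # candidate configs
    per_tree = np.stack([t.predict(cand.astype(float))
                         for t in model.estimators_])
    mu, sigma = per_tree.mean(axis=0), per_tree.std(axis=0)
    best = cand[np.argmax(mu + 1.0 * sigma)]            # simple UCB
    X = np.vstack([X, best])
    y = np.append(y, measure_speedup(best))

print("best flags:", X[y.argmax()], "speedup:", y.max())
```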
{"title":"Efficient Compiler Autotuning via Bayesian Optimization","authors":"Junjie Chen, Ningxin Xu, Peiqi Chen, Hongyu Zhang","doi":"10.1109/ICSE43902.2021.00110","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00110","url":null,"abstract":"A typical compiler such as GCC supports hundreds of optimizations controlled by compilation flags for improving the runtime performance of the compiled program. Due to the large number of compilation flags and the exponential number of flag combinations, it is impossible for compiler users to manually tune these optimization flags in order to achieve the required runtime performance of the compiled programs. Over the years, many compiler autotuning approaches have been proposed to automatically tune optimization flags, but they still suffer from the efficiency problem due to the huge search space. In this paper, we propose the first Bayesian optimization based approach, called BOCA, for efficient compiler autotuning. In BOCA, we leverage a tree-based model for approximating the objective function in order to make Bayesian optimization scalable to a large number of optimization flags. Moreover, we design a novel searching strategy to improve the efficiency of Bayesian optimization by incorporating the impact of each optimization flag measured by the tree-based model and a decay function to strike a balance between exploitation and exploration. We conduct extensive experiments to investigate the effectiveness of BOCA on two most popular C compilers (i.e., GCC and LLVM) and two widely-used C benchmarks (i.e., cBench and PolyBench). The results show that BOCA significantly outperforms the state-of-the-art compiler autotuning approaches and Bayesion optimization methods in terms of the time spent on achieving specified speedups, demonstrating the effectiveness of BOCA.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121501983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantic Web Accessibility Testing via Hierarchical Visual Analysis
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00143
Mohammad Bajammal, A. Mesbah
Web accessibility, the design of web apps to be usable by users with disabilities, impacts millions of people around the globe. Although accessibility has traditionally been a marginal afterthought that is often ignored in many software products, it is increasingly becoming a legal requirement that must be satisfied. While some web accessibility testing tools exist, most only perform rudimentary syntactical checks that do not assess the more important high-level semantic aspects that users with disabilities rely on. Accordingly, assessing web accessibility has largely remained a laborious manual process requiring human input. In this paper, we propose an approach, called AXERAY, that infers semantic groupings of various regions of a web page and their semantic roles. We evaluate our approach on 30 real-world websites and assess the accuracy of semantic inference as well as the ability to detect accessibility failures. The results show that AXERAY achieves, on average, an F-measure of 87% for inferring semantic groupings, and is able to detect accessibility failures with 85% accuracy.
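A hedged sketch of the final checking step: compare the role inferred for a visually grouped region against the role the markup actually exposes. The inference itself (AXERAY's hierarchical visual analysis) is stubbed out, and the element representation is an assumption:

```python
# Sketch of the checking step: report an accessibility failure when the
# inferred semantic role is missing from, or contradicts, what assistive
# technology sees.

def declared_role(element: dict) -> str | None:
    """Role the page exposes to assistive technology (ARIA or implicit)."""
    implicit = {"nav": "navigation", "ul": "list", "table": "table"}
    return element.get("aria_role") or implicit.get(element.get("tag", ""))

def check_region(element: dict, inferred_role: str) -> str | None:
    """`inferred_role` would come from visual analysis of the region."""
    actual = declared_role(element)
    if actual is None:
        return f"missing role: region looks like '{inferred_role}'"
    if actual != inferred_role:
        return f"mismatch: looks like '{inferred_role}' but exposes '{actual}'"
    return None

# check_region({"tag": "div"}, "navigation")
#   -> "missing role: region looks like 'navigation'"
```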
{"title":"Semantic Web Accessibility Testing via Hierarchical Visual Analysis","authors":"Mohammad Bajammal, A. Mesbah","doi":"10.1109/ICSE43902.2021.00143","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00143","url":null,"abstract":"Web accessibility, the design of web apps to be usable by users with disabilities, impacts millions of people around the globe. Although accessibility has traditionally been a marginal afterthought that is often ignored in many software products, it is increasingly becoming a legal requirement that must be satisfied. While some web accessibility testing tools exist, most only perform rudimentary syntactical checks that do not assess the more important high-level semantic aspects that users with disabilities rely on. Accordingly, assessing web accessibility has largely remained a laborious manual process requiring human input. In this paper, we propose an approach, called AXERAY, that infers semantic groupings of various regions of a web page and their semantic roles. We evaluate our approach on 30 real-world websites and assess the accuracy of semantic inference as well as the ability to detect accessibility failures. The results show that AXERAY achieves, on average, an F-measure of 87% for inferring semantic groupings, and is able to detect accessibility failures with 85% accuracy.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127804992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}