JEST: N+1-Version Differential Testing of Both JavaScript Engines and Specification
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00015
Jihyeok Park, Seungmin An, Dongjun Youn, Gyeongwon Kim, Sukyoung Ryu
Modern programming follows the continuous integration (CI) and continuous deployment (CD) approach rather than the traditional waterfall model. Even the development of modern programming languages uses the CI/CD approach to swiftly provide new language features and to adapt to new development environments. Unlike in the conventional approach, in the modern CI/CD approach a language specification is no longer the oracle of the language semantics, because both the specification and its implementations (interpreters or compilers) co-evolve. In this setting, both the specification and the implementations may have bugs, and guaranteeing their correctness is non-trivial. In this paper, we propose a novel N+1-version differential testing approach to resolve this problem. Unlike traditional differential testing, our approach consists of four steps: 1) automatically synthesizing programs guided by the syntax and semantics of a given language specification, 2) generating conformance tests by injecting assertions into the synthesized programs to check their final program states, 3) detecting bugs in the specification and implementations by executing the conformance tests on multiple implementations, and 4) localizing bugs in the specification using statistical information. We realize our approach for the JavaScript programming language via JEST, which performs N+1-version differential testing for modern JavaScript engines and ECMAScript, the language specification describing the syntax and semantics of JavaScript in natural language. We evaluated JEST with four JavaScript engines that support all modern JavaScript language features and the latest version of ECMAScript (ES11, 2020). JEST automatically synthesized 1,700 programs that covered 97.78% of the syntax and 87.70% of the semantics of ES11. Using the assertion-injected JavaScript programs, it detected 44 engine bugs in the four engines and 27 specification bugs in ES11.
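For illustration, here is a minimal sketch of the differential-testing step (step 3), assuming hypothetical local engine binaries and a corpus of assertion-injected test files; this is not JEST's actual implementation:

```python
# Sketch of N+1-version differential testing (engine paths are hypothetical).
import subprocess
from collections import Counter

ENGINES = {  # assumed local binaries for the N engines under test
    "v8": "/usr/local/bin/d8",
    "jsc": "/usr/local/bin/jsc",
    "spidermonkey": "/usr/local/bin/js",
    "graaljs": "/usr/local/bin/js-graal",
}

def run(engine_path: str, test_file: str) -> str:
    """Run one conformance test; injected assertions throw on a mismatch."""
    proc = subprocess.run([engine_path, test_file],
                          capture_output=True, text=True, timeout=30)
    return "pass" if proc.returncode == 0 else "fail"

def differential_test(test_file: str, spec_result: str = "pass") -> dict:
    """Compare every engine against the spec-derived reference (the '+1').

    Whichever versions disagree with the majority result are the prime
    suspects: a lone engine points to an engine bug, while the spec's
    reference interpreter disagreeing with all engines points to a
    specification bug.
    """
    results = {name: run(path, test_file) for name, path in ENGINES.items()}
    results["spec"] = spec_result  # result from the spec-derived interpreter
    majority, _ = Counter(results.values()).most_common(1)[0]
    suspects = [name for name, r in results.items() if r != majority]
    return {"results": results, "suspects": suspects}
```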
{"title":"JEST: N+1-Version Differential Testing of Both JavaScript Engines and Specification","authors":"Jihyeok Park, Seungmin An, Dongjun Youn, Gyeongwon Kim, Sukyoung Ryu","doi":"10.1109/ICSE43902.2021.00015","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00015","url":null,"abstract":"Modern programming follows the continuous integration (CI) and continuous deployment (CD) approach rather than the traditional waterfall model. Even the development of modern programming languages uses the CI/CD approach to swiftly provide new language features and to adapt to new development environments. Unlike in the conventional approach, in the modern CI/CD approach, a language specification is no more the oracle of the language semantics because both the specification and its implementations (interpreters or compilers) can co-evolve. In this setting, both the specification and implementations may have bugs, and guaranteeing their correctness is non-trivial. In this paper, we propose a novel N+1-version differential testing to resolve the problem. Unlike the traditional differential testing, our approach consists of three steps: 1) to automatically synthesize programs guided by the syntax and semantics from a given language specification, 2) to generate conformance tests by injecting assertions to the synthesized programs to check their final program states, 3) to detect bugs in the specification and implementations via executing the conformance tests on multiple implementations, and 4) to localize bugs on the specification using statistical information. We actualize our approach for the JavaScript programming language via JEST, which performs N+1-version differential testing for modern JavaScript engines and ECMAScript, the language specification describing the syntax and semantics of JavaScript in a natural language. We evaluated JEST with four JavaScript engines that support all modern JavaScript language features and the latest version of ECMAScript (ES11, 2020). JEST automatically synthesized 1,700 programs that covered 97.78% of syntax and 87.70% of semantics from ES11. Using the assertion-injected JavaScript programs, it detected 44 engine bugs in four different engines and 27 specification bugs in ES11.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115651035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
IoT Bugs and Development Challenges
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00051
Amir Makhshari, A. Mesbah
IoT systems are being rapidly adopted across domains, from embedded systems to smart homes. Despite their growing adoption and popularity, there has been no thorough study of IoT development challenges from the practitioners' point of view. We provide the first systematic study of the bugs and challenges that IoT developers face in practice, through a large-scale empirical investigation. We collected 5,565 bug reports from 91 representative IoT project repositories and categorized a random sample of 323 based on the observed failures, root causes, and locations of the faulty components. In addition, we conducted nine interviews with IoT experts to uncover more details about IoT bugs and to gain insight into IoT developers' challenges. Lastly, we surveyed 194 IoT developers to validate our findings and gain further insights. Based on our results, we propose the first bug taxonomy for IoT systems. We highlight frequent bug categories and their root causes, correlations between them, and common pitfalls and challenges that IoT developers face. We recommend future directions for IoT areas that require research and development attention.
{"title":"IoT Bugs and Development Challenges","authors":"Amir Makhshari, A. Mesbah","doi":"10.1109/ICSE43902.2021.00051","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00051","url":null,"abstract":"IoT systems are rapidly adopted in various domains, from embedded systems to smart homes. Despite their growing adoption and popularity, there has been no thorough study to understand IoT development challenges from the practitioners' point of view. We provide the first systematic study of bugs and challenges that IoT developers face in practice, through a large-scale empirical investigation. We collected 5,565 bug reports from 91 representative IoT project repositories and categorized a random sample of 323 based on the observed failures, root causes, and the locations of the faulty components. In addition, we conducted nine interviews with IoT experts to uncover more details about IoT bugs and to gain insight into IoT developers' challenges. Lastly, we surveyed 194 IoT developers to validate our findings and gain further insights. We propose the first bug taxonomy for IoT systems based on our results. We highlight frequent bug categories and their root causes, correlations between them, and common pitfalls and challenges that IoT developers face. We recommend future directions for IoT areas that require research and development attention.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114272929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
"Ignorance and Prejudice" in Software Fairness
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00129
J Zhang, M. Harman
Machine learning software can be unfair when making human-related decisions, exhibiting prejudice against certain groups of people. Existing work primarily focuses on proposing fairness metrics and presenting fairness-improvement approaches. It remains unclear how key aspects of any machine learning system, such as the feature set and the training data, affect fairness. This paper presents results from a comprehensive study that addresses this question. We find that enlarging the feature set plays a significant role in fairness (with an average effect rate of 38%). Importantly, and contrary to the widely-held belief that greater fairness often comes at the cost of lower accuracy, our findings reveal that an enlarged feature set yields both higher accuracy and higher fairness. Perhaps also surprisingly, we find that larger training data does not help to improve fairness. Our results suggest that a larger training data set exhibits more unfairness than a smaller one when the feature set is insufficient, an important cautionary finding for practising software engineers.
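As an illustration of the kind of experiment this entails, here is a hedged sketch contrasting a narrow and an enlarged feature set on accuracy and a simple group-fairness metric (statistical parity difference); the data frame, column names, and metric choice are assumptions, not the paper's exact setup:

```python
# Sketch: measure how feature-set size affects accuracy and group fairness.
# Assumes `df` is a tabular data set with a binary label and a binary
# protected attribute (column names below are illustrative).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def statistical_parity_difference(y_pred, protected) -> float:
    """|P(y=1 | protected=0) - P(y=1 | protected=1)|; 0 is perfectly fair."""
    p0 = y_pred[protected == 0].mean()
    p1 = y_pred[protected == 1].mean()
    return abs(p0 - p1)

def evaluate(df: pd.DataFrame, features: list[str], label="label",
             protected="sex") -> tuple[float, float]:
    X_tr, X_te, y_tr, y_te = train_test_split(
        df[features], df[label], test_size=0.3, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    pred = pd.Series(model.predict(X_te), index=X_te.index)
    acc = (pred == y_te).mean()
    spd = statistical_parity_difference(pred, df.loc[X_te.index, protected])
    return acc, spd

# Compare a narrow feature set against an enlarged one, e.g.:
# evaluate(df, ["age", "hours"]) vs evaluate(df, ["age", "hours", "edu", "job"])
```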
{"title":"\"Ignorance and Prejudice\" in Software Fairness","authors":"J Zhang, M. Harman","doi":"10.1109/ICSE43902.2021.00129","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00129","url":null,"abstract":"Machine learning software can be unfair when making human-related decisions, having prejudices over certain groups of people. Existing work primarily focuses on proposing fairness metrics and presenting fairness improvement approaches. It remains unclear how key aspect of any machine learning system, such as feature set and training data, affect fairness. This paper presents results from a comprehensive study that addresses this problem. We find that enlarging the feature set plays a significant role in fairness (with an average effect rate of 38%). Importantly, and contrary to widely-held beliefs that greater fairness often corresponds to lower accuracy, our findings reveal that an enlarged feature set has both higher accuracy and fairness. Perhaps also surprisingly, we find that a larger training data does not help to improve fairness. Our results suggest a larger training data set has more unfairness than a smaller one when feature sets are insufficient; an important cautionary finding for practising software engineers.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"141 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128983255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
If It’s Not Secure, It Should Not Compile: Preventing DOM-Based XSS in Large-Scale Web Development with API Hardening
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00123
Pei Wang, Julian Bangert, Christoph Kern
Despite the extensive effort spent on its mitigation, cross-site scripting (XSS) remains one of the most prevalent security threats on the internet. Decades of exploitation and remediation have demonstrated that code inspection and testing alone do not eliminate XSS vulnerabilities in complex web applications with a high degree of confidence. This paper introduces Google's secure-by-design engineering paradigm that effectively prevents DOM-based XSS vulnerabilities in large-scale web development. Our approach, named API hardening, enforces a series of company-wide secure coding practices. We provide a set of secure APIs to replace native DOM APIs that are prone to XSS vulnerabilities. Through a combination of type contracts and appropriate validation and escaping, the secure APIs ensure that applications built on them are free of XSS vulnerabilities. We deploy a simple yet capable compile-time checker to guarantee that developers exclusively use our hardened APIs to interact with the DOM. We have made various efforts to scale this approach to tens of thousands of engineers without a significant productivity impact. By offering rigorous tooling and consultant support, we help developers adopt the secure coding practices as seamlessly as possible. We present empirical results showing how API hardening has helped reduce the occurrences of XSS vulnerabilities in Google's enormous code base over the course of a two-year deployment.
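The type-contract idea can be sketched in a few lines; the following is a Python illustration of the pattern, not Google's actual (Java/TypeScript) hardened APIs:

```python
# Sketch of the type contract behind API hardening: untrusted strings can
# only reach a sink after escaping, and the sink accepts only the SafeHtml
# wrapper type, which a compile-time checker can enforce exclusively.
import html

_PRIVATE = object()  # capability token: only builders in this module hold it

class SafeHtml:
    """Value type whose instances are XSS-safe by construction."""
    def __init__(self, value: str, _token=None):
        if _token is not _PRIVATE:      # block direct construction
            raise TypeError("use a builder such as escape()")
        self._value = value

    def __str__(self) -> str:
        return self._value

def escape(untrusted: str) -> SafeHtml:
    """Builder: HTML-escapes arbitrary input, so the result is safe."""
    return SafeHtml(html.escape(untrusted), _token=_PRIVATE)

def set_inner_html(element_id: str, content: SafeHtml) -> str:
    """Hardened sink: the contract admits SafeHtml only, never raw strings."""
    if not isinstance(content, SafeHtml):
        raise TypeError("sink requires SafeHtml; escape() the input first")
    return f'<div id="{element_id}">{content}</div>'

# set_inner_html("greeting", escape('<img src=x onerror=alert(1)>')) is safe;
# passing the raw string instead is rejected before it can reach the DOM.
```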
{"title":"If It’s Not Secure, It Should Not Compile: Preventing DOM-Based XSS in Large-Scale Web Development with API Hardening","authors":"Pei Wang, Julian Bangert, Christoph Kern","doi":"10.1109/ICSE43902.2021.00123","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00123","url":null,"abstract":"With tons of efforts spent on its mitigation, Cross-site scripting (XSS) remains one of the most prevalent security threats on the internet. Decades of exploitation and remediation demonstrated that code inspection and testing alone does not eliminate XSS vulnerabilities in complex web applications with a high degree of confidence. This paper introduces Google's secure-by-design engineering paradigm that effectively prevents DOM-based XSS vulnerabilities in large-scale web development. Our approach, named API hardening, enforces a series of company-wide secure coding practices. We provide a set of secure APIs to replace native DOM APIs that are prone to XSS vulnerabilities. Through a combination of type contracts and appropriate validation and escaping, the secure APIs ensure that applications based thereon are free of XSS vulnerabilities. We deploy a simple yet capable compile-time checker to guarantee that developers exclusively use our hardened APIs to interact with the DOM. We make various of efforts to scale this approach to tens of thousands of engineers without significant productivity impact. By offering rigorous tooling and consultant support, we help developers adopt the secure coding practices as seamlessly as possible. We present empirical results showing how API hardening has helped reduce the occurrences of XSS vulnerabilities in Google's enormous code base over the course of two-year deployment.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129288579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Too Quiet in the Library: An Empirical Study of Security Updates in Android Apps' Native Code
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00122
Sumaya Almanee, Arda Ünal, Mathias Payer, Joshua Garcia
Android apps include third-party native libraries to increase performance and to reuse functionality. Native code is directly executed from apps through the Java Native Interface or the Android Native Development Kit. Android developers add precompiled native libraries to their projects, enabling their use. Unfortunately, developers often struggle, or simply neglect, to update these libraries in a timely manner. This results in the continued use of outdated native libraries with unpatched security vulnerabilities years after patches became available. To further understand this phenomenon, we study the security updates in the native libraries of the 200 most popular free apps on Google Play from Sept. 2013 to May 2020. A core difficulty we face in this study is the identification of libraries and their versions: developers often rename or modify libraries, making their identification challenging. We create an approach called LibRARIAN (LibRAry veRsion IdentificAtioN) that accurately identifies native libraries and their versions as found in Android apps, based on our novel similarity metric bin2sim. LibRARIAN leverages features extracted from libraries' metadata and identifying strings in their read-only sections. We discovered that 53 of the 200 popular apps (26.5%) shipped vulnerable library versions with known CVEs between Sept. 2013 and May 2020, with 14 of those apps remaining vulnerable. We find that app developers took, on average, 528.71±40.20 days to apply security patches, while library developers release a security patch after 54.59±8.12 days, a roughly ten times slower rate of update.
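A minimal sketch of the matching idea: represent each library as a set of extracted features and pick the known version with the highest set similarity. The real bin2sim metric uses richer binary features; the feature sets and version corpus here are assumptions:

```python
# Sketch of version identification via feature-set similarity.

def similarity(features_a: set[str], features_b: set[str]) -> float:
    """Jaccard-style similarity between two libraries' feature sets."""
    if not features_a and not features_b:
        return 0.0
    return len(features_a & features_b) / len(features_a | features_b)

def identify_version(unknown: set[str],
                     known: dict[str, set[str]]) -> tuple[str, float]:
    """Match an app's bundled .so against a corpus of known library versions."""
    best = max(known, key=lambda v: similarity(unknown, known[v]))
    return best, similarity(unknown, known[best])

# known = {"libpng-1.6.34": {...exported symbols, version strings...},
#          "libpng-1.6.37": {...}}
# identify_version(extracted_features, known) -> e.g. ("libpng-1.6.37", 0.93)
```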
Verifying Determinism in Sequential Programs
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00017
Rashmi Mudduluru, Jason Waataja, Suzanne Millstein, Michael D. Ernst
When a program is nondeterministic, it is difficult to test and debug. Nondeterminism occurs even in sequential programs: e.g., by iterating over the elements of a hash table. We have created a type system that expresses determinism specifications in a program. The key ideas in the type system are type qualifiers for nondeterminism, order-nondeterminism, and determinism; type well-formedness rules to restrict collection types; and enhancements to polymorphism that improve precision when analyzing collection operations. While state-of-the-art nondeterminism detection tools rely on observing output from specific runs, our approach soundly verifies determinism at compile time. We implemented our type system for Java. Our type checker, the Determinism Checker, warns if a program is nondeterministic or verifies that the program is deterministic. In case studies comprising 90,097 lines of code, the Determinism Checker found 87 previously-unknown nondeterminism errors, even in programs that had been heavily vetted by developers who were greatly concerned about nondeterminism errors. In experiments, the Determinism Checker found all of the non-concurrency-related nondeterminism that was found by state-of-the-art dynamic approaches for detecting flaky tests.
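The hash-iteration example translates directly into code; the sketch below illustrates, in Python, the order-nondeterminism the Java type qualifiers are designed to catch:

```python
# Order-nondeterminism the checker targets, illustrated in Python (the
# paper's type system is for Java, with qualifiers such as @Det,
# @OrderNonDet, and @NonDet).

def summarize_nondet(words: set[str]) -> str:
    # Iterating a hash-based set: order varies across runs because string
    # hashing is randomized, so this value would be typed @OrderNonDet and
    # flagged if it flows into program output.
    return ", ".join(words)

def summarize_det(words: set[str]) -> str:
    # Sorting removes the order-dependence; the result is @Det again.
    return ", ".join(sorted(words))

# Run the script twice: summarize_nondet({"b", "a", "c"}) can differ between
# runs, while summarize_det always yields "a, b, c".
```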
{"title":"Verifying Determinism in Sequential Programs","authors":"Rashmi Mudduluru, Jason Waataja, Suzanne Millstein, Michael D. Ernst","doi":"10.1109/ICSE43902.2021.00017","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00017","url":null,"abstract":"When a program is nondeterministic, it is difficult to test and debug. Nondeterminism occurs even in sequential programs: e.g., by iterating over the elements of a hash table. We have created a type system that expresses determinism specifications in a program. The key ideas in the type system are type qualifiers for nondeterminism, order-nondeterminism, and determinism; type well-formedness rules to restrict collection types; and enhancements to polymorphism that improve precision when analyzing collection operations. While state of-the-art nondeterminism detection tools rely on observing output from specific runs, our approach soundly verifies determinism at compile time. We implemented our type system for Java. Our type checker, the Determinism Checker, warns if a program is nondeterministic or verifies that the program is deterministic. In case studies of 90097 lines of code, the Determinism Checker found 87 previously-unknown nondeterminism errors, even in programs that had been heavily vetted by developers who were greatly concerned about nondeterminism errors. In experiments, the Determinism Checker found all of the non-concurrency-related nondeterminism that was found by state-of-the-art dynamic approaches for detecting flaky tests.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113975956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CodeShovel: Constructing Method-Level Source Code Histories
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00135
F. Grund, S. Chowdhury, N. Bradley, Braxton Hall, Reid Holmes
Source code histories are commonly used by developers and researchers to reason about how software evolves. Through a survey with 42 professional software developers, we learned that developers face significant mismatches between the output provided by their existing tools for examining source code histories and what they need to successfully complete their historical analysis tasks. To address these shortcomings, we propose CodeShovel, a tool for uncovering method histories that quickly produces complete and accurate change histories for 90% of methods (including 97% of all method changes), outperforming leading tools from both research (e.g., FinerGit) and practice (e.g., IntelliJ / git log). CodeShovel helps developers navigate the entire history of source code methods so they can better understand how a method evolved. A field study with 16 industrial developers on industrial code bases confirmed our empirical findings about CodeShovel's correctness and low runtime overhead, and additionally showed that the approach can be useful for a wide range of industrial development tasks.
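A hedged sketch of the underlying tracing idea: re-locate a method in each older revision by body similarity so the history survives renames; the commit representation and threshold are assumptions, not CodeShovel's actual algorithm:

```python
# Sketch of method-history tracing (hypothetical commit representation; the
# real tool works on Git repositories and handles renames, moves, and
# signature changes).
import difflib

def same_method(body_a: str, body_b: str, threshold: float = 0.75) -> bool:
    """Treat two bodies as the same method if they are textually similar,
    which lets the trace survive a rename that a plain `git log -L` loses."""
    ratio = difflib.SequenceMatcher(None, body_a, body_b).ratio()
    return ratio >= threshold

def trace_history(commits, method_body: str) -> list:
    """Walk commits newest-to-oldest, re-locating the method at each step.

    `commits` is assumed to yield (commit_id, {method_name: body}) pairs,
    newest first. Returns the commits in which the method's body changed.
    """
    history, current = [], method_body
    for commit_id, methods in commits:
        match = next((b for b in methods.values() if same_method(b, current)),
                     None)
        if match is None:
            break                      # method did not exist at this commit
        if match != current:
            history.append(commit_id)  # body differs: a change along the way
            current = match
    return history
```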
{"title":"CodeShovel: Constructing Method-Level Source Code Histories","authors":"F. Grund, S. Chowdhury, N. Bradley, Braxton Hall, Reid Holmes","doi":"10.1109/ICSE43902.2021.00135","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00135","url":null,"abstract":"Source code histories are commonly used by developers and researchers to reason about how software evolves. Through a survey with 42 professional software developers, we learned that developers face significant mismatches between the output provided by developers' existing tools for examining source code histories and what they need to successfully complete their historical analysis tasks. To address these shortcomings, we propose CodeShovel, a tool for uncovering method histories that quickly produces complete and accurate change histories for 90% methods (including 97% of all method changes) outperforming leading tools from both research (e.g, FinerGit) and practice (e.g., IntelliJ / git log). CodeShovel helps developers to navigate the entire history of source code methods so they can better understand how the method evolved. A field study on industrial code bases with 16 industrial developers confirmed our empirical findings of CodeShovel's correctness, low runtime overheads, and additionally showed that the approach can be useful for a wide range of industrial development tasks.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115843244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing Genetic Improvement of Software with Regression Test Selection
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00120
Giovani Guizzo, J. Petke, Federica Sarro, M. Harman
Genetic improvement uses artificial intelligence to automatically improve software with respect to non-functional properties (AI for SE). In this paper, we propose the use of existing software engineering best practice to enhance Genetic Improvement (SE for AI). We conjecture that existing Regression Test Selection (RTS) techniques, which have been proven to be efficient and effective, can and should be used as a core component of the GI search process to maximise its effectiveness. To assess this idea, we carried out a thorough empirical study of both dynamic and static RTS techniques used with GI to improve seven real-world software programs. The results of our empirical evaluation show that incorporating RTS into GI significantly speeds up the whole GI process, making it up to 78% faster on our benchmark set while still producing valid software improvements. Our findings are significant in that they can save hours to days of computational time, and can facilitate the uptake of GI in industrial settings by significantly reducing the time a developer waits for feedback from such an automated technique. Therefore, we recommend the use of RTS in future test-based automated software improvement work. Finally, we hope this successful application of SE for AI will encourage other researchers to investigate further applications in this area.
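A minimal sketch of how RTS can slot into the GI fitness loop, assuming a precomputed coverage map from tests to methods; the names and the fitness signal are illustrative, not the paper's implementation:

```python
# Sketch of RTS inside the GI loop: re-run only the tests whose coverage
# reaches the code the current patch mutated.

def select_tests(patched_methods: set[str],
                 coverage: dict[str, set[str]]) -> list[str]:
    """`coverage` maps each test name to the set of methods it executes."""
    return [t for t, covered in coverage.items()
            if covered & patched_methods]

def measure_runtime(patch) -> float:
    """Placeholder fitness signal, e.g. wall-clock time of a benchmark."""
    return 0.0

def fitness(patch, run_test, patched_methods, coverage) -> float:
    """Evaluate a GI variant on the selected subset instead of the full suite.

    A failing subset already invalidates the patch, so the full-suite run
    can be deferred to final validation of the best variant.
    """
    subset = select_tests(patched_methods, coverage)
    if not all(run_test(t) for t in subset):
        return float("-inf")           # invalid variant, discard
    return -measure_runtime(patch)     # e.g., optimise for speed
```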
Efficient Compiler Autotuning via Bayesian Optimization
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00110
Junjie Chen, Ningxin Xu, Peiqi Chen, Hongyu Zhang
A typical compiler such as GCC supports hundreds of optimizations controlled by compilation flags for improving the runtime performance of the compiled program. Due to the large number of compilation flags and the exponential number of flag combinations, it is impossible for compiler users to manually tune these optimization flags to achieve the required runtime performance of compiled programs. Over the years, many compiler autotuning approaches have been proposed to automatically tune optimization flags, but they still suffer from efficiency problems due to the huge search space. In this paper, we propose the first Bayesian optimization based approach, called BOCA, for efficient compiler autotuning. In BOCA, we leverage a tree-based model to approximate the objective function, making Bayesian optimization scalable to a large number of optimization flags. Moreover, we design a novel search strategy that improves the efficiency of Bayesian optimization by incorporating the impact of each optimization flag, as measured by the tree-based model, and a decay function to strike a balance between exploitation and exploration. We conduct extensive experiments to investigate the effectiveness of BOCA on the two most popular C compilers (i.e., GCC and LLVM) and two widely-used C benchmarks (i.e., cBench and PolyBench). The results show that BOCA significantly outperforms state-of-the-art compiler autotuning approaches and Bayesian optimization methods in terms of the time spent achieving specified speedups, demonstrating the effectiveness of BOCA.
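A hedged sketch of tree-surrogate Bayesian optimisation over binary flag vectors, using a random forest's per-tree spread as the uncertainty estimate and a simple UCB acquisition; BOCA's impact-weighted search strategy and decay function are more involved than this, and the objective here is a stand-in:

```python
# Sketch: Bayesian optimisation with a tree-based surrogate over flag vectors.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
N_FLAGS = 20                                # e.g., a subset of GCC -f flags

def measure_speedup(flags: np.ndarray) -> float:
    """Placeholder: would compile and benchmark with the given flag vector."""
    return float(flags.sum())               # stand-in objective

X = rng.integers(0, 2, size=(8, N_FLAGS))   # random initial designs
y = np.array([measure_speedup(x) for x in X])

for _ in range(30):
    # Tree-based surrogate: scales to many flags where a GP would struggle.
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    cand = rng.integers(0, 2, size=(512, N_FLAGS))      # candidate configs
    per_tree = np.stack([t.predict(cand.astype(float))
                         for t in model.estimators_])
    mu, sigma = per_tree.mean(axis=0), per_tree.std(axis=0)
    best = cand[np.argmax(mu + 1.0 * sigma)]            # simple UCB
    X = np.vstack([X, best])
    y = np.append(y, measure_speedup(best))

print("best flags:", X[y.argmax()], "speedup:", y.max())
```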
{"title":"Efficient Compiler Autotuning via Bayesian Optimization","authors":"Junjie Chen, Ningxin Xu, Peiqi Chen, Hongyu Zhang","doi":"10.1109/ICSE43902.2021.00110","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00110","url":null,"abstract":"A typical compiler such as GCC supports hundreds of optimizations controlled by compilation flags for improving the runtime performance of the compiled program. Due to the large number of compilation flags and the exponential number of flag combinations, it is impossible for compiler users to manually tune these optimization flags in order to achieve the required runtime performance of the compiled programs. Over the years, many compiler autotuning approaches have been proposed to automatically tune optimization flags, but they still suffer from the efficiency problem due to the huge search space. In this paper, we propose the first Bayesian optimization based approach, called BOCA, for efficient compiler autotuning. In BOCA, we leverage a tree-based model for approximating the objective function in order to make Bayesian optimization scalable to a large number of optimization flags. Moreover, we design a novel searching strategy to improve the efficiency of Bayesian optimization by incorporating the impact of each optimization flag measured by the tree-based model and a decay function to strike a balance between exploitation and exploration. We conduct extensive experiments to investigate the effectiveness of BOCA on two most popular C compilers (i.e., GCC and LLVM) and two widely-used C benchmarks (i.e., cBench and PolyBench). The results show that BOCA significantly outperforms the state-of-the-art compiler autotuning approaches and Bayesion optimization methods in terms of the time spent on achieving specified speedups, demonstrating the effectiveness of BOCA.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121501983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantic Web Accessibility Testing via Hierarchical Visual Analysis
Pub Date: 2021-05-01 | DOI: 10.1109/ICSE43902.2021.00143
Mohammad Bajammal, A. Mesbah
Web accessibility, the design of web apps to be usable by users with disabilities, impacts millions of people around the globe. Although accessibility has traditionally been a marginal afterthought that is often ignored in many software products, it is increasingly becoming a legal requirement that must be satisfied. While some web accessibility testing tools exist, most only perform rudimentary syntactical checks that do not assess the more important high-level semantic aspects that users with disabilities rely on. Accordingly, assessing web accessibility has largely remained a laborious manual process requiring human input. In this paper, we propose an approach, called AXERAY, that infers semantic groupings of various regions of a web page and their semantic roles. We evaluate our approach on 30 real-world websites and assess the accuracy of semantic inference as well as the ability to detect accessibility failures. The results show that AXERAY achieves, on average, an F-measure of 87% for inferring semantic groupings, and is able to detect accessibility failures with 85% accuracy.
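A hedged sketch of the final checking step: compare the role inferred for a visually grouped region against the role the markup actually exposes. The inference itself (AXERAY's hierarchical visual analysis) is stubbed out, and the element representation is an assumption:

```python
# Sketch of the checking step: report an accessibility failure when the
# inferred semantic role is missing from, or contradicts, what assistive
# technology sees.

def declared_role(element: dict) -> str | None:
    """Role the page exposes to assistive technology (ARIA or implicit)."""
    implicit = {"nav": "navigation", "ul": "list", "table": "table"}
    return element.get("aria_role") or implicit.get(element.get("tag", ""))

def check_region(element: dict, inferred_role: str) -> str | None:
    """`inferred_role` would come from visual analysis of the region."""
    actual = declared_role(element)
    if actual is None:
        return f"missing role: region looks like '{inferred_role}'"
    if actual != inferred_role:
        return f"mismatch: looks like '{inferred_role}' but exposes '{actual}'"
    return None

# check_region({"tag": "div"}, "navigation")
#   -> "missing role: region looks like 'navigation'"
```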
{"title":"Semantic Web Accessibility Testing via Hierarchical Visual Analysis","authors":"Mohammad Bajammal, A. Mesbah","doi":"10.1109/ICSE43902.2021.00143","DOIUrl":"https://doi.org/10.1109/ICSE43902.2021.00143","url":null,"abstract":"Web accessibility, the design of web apps to be usable by users with disabilities, impacts millions of people around the globe. Although accessibility has traditionally been a marginal afterthought that is often ignored in many software products, it is increasingly becoming a legal requirement that must be satisfied. While some web accessibility testing tools exist, most only perform rudimentary syntactical checks that do not assess the more important high-level semantic aspects that users with disabilities rely on. Accordingly, assessing web accessibility has largely remained a laborious manual process requiring human input. In this paper, we propose an approach, called AXERAY, that infers semantic groupings of various regions of a web page and their semantic roles. We evaluate our approach on 30 real-world websites and assess the accuracy of semantic inference as well as the ability to detect accessibility failures. The results show that AXERAY achieves, on average, an F-measure of 87% for inferring semantic groupings, and is able to detect accessibility failures with 85% accuracy.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127804992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}