
Latest publications from the 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)

Proving Termination by k-Induction
Jianhui Chen, Fei He
We propose a novel approach to proving the termination of imperative programs by k-induction. With our approach, the termination proving problem can be formalized as a k-inductive invariant synthesis task. On the one hand, k-induction uses weaker invariants than those required by the standard inductive approach. On the other hand, the base case of k-induction, which unrolls the program, can provide a stronger pre-condition for invariant synthesis. As a result, the termination arguments of our approach can be synthesized more efficiently than with the standard method. We implement a prototype of our k-inductive approach. The experimental results demonstrate the effectiveness and efficiency of our approach.
DOI: https://doi.org/10.1145/3324884.3418929 (published 2020-09-01)
Citations: 2
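The k-induction scheme the abstract describes can be illustrated on a toy finite transition system. The sketch below is not the paper's tool (which targets imperative programs and synthesizes termination arguments); the states, transitions, and property are illustrative assumptions chosen so that ordinary (1-)induction fails while 2-induction succeeds, because the base case rules out the spurious unreachable predecessor.

```python
# Toy k-induction check over an explicit finite transition system.
STATES = [0, 1, 2, 3]
INIT = [0]
T = {0: [1], 1: [0], 2: [2], 3: [2]}   # edge 3 -> 2 is unreachable "noise"

def P(s):
    return s != 2                       # the invariant to prove

def paths(k):
    """All state sequences with k transitions, starting anywhere."""
    result = [(s,) for s in STATES]
    for _ in range(k):
        result = [p + (t,) for p in result for t in T.get(p[-1], [])]
    return result

def k_induction(k):
    # Base case: P holds on every state reachable within k - 1 steps.
    frontier, seen = set(INIT), set()
    for _ in range(k):
        if not all(P(s) for s in frontier):
            return False
        seen |= frontier
        frontier = {t for s in frontier for t in T.get(s, [])} - seen
    # Inductive step: k consecutive P-states force P on the next state.
    return all(P(p[-1]) for p in paths(k) if all(P(s) for s in p[:-1]))
```

Here `k_induction(1)` fails: the unreachable edge 3 -> 2 breaks the step case. `k_induction(2)` succeeds, since no P-state can reach state 3, so no two consecutive P-states ever precede the bad state; this is the sense in which k-induction tolerates weaker invariants than plain induction.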
Summary-Based Symbolic Evaluation for Smart Contracts
Yu Feng, E. Torlak, R. Bodík
This paper presents Solar, a system for the automatic synthesis of adversarial contracts that exploit vulnerabilities in a victim smart contract. To make the synthesis tractable, we introduce a query language as well as summary-based symbolic evaluation, which significantly reduces the number of instructions that our synthesizer needs to evaluate symbolically, without compromising the precision of the vulnerability query. We encoded common vulnerabilities of smart contracts and evaluated Solar on the entire data set from Etherscan. Our experiments demonstrate the benefits of summary-based symbolic evaluation and show that Solar outperforms state-of-the-art smart contract analyzers TEETHER, Mythril, and Contract Fuzzer in terms of running time and precision.
DOI: https://doi.org/10.1145/3324884.3416646 (published 2020-09-01)
Citations: 13
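The core idea of summary-based symbolic evaluation is to replace instruction-by-instruction evaluation of a callee with a precomputed input-output summary. Below is a minimal sketch of that idea; the toy IR, the `times8` helper, and the summary table are hypothetical illustrations, not Solar's actual query language or EVM encoding. Symbolic values are modeled as plain strings.

```python
# Symbolic values are strings; 'add' builds a symbolic sum expression.
FUNCS = {
    "times8": [("add", "r", "arg", "arg"),    # r   = arg + arg
               ("add", "r", "r", "r"),        # r   = 4 * arg (symbolically)
               ("add", "ret", "r", "r")],     # ret = 8 * arg
}
SUMMARIES = {"times8": lambda x: f"(8*{x})"}  # precomputed input-output relation

def exec_add(ins, env):
    _, dst, a, b = ins
    env[dst] = f"({env[a]}+{env[b]})"

def run(program, use_summaries):
    env, steps = {"x": "x"}, 0
    for ins in program:
        if ins[0] == "call":
            _, fn, dst, src = ins
            if use_summaries:
                env[dst] = SUMMARIES[fn](env[src])   # one step per call
                steps += 1
            else:
                local = {"arg": env[src]}
                for body_ins in FUNCS[fn]:           # evaluate the whole body
                    exec_add(body_ins, local)
                    steps += 1
                env[dst] = local["ret"]
        else:
            exec_add(ins, env)
            steps += 1
    return env, steps

PROG = [("call", "times8", "y", "x"), ("add", "z", "y", "x")]
env_full, steps_full = run(PROG, use_summaries=False)
env_sum, steps_sum = run(PROG, use_summaries=True)
```

With the summary, the call costs one evaluation step instead of three, and the symbolic value stays compact; on real contracts such savings compound across thousands of instructions, which is the reduction the abstract reports.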
Detecting and Explaining Self-Admitted Technical Debts with Attention-based Neural Networks
Xin Wang
Self-Admitted Technical Debt (SATD) is a sub-type of technical debt, introduced to represent technical debts that developers intentionally take on during software development. While SATDs can yield short-term benefits, they often must be paid back later at a higher cost, e.g., by introducing bugs or increasing the complexity of the software. To cope with these issues, our community has proposed various machine learning-based approaches to detect SATDs. These approaches, however, are either not generic, typically requiring manual feature engineering, or do not provide effective means to explain the predicted outcomes. To that end, we propose to the community a novel approach, HATD (Hybrid Attention-based method for self-admitted Technical Debt detection), which detects and explains SATDs using attention-based neural networks. Through extensive experiments on 445,365 comments in 20 projects, we show that HATD is effective in detecting SATDs on both in-the-lab and in-the-wild datasets under both within-project and cross-project settings. HATD also outperforms the state-of-the-art approaches in detecting and explaining SATDs.
DOI: https://doi.org/10.1145/3324884.3416583 (published 2020-09-01)
Citations: 22
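The way attention weights can double as an explanation, pointing at the comment tokens that drove a prediction, can be sketched with plain dot-product attention over hand-crafted token features. The keyword list, two-dimensional "embedding", and query vector below are hypothetical stand-ins for HATD's learned components.

```python
import math

DEBT_HINTS = {"todo", "fixme", "hack", "workaround", "temporary"}

def embed(token):
    """Tiny hand-crafted stand-in for a learned token embedding."""
    t = token.lower().strip(":,.!")
    return (1.0 if t in DEBT_HINTS else 0.0, len(t) / 10.0)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(tokens, query=(2.0, 0.0)):
    """Rank tokens by attention weight; top tokens 'explain' a prediction."""
    scores = [sum(q * f for q, f in zip(query, embed(t))) for t in tokens]
    return sorted(zip(tokens, softmax(scores)), key=lambda p: -p[1])

ranked = attend("TODO: temporary hack until the parser is rewritten".split())
```

In this toy run the debt-marker tokens receive the highest attention weights, so the same scores that feed the classifier also serve as the human-readable explanation.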
Exploring the Architectural Impact of Possible Dependencies in Python Software
Wuxia Jin, Yuanfang Cai, R. Kazman, Gang Zhang, Q. Zheng, Ting Liu
Dependencies among software entities are the basis for much software analytics research and many architecture analysis tools. Dynamically typed languages, such as Python, JavaScript, and Ruby, tolerate the lack of explicit type references, making certain syntactic dependencies indiscernible in source code. We call these possible dependencies, in contrast with the explicit dependencies that are directly referenced in source code. Type inference techniques have been widely studied and applied, but existing architecture analysis research and tools have not taken possible dependencies into consideration. The fundamental question is: to what extent will these missing possible dependencies impact architecture analysis? To answer this question, we conducted an empirical study with 105 Python projects, using type inference techniques to manifest possible dependencies. Our study revealed that the architectural impact of possible dependencies is substantial, higher than that of explicit dependencies: (1) file-level possible dependencies account for at least 27.93% of all file-level dependencies, and create dependency structures different from those of explicit dependencies only, with an average difference of 30.71%; (2) adding possible dependencies significantly improves the precision (0.52%~14.18%), recall (31.73%~39.12%), and F1 scores (22.13%~32.09%) of capturing co-change relations; (3) on average, a file involved in possible dependencies influences 28% more files and 42% more dependencies within architectural sub-spaces than a file involved in just explicit dependencies; (4) on average, a file involved in possible dependencies consumes 32% more maintenance effort. Consequently, maintainability scores reported by existing tools make a system written in these dynamic languages appear better modularized than it actually is. This evidence strongly suggests that possible dependencies have a more significant impact than explicit dependencies on architecture quality, and that architecture analyses and tools should assess, and even emphasize, the architectural impact of possible dependencies introduced by dynamic typing.
DOI: https://doi.org/10.1145/3324884.3416619 (published 2020-09-01)
Citations: 7
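The distinction between explicit and possible dependencies can be sketched with Python's own `ast` module: explicit dependencies show up as import statements, while possible dependencies arise from duck-typed attribute accesses that may resolve to a definition in another file. The three in-memory "files" below are hypothetical, and a real analysis would add type inference to prune candidates, as the paper does.

```python
import ast

FILES = {
    "util.py": "def render(data):\n    return str(data)\n",
    "view.py": "import util\n\ndef show(d):\n    return util.render(d)\n",
    "main.py": "def run(obj):\n    return obj.render(42)\n",
}

def defs_in(src):
    """Names of functions defined in a source file."""
    return {n.name for n in ast.walk(ast.parse(src))
            if isinstance(n, ast.FunctionDef)}

def explicit_deps(src):
    """Dependencies visible as import statements."""
    deps = set()
    for n in ast.walk(ast.parse(src)):
        if isinstance(n, ast.Import):
            deps |= {a.name for a in n.names}
        elif isinstance(n, ast.ImportFrom) and n.module:
            deps.add(n.module)
    return deps

def possible_deps(src, others):
    """Duck-typed attribute accesses that *may* resolve to another file."""
    attrs = {n.attr for n in ast.walk(ast.parse(src))
             if isinstance(n, ast.Attribute)}
    return {fname for fname, other in others.items() if attrs & defs_in(other)}
```

Here `main.py` has no import at all, yet `obj.render(42)` may well dispatch into `util.py`; a tool that only counts imports would miss that edge entirely, which is exactly the blind spot the study quantifies.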
The Symptom, Cause and Repair of Workaround
Daohan Song, Hao Zhong, Li Jia
In software development, issue tracker systems are widely used to manage bug reports. In such a system, a bug report can be filed, diagnosed, assigned, and fixed. In the standard process, a bug can be resolved as fixed, invalid, duplicated, or won't fix. Although the above resolutions are well-defined and easy to understand, a bug report can also end with a lesser-known resolution: workaround. Compared with other resolutions, the definition of workarounds is more ambiguous. Beyond the problem reported in a bug report, the resolution of a workaround raises further questions. Some of these questions are important for users, especially programmers who build their projects upon others' code (e.g., libraries). Although some early studies have analyzed API workarounds, many research questions on workarounds are still open. For example, which bugs are resolved as workarounds? Why is a bug report resolved as a workaround? What are the repairs of workarounds? In this experience paper, we conduct the first empirical study to explore the above research questions. In particular, we analyzed 221 real workarounds collected from Apache projects. Our results lead to some interesting and useful answers to all the above questions. For example, we find that most bug reports are resolved as workarounds because their problems reside in libraries (24.43%), settings (18.55%), and clients (10.41%). Many of these bugs are difficult to fix fully and cleanly. As a late-breaking result, we can only briefly introduce our study, but we present a detailed plan to extend it to a full paper.
ACM Reference Format: Daohan Song, Hao Zhong, and Li Jia. 2020. The Symptom, Cause and Repair of Workaround. In 35th IEEE/ACM International Conference on Automated Software Engineering (ASE '20), September 21–25, 2020, Virtual Event, Australia. ACM, New York, NY, USA, 3 pages. https://doi.org/10.1145/3324884.3418910
DOI: https://doi.org/10.1145/3324884.3418910 (published 2020-09-01)
Citations: 4
A Predictive Analysis for Detecting Deadlock in MPI Programs
Yu Huang, B. Ogles, Eric Mercer
A common problem in MPI programs is deadlock: when two or more processes are blocked indefinitely due to a circular communication dependency. Automatically detecting deadlock is difficult due to its schedule-dependent nature. This paper presents a predictive analysis for single-path MPI programs that observes a single program execution and then determines whether any other feasible schedule of the program can lead to a deadlock. The analysis works by identifying problematic communication patterns in a dependency graph to form a set of deadlock candidates. The deadlock candidates are filtered by an abstract machine and ultimately tested for reachability by an SMT solver with an efficient encoding for deadlock. This approach quickly yields a set of high probability deadlock candidates useful for reasoning about complex codes and yields higher performance overall in many cases compared to other state-of-the-art analyses. The analysis is sound and complete for single-path MPI programs on a given input.
DOI: https://doi.org/10.1145/3324884.3416588 (published 2020-09-01)
Citations: 2
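At its core, the circular communication dependency the abstract refers to is a cycle in a wait-for graph: each blocked process points at the peer it expects a message from. The sketch below shows only that core check; the paper's actual analysis layers match-candidate enumeration, an abstract machine, and an SMT encoding on top of it.

```python
def blocked_cycle(wait_for):
    """wait_for maps a blocked process to the peer it expects a message from.
    Returns one circular wait as a list of processes, or None."""
    for start in wait_for:
        seen, p = [], start
        while p in wait_for and p not in seen:
            seen.append(p)
            p = wait_for[p]
        if p in seen:                       # walked back into the chain
            return seen[seen.index(p):]
    return None

# P0 waits on P1, P1 on P2, P2 on P0: a classic three-way deadlock.
deadlock = blocked_cycle({"P0": "P1", "P1": "P2", "P2": "P0"})

# P0 waits on P1, but P1 is not blocked: no circular wait.
ok = blocked_cycle({"P0": "P1"})
```

This check is cheap precisely because blocking point-to-point receives give each process at most one outgoing wait edge; the hard part the paper addresses is predicting which wait-for graphs are *feasible* under some other schedule of the observed execution.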
Problems and Opportunities in Training Deep Learning Software Systems: An Analysis of Variance
H. Pham, Shangshu Qian, Jiannan Wang, Thibaud Lutellier, Jonathan Rosenthal, Lin Tan, Yaoliang Yu, Nachiappan Nagappan
Deep learning (DL) training algorithms utilize nondeterminism to improve models' accuracy and training efficiency. Hence, multiple identical training runs (e.g., identical training data, algorithm, and network) produce different models with different accuracies and training times. In addition to these algorithmic factors, DL libraries (e.g., TensorFlow and cuDNN) introduce additional variance (referred to as implementation-level variance) due to parallelism, optimization, and floating-point computation. This work is the first to study the variance of DL systems and the awareness of this variance among researchers and practitioners. Our experiments on three datasets with six popular networks show large overall accuracy differences among identical training runs. Even after excluding weak models, the accuracy difference is 10.8%. In addition, implementation-level factors alone cause the accuracy difference across identical training runs to be up to 2.9%, the per-class accuracy difference to be up to 52.4%, and the training time difference to be up to 145.3%. All core libraries (TensorFlow, CNTK, and Theano) and low-level libraries (e.g., cuDNN) exhibit implementation-level variance across all evaluated versions. Our researcher and practitioner survey shows that 83.8% of the 901 participants are unaware of or unsure about any implementation-level variance. In addition, our literature survey shows that only 19.5±3% of papers in recent top software engineering (SE), artificial intelligence (AI), and systems conferences use multiple identical training runs to quantify the variance of their DL approaches. This paper raises awareness of DL variance and directs SE researchers to challenging tasks such as creating deterministic DL implementations to facilitate debugging and improving the reproducibility of DL software and results.
DOI: https://doi.org/10.1145/3324884.3416545 (published 2020-09-01)
Citations: 92
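The run-to-run variance the study measures can be reproduced with a deliberately tiny experiment: the data and algorithm are fixed, and only the RNG stream (weight initialization and example order) changes between "identical" runs. The perceptron and dataset below are illustrative assumptions, not the paper's setup.

```python
import random

def train_run(seed, epochs=5):
    """One 'identical' run: same data and algorithm, different RNG stream."""
    rng = random.Random(seed)
    data = [((i / 10, j / 10), 1 if i + j > 10 else 0)
            for i in range(10) for j in range(10)]
    # Nondeterministic weight initialization.
    w0, w1, b = rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(epochs):
        rng.shuffle(data)                  # nondeterministic example order
        for (x1, x2), y in data:
            pred = 1 if w0 * x1 + w1 * x2 + b > 0 else 0
            err = y - pred
            w0 += 0.01 * err * x1
            w1 += 0.01 * err * x2
            b += 0.01 * err
    return sum((1 if w0 * x1 + w1 * x2 + b > 0 else 0) == y
               for (x1, x2), y in data) / len(data)

accuracies = [train_run(seed) for seed in range(5)]
spread = max(accuracies) - min(accuracies)
```

With lucky seeds some runs may coincide; the accuracy spread across many such runs is exactly the quantity the paper reports (up to 10.8% even after excluding weak models), and the same protocol, repeated identical runs, is what it recommends for quantifying a DL approach's variance.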
On the Effectiveness of Unified Debugging: An Extensive Study on 16 Program Repair Systems
Samuel Benton, Xia Li, Yiling Lou, Lingming Zhang
Automated debugging techniques, including fault localization and program repair, have been studied for over a decade. However, the only existing connection between fault localization and program repair is that fault localization computes the potential buggy elements for program repair to patch. Recently, a pioneering work, ProFL, explored the idea of unified debugging to unify fault localization and program repair in the other direction for thefi rst time to boost both areas. More specifically, ProFL utilizes the patch execution results from one state-of-the-art repair system, PraPR, to help improve state-of-the-art fault localization. In this way, ProFL not only improves fault localization for manual repair, but also extends the application scope of automated repair to all possible bugs (not only the small ratio of bugs that can be automaticallyfi xed). However, ProFL only considers one APR system (i.e., PraPR), and it is not clear how other existing APR systems based on different designs contribute to unified debugging. In this work, we perform an extensive study of the unified-debugging approach on 16 state-of-the-art program repair systems for thefi rst time. Our experimental results on the widely studied Defects4J benchmark suite reveal various practical guidelines for unified debugging, such as (1) nearly all the studied 16 repair systems can positively contribute to unified debugging despite their varying repairing capabilities, (2) repair systems targeting multi-edit patches can bring extraneous noise into unified debugging, (3) repair systems with more executed/plausible patches tend to perform better for unified debugging, and (4) unified debugging effectiveness does not rely on the availability of correct patches in automated repair. Based on our results, we further propose an advanced unified debugging technique, UniDebug++, which can localize over 20% more bugs within Top-1 positions than state-of-the-art unified debugging technique, ProFL.
自动调试技术,包括故障定位和程序修复,已经研究了十多年。然而,故障定位和程序修复之间唯一存在的联系是故障定位计算程序修复要修补的潜在错误元素。最近,一项开创性的工作,ProFL,首次探索了统一调试的思想,将故障定位和程序修复在另一个方向上统一起来,从而促进了这两个领域的发展。更具体地说,ProFL利用来自最先进的修复系统PraPR的补丁执行结果来帮助改进最先进的故障定位。这样,ProFL不仅提高了人工修复的故障定位,而且将自动修复的应用范围扩展到所有可能的错误(而不仅仅是可以自动修复的一小部分错误)。但是,ProFL只考虑一个APR系统(即PraPR),并且不清楚基于不同设计的其他现有APR系统如何对统一调试做出贡献。在这项工作中,我们首次对16个最先进的程序维修系统进行了统一调试方法的广泛研究。我们在广泛研究的缺陷4j基准套件上的实验结果揭示了统一调试的各种实用指南,例如:(1)几乎所有研究的16个修复系统都可以对统一调试做出积极贡献,尽管它们的修复能力不同;(2)针对多编辑补丁的修复系统可能会给统一调试带来无关的噪音;(3)具有更多执行/可信补丁的修复系统往往在统一调试中表现更好。(4)统一调试的有效性不依赖于自动修复中正确补丁的可用性。基于我们的结果,我们进一步提出了一种先进的统一调试技术UniDebug++,它可以比最先进的统一调试技术ProFL多定位20%以上的Top-1位置的bug。
Samuel Benton, Xia Li, Yiling Lou, Lingming Zhang. "On the Effectiveness of Unified Debugging: An Extensive Study on 16 Program Repair Systems." 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE). DOI: 10.1145/3324884.3416566
Citations: 25
Towards Robust Production Machine Learning Systems: Managing Dataset Shift
Hala Abdelkader
The advances in machine learning (ML) have stimulated the integration of its capabilities into software systems. However, there is a tangible gap between software engineering and machine learning practices that is delaying the progress of intelligent service development. Software organisations are devoting effort to adjusting software engineering processes and practices to facilitate the integration of machine learning models. Machine learning researchers, for their part, are focusing on improving the interpretability of machine learning models to support overall system robustness. Our research focuses on bridging this gap through a methodology that evaluates the robustness of machine-learning-enabled software engineering systems. In particular, this methodology will automate the evaluation of the robustness properties of software systems against dataset shift problems in ML. It will also feature a notification mechanism that facilitates the debugging of ML components.
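As a loose illustration of the kind of check such a methodology might automate, the sketch below flags a dataset shift when a production feature sample's mean drifts improbably far from the training distribution. The z-score test, threshold, and data are assumptions for illustration, not the paper's method:

```python
# Hypothetical covariate-shift check: compare a live sample's mean
# against training statistics and signal when the drift is improbable.
import math
import statistics

def detect_shift(train_values, live_values, z_threshold=3.0):
    """Return True if the live sample's mean is improbably far from
    the training mean (a crude dataset-shift signal)."""
    mu = statistics.mean(train_values)
    sigma = statistics.stdev(train_values)
    live_mu = statistics.mean(live_values)
    # Standard error of the live sample's mean under training stats.
    se = sigma / math.sqrt(len(live_values))
    return abs(live_mu - mu) / se > z_threshold

train = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 1.0, 0.98]
stable = [1.0, 0.97, 1.02, 1.01]
shifted = [2.0, 2.1, 1.9, 2.05]

print(detect_shift(train, stable))   # stable sample: no alert
print(detect_shift(train, shifted))  # drifted sample: notify for debugging
```

In a production system, a positive result would feed the notification mechanism the abstract mentions, prompting engineers to inspect the affected ML component.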
Hala Abdelkader. "Towards Robust Production Machine Learning Systems: Managing Dataset Shift." 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE). DOI: 10.1145/3324884.3415281
Citations: 7
A Framework for Automated Test Mocking of Mobile Apps
M. Fazzini, Alessandra Gorla, A. Orso
Mobile apps interact with their environment extensively, and these interactions can complicate testing activities because test cases may need a complete environment to be executed. Interactions with the environment can also introduce test flakiness, for instance when the environment behaves in non-deterministic ways. For these reasons, it is common to create test mocks that eliminate the need for (part of) the environment to be present during testing. Manual mock creation, however, can be extremely time consuming and error-prone. Moreover, the generated mocks can typically only be used in the context of the specific tests for which they were created. To address these issues, we propose MOKA, a general framework for collecting and generating reusable test mocks in an automated way. MOKA leverages the ability to observe a large number of interactions between an application and its environment and uses an iterative approach to generate two alternative types of mocks with different reusability characteristics: advanced mocks generated through program synthesis (ideally) and basic record-replay-based mocks (as a fallback solution). In this paper, we describe the new ideas behind MOKA, its main characteristics, a preliminary empirical study, and a set of possible applications that could benefit from our framework.
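The record-replay fallback described above can be sketched minimally. The `GpsService` dependency and the `call` API are invented for illustration and are not MOKA's actual interface:

```python
# Hypothetical record-replay mock: in record mode, calls are forwarded
# to the real environment and logged; in replay mode, logged responses
# are returned without the environment being present.
class RecordReplayMock:
    def __init__(self, real=None):
        self.real = real   # real environment object (record mode only)
        self.log = {}      # (method, args) -> recorded response

    def call(self, method, *args):
        key = (method, args)
        if self.real is not None:                 # record mode
            self.log[key] = getattr(self.real, method)(*args)
        return self.log[key]                      # replay (or just-recorded)

# Invented environment dependency, e.g. a location service.
class GpsService:
    def position(self, unit):
        return (51.5, -0.12) if unit == "deg" else None

recorder = RecordReplayMock(real=GpsService())
recorder.call("position", "deg")         # interaction observed and logged

replayer = RecordReplayMock()            # no environment in the test run
replayer.log = recorder.log              # reuse the recorded interactions
print(replayer.call("position", "deg"))  # served from the log
```

A synthesis-based mock would go one step further and generalize the logged pairs into a small program, so it could also answer argument values never observed during recording.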
M. Fazzini, Alessandra Gorla, A. Orso. "A Framework for Automated Test Mocking of Mobile Apps." 2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE). DOI: 10.1145/3324884.3418927
Citations: 6
Journal
2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)