2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC): Latest Literature, Page 4

Adaptive Deep Code Search

2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC)

Pub Date : 2020-07-13 DOI: 10.1145/3387904.3389278

Chunyang Ling, Zeqi Lin, Yanzhen Zou, Bing Xie

Searching code in a large-scale codebase using natural language queries is a common practice during software development. Deep learning-based code search methods demonstrate superior performance if models are trained with large amounts of text-code pairs. However, few deep code search models can be easily transferred from one codebase to another. It can be very costly to prepare training data for a new codebase and re-train an appropriate deep learning model. In this paper, we propose AdaCS, an adaptive deep code search method that can be trained once and transferred to new codebases. AdaCS decomposes the learning process into embedding domain-specific words and matching general syntactic patterns. First, an unsupervised word embedding technique is used to construct a matching matrix that represents the lexical similarities. Then, a recurrent neural network is used to capture latent syntactic patterns from these matching matrices in a supervised way. As the supervised task learns general syntactic patterns that exist across domains, AdaCS is transferable to new codebases. Experimental results show that, when extended to new software projects never seen in the training data, AdaCS is more robust and significantly outperforms state-of-the-art deep code search methods.
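
The matching-matrix step described in the AdaCS abstract can be illustrated with a minimal sketch. It assumes cosine similarity between word vectors as the lexical similarity measure, which the abstract does not specify; the function names and the two-dimensional toy embeddings are invented for illustration, and the recurrent network that consumes the matrix is not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def matching_matrix(query_tokens, code_tokens, embeddings):
    """Rows are query tokens, columns are code tokens.

    Tokens missing from the embedding table are treated as zero vectors
    (an assumption of this sketch), so their similarities are 0.0.
    """
    dim = len(next(iter(embeddings.values())))
    zero = [0.0] * dim
    return [
        [cosine(embeddings.get(q, zero), embeddings.get(c, zero)) for c in code_tokens]
        for q in query_tokens
    ]

# Toy 2-dimensional embeddings, invented for illustration only.
TOY_EMBEDDINGS = {
    "read": [1.0, 0.0],
    "load": [0.9, 0.1],
    "file": [0.0, 1.0],
    "path": [0.1, 0.9],
}

matrix = matching_matrix(["read", "file"], ["load", "path"], TOY_EMBEDDINGS)
```

Because the matrix holds only similarities and not the words themselves, a model trained on such matrices never sees the vocabulary directly, which is consistent with the transferability argument in the abstract.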

Citations: 24

On Combining IR Methods to Improve Bug localization

2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC)

Pub Date : 2020-07-13 DOI: 10.1145/3387904.3389280

Saket Khatiwada, Miroslav Tushev, Anas Mahmoud

Information Retrieval (IR) methods have recently been employed to provide automatic support for bug localization tasks. However, for an IR-based bug localization tool to be useful, it has to achieve adequate retrieval accuracy. Lower precision and recall can leave developers with large amounts of incorrect information to wade through. To address this issue, in this paper, we systematically investigate the impact of combining various IR methods on the retrieval accuracy of bug localization engines. The main assumption is that different IR methods, targeting different dimensions of similarity between artifacts, can be used to enhance the confidence in each other's results. Five benchmark systems from different application domains are used to conduct our analysis. The results show that a) near-optimal global configurations can be determined for different combinations of IR methods, b) optimized IR-hybrids can significantly outperform individual methods as well as other unoptimized methods, and c) hybrid methods achieve their best performance when utilizing information-theoretic IR methods. Our findings can be used to enhance the practicality of IR-based bug localization tools and minimize the cognitive overload developers often face when locating bugs.
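
One simple way to combine the outputs of several IR methods is a weighted sum of normalized scores. The sketch below assumes min-max normalization and a linear combination; the abstract does not state which combination scheme or weights the authors use, and the file names and scores are invented.

```python
def min_max_normalize(scores):
    """Rescale a {document: score} mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def combine_rankings(score_lists, weights):
    """Rank documents by the weighted sum of their normalized scores."""
    combined = {}
    for scores, weight in zip(score_lists, weights):
        for doc, s in min_max_normalize(scores).items():
            combined[doc] = combined.get(doc, 0.0) + weight * s
    return sorted(combined, key=combined.get, reverse=True)

# Hypothetical similarity scores of one bug report against three source files,
# as produced by two unnamed IR methods.
method_a_scores = {"A.java": 0.2, "B.java": 0.8, "C.java": 0.5}
method_b_scores = {"A.java": 0.9, "B.java": 0.7, "C.java": 0.1}

ranking = combine_rankings([method_a_scores, method_b_scores], [0.5, 0.5])
```

Here `B.java` ranks first because both methods score it highly, even though neither method alone would necessarily place it at the top. Tuning the weights per combination corresponds loosely to the "global configurations" mentioned in the abstract.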

Citations: 6

Improving Code Search with Co-Attentive Representation Learning

2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC)

Pub Date : 2020-07-13 DOI: 10.1145/3387904.3389269

Jianhang Shuai, Ling Xu, Chao Liu, Meng Yan, Xin Xia, Yan Lei

Searching and reusing existing code from a large-scale codebase, e.g., GitHub, can help developers complete a programming task efficiently. Recently, Gu et al. proposed a deep learning-based model (i.e., DeepCS), which significantly outperformed prior models. DeepCS embeds code and natural language queries into vectors with two separate LSTM (long short-term memory) models and returns to developers the code with higher similarity to a code search query. However, this embedding method learns two isolated representations for code and query and ignores their internal semantic correlations. As a result, the learned isolated representations of code and query may limit the effectiveness of code search. To address this issue, we propose a co-attentive representation learning model, i.e., Co-Attentive Representation Learning Code Search-CNN (CARLCS-CNN). CARLCS-CNN learns interdependent representations for the embedded code and query with a co-attention mechanism. Generally, this mechanism learns a correlation matrix between the embedded code and query, and co-attends to their semantic relationship via row/column-wise max-pooling. In this way, the semantic correlation between code and query can directly affect their individual representations. We evaluate the effectiveness of CARLCS-CNN on Gu et al.'s dataset with 10k queries. Experimental results show that the proposed CARLCS-CNN model significantly outperforms DeepCS by 26.72% in terms of MRR (mean reciprocal rank). Additionally, CARLCS-CNN is five times faster than DeepCS in model training and four times faster in testing.
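
The co-attention idea (a correlation matrix followed by row-wise and column-wise max-pooling) can be sketched in a few lines. This is a simplified illustration, not the CARLCS-CNN implementation: it assumes a plain dot product passed through `tanh` in place of any learned correlation parameters, and all names and vectors are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(vectors, weights):
    """Attention-weighted sum of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, vectors)) for k in range(dim)]

def co_attend(code_vecs, query_vecs):
    """Return attended code/query representations and their attention weights."""
    # correlation[i][j] relates code token i to query token j.
    correlation = [
        [math.tanh(sum(a * b for a, b in zip(c, q))) for q in query_vecs]
        for c in code_vecs
    ]
    code_scores = [max(row) for row in correlation]         # row-wise max-pooling
    query_scores = [max(col) for col in zip(*correlation)]  # column-wise max-pooling
    code_attn = softmax(code_scores)
    query_attn = softmax(query_scores)
    return (
        weighted_sum(code_vecs, code_attn),
        weighted_sum(query_vecs, query_attn),
        code_attn,
        query_attn,
    )

# Toy token vectors, invented for illustration.
code_vecs = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
query_vecs = [[1.0, 0.0]]
code_repr, query_repr, code_attn, query_attn = co_attend(code_vecs, query_vecs)
```

The first code token receives the largest attention weight because it correlates most strongly with the query token, which is how the query influences the code representation (and vice versa) in this sketch.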

Citations: 74

The Secret Life of Commented-Out Source Code

2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC)

Pub Date : 2020-07-13 DOI: 10.1145/3387904.3389259

Tri Minh Triet Pham, Jinqiu Yang

Source code commenting is a common practice to improve code comprehension in software development. While comments often consist of descriptive natural language, surprisingly, there exists a non-trivial portion of comments that are actually code statements, i.e., commented-out code (CO code), even in well-maintained software systems. Commented-out code practice is rarely studied and often excluded in prior studies on comments due to its irrelevance to natural language. When discussed openly, CO code practice is generally considered a bad practice. However, there is no prior work to assess the nature (prevalence, evolution, motivation, and necessity of utilization) of CO code practice. In this paper, we perform the first study to understand CO code practice. Inspired by prior works in comment analysis, we develop automated solutions to identify CO code and track its evolution in development history. By analyzing six open-source projects of different sizes and from diverse domains, we find that CO code practice is non-trivial in software development, especially in the early phase of development history, e.g., up to 20% of the commits involve CO code practice. We observe common evolution patterns of CO code and find that developers may uncomment and comment code more frequently than expected, e.g., 10% of the CO code practices have been uncommented at least once. Through a manual analysis, we identify the common reasons that developers adopt CO code practices and reveal maintenance challenges associated with CO code practices.
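
To make the notion of commented-out code concrete, the following sketch flags `//` comment lines that look like disabled Java statements. The patterns are hypothetical heuristics chosen for illustration; the abstract does not describe the authors' detection technique, and a natural-language comment that happens to end in a semicolon would be misclassified here.

```python
import re

# Hypothetical heuristics: patterns that often indicate a Java statement.
CODE_PATTERNS = [
    re.compile(r".*;\s*$"),                             # ends with a semicolon
    re.compile(r".*[{}]\s*$"),                          # ends with a brace
    re.compile(r"(if|for|while|switch|catch)\s*\(.*"),  # control-flow header
]

def is_commented_out_code(line):
    """Return True if a `//` comment line looks like a disabled Java statement."""
    stripped = line.strip()
    if not stripped.startswith("//"):
        return False
    body = stripped[2:].strip()
    return bool(body) and any(p.match(body) for p in CODE_PATTERNS)

def co_code_ratio(lines):
    """Fraction of `//` comment lines that look like commented-out code."""
    comments = [line for line in lines if line.strip().startswith("//")]
    if not comments:
        return 0.0
    return sum(is_commented_out_code(line) for line in comments) / len(comments)
```

Running such a check on every revision of a file would give a rough view of when code is commented out and later restored, which is the kind of evolution the study tracks.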

Citations: 6

Improving the Accuracy of Spectrum-based Fault Localization for Automated Program Repair

2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC)

Pub Date : 2020-07-13 DOI: 10.1145/3387904.3389290

Tetsushi Kuma, Yoshiki Higo, S. Matsumoto, S. Kusumoto

The sufficiency of test cases is essential for spectrum-based fault localization (in short, SBFL). If a given set of test cases is not sufficient, SBFL does not work. In such a case, we can improve the reliability of SBFL by adding new test cases. However, adding many test cases without considering their properties is not appropriate in the context of automated program repair (in short, APR). For example, in the case of GenProg, which is the most famous APR tool, all the test cases related to the bug module are executed for each of the mutated programs. Execution results of test cases are used for checking whether the mutated programs pass all the test cases and for inferring faulty statements for a given bug. Thus, in the context of APR, it is important to add only the minimum necessary test cases to improve the accuracy of SBFL. In this paper, we propose three strategies for selecting some test cases from a large number of automatically generated test cases. We conducted a small experiment on the bug dataset Defects4J and confirmed that, with the best strategy, the accuracy of SBFL improved for 56.3% of the target bugs and decreased for 17.3%. We also confirmed that the median increase in execution time was limited to 1.5 seconds.
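
For readers unfamiliar with SBFL, the sketch below computes statement suspiciousness from a coverage spectrum. It assumes the Ochiai formula, a commonly used SBFL metric; the abstract does not say which formula the authors use, and the test names and coverage data are invented.

```python
import math

def ochiai_scores(coverage, outcomes):
    """Compute Ochiai suspiciousness for each covered statement.

    coverage: test name -> set of covered statement ids
    outcomes: test name -> True if the test passed, False if it failed
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    statements = set().union(*coverage.values())
    scores = {}
    for stmt in statements:
        failed_cover = sum(
            1 for test, cov in coverage.items() if stmt in cov and not outcomes[test]
        )
        passed_cover = sum(
            1 for test, cov in coverage.items() if stmt in cov and outcomes[test]
        )
        denom = math.sqrt(total_failed * (failed_cover + passed_cover))
        scores[stmt] = failed_cover / denom if denom > 0 else 0.0
    return scores

# Hypothetical spectrum: statement 3 is executed only by the failing test.
coverage = {"t1": {1, 2}, "t2": {1, 3}, "t3": {1, 2}}
outcomes = {"t1": True, "t2": False, "t3": True}
scores = ochiai_scores(coverage, outcomes)
```

Adding one more passing test that covers statement 3 would lower its score from 1.0 to about 0.71, which shows why the choice of added test cases, and not just their number, affects localization accuracy.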

Citations: 4

How are Deep Learning Models Similar? An Empirical Study on Clone Analysis of Deep Learning Software

2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC)

Pub Date : 2020-07-13 DOI: 10.1145/3387904.3389254

Xiongfei Wu, Liangyu Qin, Bing Yu, Xiaofei Xie, Lei Ma, Yinxing Xue, Yang Liu, Jianjun Zhao

Deep learning (DL) has been successfully applied to many cutting-edge applications, e.g., image processing, speech recognition, and natural language processing. As more and more DL software is made open-source, publicly available, and organized in model repositories and stores (Model Zoo, ModelDepot), there comes a need to understand the relationships of these DL models regarding their maintenance and evolution tasks. Although clone analysis has been extensively studied for traditional software, it has not yet been investigated for DL software. Since DL software adopts the data-driven development paradigm, it is still not clear whether and to what extent the clone analysis techniques of traditional software could be adapted to DL software. In this paper, we take a first step toward clone analysis of DL software at three different levels, i.e., source code level, model structural level, and input/output (I/O)-semantic level, which would be key in DL software management, maintenance, and evolution. We intend to investigate the similarity between these DL models from a clone analysis perspective. Several tools and metrics are selected to conduct clone analysis of DL software at the three levels. Our study on two popular datasets (i.e., MNIST and CIFAR-10) and eight DL models of five architectural families (i.e., LeNet, ResNet, DenseNet, AlexNet, and VGG) shows that: 1) the three levels of similarity analysis are generally adequate to find clones between DL models ranging from structural to semantic; 2) different measures for clone analysis used at each level yield similar results; 3) clone analysis of one single level may not render a complete picture of the similarity of DL models. Our findings open up several research opportunities worth further exploration towards better understanding and more effective clone analysis of DL software.
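
At the I/O-semantic level, one plausible similarity measure is the fraction of inputs on which two models predict the same label. The sketch below uses that prediction-agreement rate as an assumed metric; the abstract does not name the metrics the authors selected, and the two threshold "models" are toy stand-ins for trained networks.

```python
def prediction_agreement(model_a, model_b, inputs):
    """Fraction of inputs on which two models return the same prediction."""
    if not inputs:
        raise ValueError("at least one input is required")
    matches = sum(1 for x in inputs if model_a(x) == model_b(x))
    return matches / len(inputs)

def model_a(x):
    """Toy classifier: label 1 when the input exceeds 0.5."""
    return int(x > 0.5)

def model_b(x):
    """Toy classifier with a slightly different decision boundary."""
    return int(x > 0.7)

inputs = [0.1, 0.4, 0.6, 0.8, 0.9]
agreement = prediction_agreement(model_a, model_b, inputs)
```

The two toy models disagree only on the input 0.6. A high agreement rate says nothing about whether the models share source code or architecture, which fits the abstract's point that a single level of analysis gives an incomplete picture.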

Citations: 6

Deep-Diving into Documentation to Develop Improved Java-to-Swift API Mapping

2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC)

Pub Date : 2020-07-13 DOI: 10.1145/3387904.3389282

Zejun Zhang, Minxue Pan, Tian Zhang, Xinyu Zhou, Xuandong Li

Application program interface (API) mapping is the key to the success of code migration. Leveraging API documentation to map APIs has been explored by previous studies, and recently, code-based learning approaches have become the mainstream approach and shown better results. However, learning approaches often require a large amount of training data (e.g., projects implemented using multiple languages or API mapping datasets), which are not widely available. In contrast, API documentation is usually available, but we have observed that much information in API documentation has been underexploited. Therefore, we develop a deep-dive approach to extensively explore API documentation to create improved API mapping methods. Our documentation exploration approach involves analyzing the functional description of APIs, and also considers the parameters and return values. The results of this analysis can be used to generate not only one-to-one API mapping, but also compatible API sequences, thereby enabling one-to-many API mapping. In addition, parameter-mapping relationships, which have often been ignored in previous approaches, can be produced. We apply this approach to map APIs from Java to Swift, and the experimental results indicate that our deep-dive analysis of API documentation leads to API mapping results that are superior to those generated by existing approaches.
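
The core idea of mapping APIs through their documentation can be illustrated by comparing functional descriptions. The sketch below assumes Jaccard similarity over word sets, which is a stand-in for whatever text analysis the authors actually perform; the descriptions are invented paraphrases, not quotations from the Java or Swift documentation, and parameter and return-value matching is not shown.

```python
import re

def tokens(text):
    """Lowercased set of alphabetic words in a description."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_match(source_doc, candidate_docs):
    """Name of the candidate whose description is most similar to source_doc."""
    source = tokens(source_doc)
    return max(
        candidate_docs,
        key=lambda name: jaccard(source, tokens(candidate_docs[name])),
    )

# Invented descriptions for illustration only.
java_description = "Returns the string converted to uppercase letters"
swift_candidates = {
    "uppercased": "Returns an uppercase version of the string",
    "lowercased": "Returns a lowercase version of the string",
}

mapped = best_match(java_description, swift_candidates)
```

A one-to-many mapping would extend this by scoring sequences of candidate APIs against a single source description instead of scoring candidates one at a time.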

Citations: 9

Measuring Software Testability Modulo Test Quality

2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC)

Pub Date : 2020-07-13 DOI: 10.1145/3387904.3389273

Valerio Terragni, P. Salza, M. Pezzè

Comprehending the degree to which software components support testing is important for accurately scheduling testing activities, training developers, and planning effective refactoring actions. Software testability estimates this property by relating code characteristics to the test effort. The main studies of testability reported in the literature investigate the relation between class metrics and test effort in terms of the size and complexity of the associated test suites. They report a moderate correlation between some class metrics and test-effort metrics, but suffer from two main limitations: (i) the results hardly generalize because the empirical evidence is limited (datasets with no more than eight software projects); and (ii) they mostly ignore the quality of the tests. Test quality matters, however: a class may have a low test effort because the associated tests are of poor quality, not because the class is easier to test. In this paper, we propose an approach to measuring testability that normalizes the test effort with respect to the test quality, which we quantify in terms of code coverage and mutation score. We present the results of a set of experiments on a dataset of 9,861 Java classes belonging to 1,186 open source projects, with around 1.5 million lines of code overall. The results confirm that normalizing the test effort with respect to the test quality largely improves the correlation between class metrics and the test effort. Better correlations yield better predictions of the test effort.
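
The normalization idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual formula, so everything here is an assumption made for illustration: test effort is approximated by test-suite size (`test_loc`), test quality is taken as the plain average of code coverage and mutation score, and the names `ClassRecord`, `quality_of_tests`, and `normalized_effort` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClassRecord:
    name: str
    test_loc: int          # size of the associated tests (assumed effort proxy)
    coverage: float        # code coverage as a fraction in [0, 1]
    mutation_score: float  # mutation score as a fraction in [0, 1]


def quality_of_tests(rec: ClassRecord) -> float:
    """Assumed quality measure: plain average of coverage and mutation score."""
    return (rec.coverage + rec.mutation_score) / 2


def normalized_effort(rec: ClassRecord, eps: float = 1e-9) -> float:
    """Test effort per unit of test quality (hypothetical normalization).

    eps guards against division by zero for classes whose tests
    cover nothing and kill no mutants.
    """
    return rec.test_loc / max(quality_of_tests(rec), eps)


# Two classes with the same raw test effort but very different test quality.
well_tested = ClassRecord("Parser", test_loc=100, coverage=0.9, mutation_score=0.9)
poorly_tested = ClassRecord("Lexer", test_loc=100, coverage=0.4, mutation_score=0.2)

for rec in (well_tested, poorly_tested):
    print(rec.name, round(normalized_effort(rec), 1))
```

Under these assumptions, the two classes have identical raw effort, but the poorly tested one receives a much higher normalized effort. This mirrors the abstract's point that a low raw test effort may reflect weak tests rather than an easy-to-test class.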



Citations: 16

An Empirical Study of Quick Remedy Commits

2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC)

Pub Date : 2020-07-13 DOI: 10.1145/3387904.3389266

Fengcai Wen, Csaba Nagy, Michele Lanza, G. Bavota

Software systems are continuously modified to implement new features, fix bugs, and improve quality attributes. Most of these activities are not atomic changes, but rather the result of several related changes affecting different parts of the code. As a result, developers may omit some of the needed changes and consequently leave a task partially unfinished, introduce technical debt or, in the worst case, inject bugs. Knowing which changes developers mistakenly omit can help in designing recommender systems that automatically identify risky situations in which, for example, the developer is likely to be pushing an incomplete change to the software repository. We present a qualitative study investigating “quick remedy commits” performed by developers with the goal of implementing changes omitted in previous commits. By quick remedy commits we mean commits that (i) quickly follow a commit performed by the same developer in the same repository, and (ii) aim at remedying issues introduced as the result of code changes omitted in the previous commit (e.g., fixing references to code components that were broken by a rename refactoring). Through a manual analysis of 500 quick remedy commits, we define a taxonomy categorizing the types of changes that developers tend to omit. The defined taxonomy can guide the development of tools aimed at detecting omitted changes and possibly autocompleting them.
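
Criterion (i) of the definition above lends itself to a small sketch. The abstract does not say how "quickly" is measured, so the 30-minute `window` is an assumed placeholder, and `Commit` and `quick_follow_ups` are hypothetical names. The sketch only finds candidates: criterion (ii), whether the follow-up actually remedies an omission, is what the study assessed manually and is not automated here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Commit:
    sha: str
    repo: str
    author: str
    when: datetime


def quick_follow_ups(commits, window=timedelta(minutes=30)):
    """Return (previous, follow_up) pairs where the same author commits
    again to the same repository within `window` (an assumed threshold)."""
    pairs = []
    last_seen = {}  # (repo, author) -> that author's most recent commit
    for commit in sorted(commits, key=lambda c: c.when):
        key = (commit.repo, commit.author)
        previous = last_seen.get(key)
        if previous is not None and commit.when - previous.when <= window:
            pairs.append((previous, commit))
        last_seen[key] = commit
    return pairs


history = [
    Commit("a1", "repoX", "alice", datetime(2020, 1, 1, 10, 0)),
    Commit("b1", "repoX", "bob", datetime(2020, 1, 1, 10, 5)),
    Commit("a2", "repoX", "alice", datetime(2020, 1, 1, 10, 10)),
    Commit("a3", "repoX", "alice", datetime(2020, 1, 1, 12, 0)),
]

candidates = [(prev.sha, nxt.sha) for prev, nxt in quick_follow_ups(history)]
print(candidates)
```

Here only `a2` is flagged as a candidate follow-up to `a1`: `b1` is by a different developer, and `a3` arrives well outside the assumed window.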



Citations: 8

Automatic Android Deprecated-API Usage Update by learning from Single Updated Example

2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC)

Pub Date : 2020-05-27 DOI: 10.1145/3387904.3389285

S. A. Haryono, Ferdian Thung, Hong Jin Kang, Lucas Serrano, Gilles Muller, J. Lawall, David Lo, Lingxiao Jiang

Due to the deprecation of APIs in the Android operating system, developers have to update usages of the APIs to ensure that their applications work on both past and current versions of Android. Such updates may be widespread, non-trivial, and time-consuming. Therefore, automating such updates would be of great benefit to developers. AppEvolve, the state-of-the-art tool for automating such updates, relies on having before- and after-update examples to learn from. In this work, we propose an approach named CocciEvolve that performs such updates using only a single after-update example. CocciEvolve learns edits by extracting the relevant update to a block of code from an after-update example. From preliminary experiments, we find that CocciEvolve can successfully perform 96 out of 112 updates, a success rate of roughly 85%.


Citations: 21