Searching code in a large-scale codebase using natural language queries is a common practice during software development. Deep learning-based code search methods demonstrate superior performance if models are trained with large amount of text-code pairs. However, few deep code search models can be easily transferred from one codebase to another. It can be very costly to prepare training data for a new codebase and re-train an appropriate deep learning model. In this paper, we propose AdaCS, an adaptive deep code search method that can be trained once and transferred to new codebases. AdaCS decomposes the learning process into embedding domain-specific words and matching general syntactic patterns. Firstly, an unsupervised word embedding technique is used to construct a matching matrix to represent the lexical similarities. Then, a recurrent neural network is used to capture latent syntactic patterns from these matching matrices in a supervised way. As the supervised task learns general syntactic patterns that exist across domains, AdaCS is transferable to new codebases. Experimental results show that: when extended to new software projects never seen in the training data, AdaCS is more robust and significantly outperforms state-of-the-art deep code search methods.
{"title":"Adaptive Deep Code Search","authors":"Chunyang Ling, Zeqi Lin, Yanzhen Zou, Bing Xie","doi":"10.1145/3387904.3389278","DOIUrl":"https://doi.org/10.1145/3387904.3389278","url":null,"abstract":"Searching code in a large-scale codebase using natural language queries is a common practice during software development. Deep learning-based code search methods demonstrate superior performance if models are trained with large amount of text-code pairs. However, few deep code search models can be easily transferred from one codebase to another. It can be very costly to prepare training data for a new codebase and re-train an appropriate deep learning model. In this paper, we propose AdaCS, an adaptive deep code search method that can be trained once and transferred to new codebases. AdaCS decomposes the learning process into embedding domain-specific words and matching general syntactic patterns. Firstly, an unsupervised word embedding technique is used to construct a matching matrix to represent the lexical similarities. Then, a recurrent neural network is used to capture latent syntactic patterns from these matching matrices in a supervised way. As the supervised task learns general syntactic patterns that exist across domains, AdaCS is transferable to new codebases. Experimental results show that: when extended to new software projects never seen in the training data, AdaCS is more robust and significantly outperforms state-of-the-art deep code search methods.","PeriodicalId":231095,"journal":{"name":"2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC)","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122563888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Information Retrieval (IR) methods have been recently employed to provide automatic support for bug localization tasks. However, for an IR-based bug localization tool to be useful, it has to achieve adequate retrieval accuracy. Lower precision and recall can leave developers with large amounts of incorrect information to wade through. To address this issue, in this paper, we systematically investigate the impact of combining various IR methods on the retrieval accuracy of bug localization engines. The main assumption is that different IR methods, targeting different dimensions of similarity between artifacts, can be used to enhance the confidence in each others' results. Five benchmark systems from different application domains are used to conduct our analysis. The results show that a) near-optimal global configurations can be determined for different combinations of IR methods, b) optimized IR-hybrids can significantly outperform individual methods as well as other unoptimized methods, and c) hybrid methods achieve their best performance when utilizing information-theoretic IR methods. Our findings can be used to enhance the practicality of IR-based bug localization tools and minimize the cognitive overload developers often face when locating bugs.
{"title":"On Combining IR Methods to Improve Bug localization","authors":"Saket Khatiwada, Miroslav Tushev, Anas Mahmoud","doi":"10.1145/3387904.3389280","DOIUrl":"https://doi.org/10.1145/3387904.3389280","url":null,"abstract":"Information Retrieval (IR) methods have been recently employed to provide automatic support for bug localization tasks. However, for an IR-based bug localization tool to be useful, it has to achieve adequate retrieval accuracy. Lower precision and recall can leave developers with large amounts of incorrect information to wade through. To address this issue, in this paper, we systematically investigate the impact of combining various IR methods on the retrieval accuracy of bug localization engines. The main assumption is that different IR methods, targeting different dimensions of similarity between artifacts, can be used to enhance the confidence in each others' results. Five benchmark systems from different application domains are used to conduct our analysis. The results show that a) near-optimal global configurations can be determined for different combinations of IR methods, b) optimized IR-hybrids can significantly outperform individual methods as well as other unoptimized methods, and c) hybrid methods achieve their best performance when utilizing information-theoretic IR methods. Our findings can be used to enhance the practicality of IR-based bug localization tools and minimize the cognitive overload developers often face when locating bugs.","PeriodicalId":231095,"journal":{"name":"2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131771058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jianhang Shuai, Ling Xu, Chao Liu, Meng Yan, Xin Xia, Yan Lei
Searching and reusing existing code from a large-scale codebase, e.g, GitHub, can help developers complete a programming task efficiently. Recently, Gu et al. proposed a deep learning-based model (i.e., DeepCS), which significantly outperformed prior models. The DeepCS embedded codebase and natural language queries into vectors by two LSTM (long and short-term memory) models separately, and returned developers the code with higher similarity to a code search query. However, such embedding method learned two isolated representations for code and query but ignored their internal semantic correlations. As a result, the learned isolated representations of code and query may limit the effectiveness of code search. To address the aforementioned issue, we propose a co-attentive representation learning model, i.e., Co-Attentive Representation Learning Code Search-CNN (CARLCS-CNN). CARLCS-CNN learns interdependent representations for the embedded code and query with a co-attention mechanism. Generally, such mechanism learns a correlation matrix between embedded code and query, and coattends their semantic relationship via row/column-wise max-pooling. In this way, the semantic correlation between code and query can directly affect their individual representations. We evaluate the effectiveness of CARLCS-CNN on Gu et al.'s dataset with 10k queries. Experimental results show that the proposed CARLCS-CNN model significantly outperforms DeepCS by 26.72% in terms of MRR (mean reciprocal rank). Additionally, CARLCS-CNN is five times faster than DeepCS in model training and four times in testing.
{"title":"Improving Code Search with Co-Attentive Representation Learning","authors":"Jianhang Shuai, Ling Xu, Chao Liu, Meng Yan, Xin Xia, Yan Lei","doi":"10.1145/3387904.3389269","DOIUrl":"https://doi.org/10.1145/3387904.3389269","url":null,"abstract":"Searching and reusing existing code from a large-scale codebase, e.g, GitHub, can help developers complete a programming task efficiently. Recently, Gu et al. proposed a deep learning-based model (i.e., DeepCS), which significantly outperformed prior models. The DeepCS embedded codebase and natural language queries into vectors by two LSTM (long and short-term memory) models separately, and returned developers the code with higher similarity to a code search query. However, such embedding method learned two isolated representations for code and query but ignored their internal semantic correlations. As a result, the learned isolated representations of code and query may limit the effectiveness of code search. To address the aforementioned issue, we propose a co-attentive representation learning model, i.e., Co-Attentive Representation Learning Code Search-CNN (CARLCS-CNN). CARLCS-CNN learns interdependent representations for the embedded code and query with a co-attention mechanism. Generally, such mechanism learns a correlation matrix between embedded code and query, and coattends their semantic relationship via row/column-wise max-pooling. In this way, the semantic correlation between code and query can directly affect their individual representations. We evaluate the effectiveness of CARLCS-CNN on Gu et al.'s dataset with 10k queries. Experimental results show that the proposed CARLCS-CNN model significantly outperforms DeepCS by 26.72% in terms of MRR (mean reciprocal rank). Additionally, CARLCS-CNN is five times faster than DeepCS in model training and four times in testing.","PeriodicalId":231095,"journal":{"name":"2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131807039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Source code commenting is a common practice to improve code comprehension in software development. While comments often consist of descriptive natural language, surprisingly, there exists a non-trivial portion of comments that are actually code statements, i.e., commented-out code (CO code), even in well-maintained software systems. Commented-out code practice is rarely studied and often excluded in prior studies on comments due to its irrelevance to natural language. When being openly discussed, CO practice is generally considered a bad practice. However, there is no prior work to assess the nature (prevalence, evolution, motivation, and necessity of utilization) of CO code practice. In this paper, we perform the first study to understand CO code practice. Inspired by prior works in comment analysis, we develop automated solutions to identify CO code and track its evolution in development history. Through analyzing six open-source projects of different sizes and from diverse domains, we find that CO code practice is non-trivial in software development, especially in the early phase of development history, e.g., up to 20% of the commits involve CO code practice. We observe common evolution patterns of CO code and find that developers may uncomment and comment code more frequently than expected, e.g., 10% of the CO code practices have been uncommented at least once. Through a manual analysis, we identify the common reasons that developers adopt CO code practices and reveal maintenance challenges associated with CO code practices.
{"title":"The Secret Life of Commented-Out Source Code","authors":"Tri Minh Triet Pham, Jinqiu Yang","doi":"10.1145/3387904.3389259","DOIUrl":"https://doi.org/10.1145/3387904.3389259","url":null,"abstract":"Source code commenting is a common practice to improve code comprehension in software development. While comments often consist of descriptive natural language, surprisingly, there exists a non-trivial portion of comments that are actually code statements, i.e., commented-out code (CO code), even in well-maintained software systems. Commented-out code practice is rarely studied and often excluded in prior studies on comments due to its irrelevance to natural language. When being openly discussed, CO practice is generally considered a bad practice. However, there is no prior work to assess the nature (prevalence, evolution, motivation, and necessity of utilization) of CO code practice. In this paper, we perform the first study to understand CO code practice. Inspired by prior works in comment analysis, we develop automated solutions to identify CO code and track its evolution in development history. Through analyzing six open-source projects of different sizes and from diverse domains, we find that CO code practice is non-trivial in software development, especially in the early phase of development history, e.g., up to 20% of the commits involve CO code practice. We observe common evolution patterns of CO code and find that developers may uncomment and comment code more frequently than expected, e.g., 10% of the CO code practices have been uncommented at least once. Through a manual analysis, we identify the common reasons that developers adopt CO code practices and reveal maintenance challenges associated with CO code practices.","PeriodicalId":231095,"journal":{"name":"2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127641029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tetsushi Kuma, Yoshiki Higo, S. Matsumoto, S. Kusumoto
The sufficiency of test cases is essential for spectrum-based fault localization (in short, SBFL). If a given set of test cases is not sufficient, SBFL does not work. In such a case, we can improve the reliability of SBFL by adding new test cases. However, adding many test cases without considering their properties is not appropriate in the context of automated program repair (in short, APR). For example, in the case of GenProg, which is the most famous APR tool, all the test cases related to the bug module are executed for each of the mutated programs. Execution results of test cases are used for checking whether they pass all the test cases and inferring faulty statements for a given bug. Thus, in the context of APR, it is important to add necessary minimum test cases to improve the accuracy of SBFL. In this paper, we propose three strategies for selecting some test cases from a large number of automatically-generated test cases. We conducted a small experiment on bug dataset Defect4J and confirmed that the accuracy of SBFL was improved for 56.3% of target bugs while the accuracy was decreased for 17.3% in the case of the best strategy. We also confirmed that the increase of the execution time was suppressed to 1.5 seconds at the median.
{"title":"Improving the Accuracy of Spectrum-based Fault Localization for Automated Program Repair","authors":"Tetsushi Kuma, Yoshiki Higo, S. Matsumoto, S. Kusumoto","doi":"10.1145/3387904.3389290","DOIUrl":"https://doi.org/10.1145/3387904.3389290","url":null,"abstract":"The sufficiency of test cases is essential for spectrum-based fault localization (in short, SBFL). If a given set of test cases is not sufficient, SBFL does not work. In such a case, we can improve the reliability of SBFL by adding new test cases. However, adding many test cases without considering their properties is not appropriate in the context of automated program repair (in short, APR). For example, in the case of GenProg, which is the most famous APR tool, all the test cases related to the bug module are executed for each of the mutated programs. Execution results of test cases are used for checking whether they pass all the test cases and inferring faulty statements for a given bug. Thus, in the context of APR, it is important to add necessary minimum test cases to improve the accuracy of SBFL. In this paper, we propose three strategies for selecting some test cases from a large number of automatically-generated test cases. We conducted a small experiment on bug dataset Defect4J and confirmed that the accuracy of SBFL was improved for 56.3% of target bugs while the accuracy was decreased for 17.3% in the case of the best strategy. We also confirmed that the increase of the execution time was suppressed to 1.5 seconds at the median.","PeriodicalId":231095,"journal":{"name":"2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC)","volume":"306 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132564430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiongfei Wu, Liangyu Qin, Bing Yu, Xiaofei Xie, Lei Ma, Yinxing Xue, Yang Liu, Jianjun Zhao
Deep learning (DL) has been successfully applied to many cuttingedge applications, e.g., image processing, speech recognition, and natural language processing. As more and more DL software is made open-sourced, publicly available, and organized in model repositories and stores (MODEL Zoo, Modeldepot), there comes a need to understand the relationships of these DL models regarding their maintenance and evolution tasks. Although clone analysis has been extensively studied for traditional software, up to the present, clone analysis has not been investigated for DL software. Since DL software adopts the data-driven development paradigm, it is still not clear whether and to what extent the clone analysis techniques of traditional software could be adapted to DL software. In this paper, we initiate the first step on the clone analysis of DL software at three different levels, i.e., source code level, model structural level, and input/output (I/O)-semantic level, which would be a key in DL software management, maintenance and evolution. We intend to investigate the similarity between these DL models from clone analysis perspective. Several tools and metrics are selected to conduct clone analysis of DL software at three different levels. Our study on two popular datasets (i.e., MNIST and CIFAR-10) and eight DL models of five architectural families (i.e., LeNet, ResNet, DenseNet, AlexNet, and VGG) shows that: 1). the three levels of similarity analysis are generally adequate to find clones between DL models ranging from structural to semantic; 2). different measures for clone analysis used at each level yield similar results; 3) clone analysis of one single level may not render a complete picture of the similarity of DL models. Our findings open up several research opportunities worth further exploration towards better understanding and more effective clone analysis of DL software.
{"title":"How are Deep Learning Models Similar? An Empirical Study on Clone Analysis of Deep Learning Software","authors":"Xiongfei Wu, Liangyu Qin, Bing Yu, Xiaofei Xie, Lei Ma, Yinxing Xue, Yang Liu, Jianjun Zhao","doi":"10.1145/3387904.3389254","DOIUrl":"https://doi.org/10.1145/3387904.3389254","url":null,"abstract":"Deep learning (DL) has been successfully applied to many cuttingedge applications, e.g., image processing, speech recognition, and natural language processing. As more and more DL software is made open-sourced, publicly available, and organized in model repositories and stores (MODEL Zoo, Modeldepot), there comes a need to understand the relationships of these DL models regarding their maintenance and evolution tasks. Although clone analysis has been extensively studied for traditional software, up to the present, clone analysis has not been investigated for DL software. Since DL software adopts the data-driven development paradigm, it is still not clear whether and to what extent the clone analysis techniques of traditional software could be adapted to DL software. In this paper, we initiate the first step on the clone analysis of DL software at three different levels, i.e., source code level, model structural level, and input/output (I/O)-semantic level, which would be a key in DL software management, maintenance and evolution. We intend to investigate the similarity between these DL models from clone analysis perspective. Several tools and metrics are selected to conduct clone analysis of DL software at three different levels. Our study on two popular datasets (i.e., MNIST and CIFAR-10) and eight DL models of five architectural families (i.e., LeNet, ResNet, DenseNet, AlexNet, and VGG) shows that: 1). the three levels of similarity analysis are generally adequate to find clones between DL models ranging from structural to semantic; 2). different measures for clone analysis used at each level yield similar results; 3) clone analysis of one single level may not render a complete picture of the similarity of DL models. Our findings open up several research opportunities worth further exploration towards better understanding and more effective clone analysis of DL software.","PeriodicalId":231095,"journal":{"name":"2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127288506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zejun Zhang, Minxue Pan, Tian Zhang, Xinyu Zhou, Xuandong Li
Application program interface (API) mapping is the key to the success of code migration. Leveraging API documentation to map APIs has been explored by previous studies, and recently, code-based learning approaches have become the mainstream approach and shown better results. However, learning approaches often require a large amount of training data (e.g., projects implemented using multiple languages or API mapping datasets), which are not widely available. In contrast, API documentation is usually available, but we have observed that much information in API documentation has been underexploited. Therefore, we develop a deep-dive approach to extensively explore API documentation to create improved API mapping methods. Our documentation exploration approach involves analyzing the functional description of APIs, and also considers the parameters and return values. The results of this analysis can be used to generate not only one-to-one API mapping, but also compatible API sequences, thereby enabling one-to-many API mapping. In addition, parameter-mapping relationships, which have often been ignored in previous approaches, can be produced. We apply this approach to map APIs from Java to Swift, and the experimental results indicate that our deep-dive analysis of API documentation leads to API mapping results that are superior to those generated by existing approaches.
{"title":"Deep-Diving into Documentation to Develop Improved Java-to-Swift API Mapping","authors":"Zejun Zhang, Minxue Pan, Tian Zhang, Xinyu Zhou, Xuandong Li","doi":"10.1145/3387904.3389282","DOIUrl":"https://doi.org/10.1145/3387904.3389282","url":null,"abstract":"Application program interface (API) mapping is the key to the success of code migration. Leveraging API documentation to map APIs has been explored by previous studies, and recently, code-based learning approaches have become the mainstream approach and shown better results. However, learning approaches often require a large amount of training data (e.g., projects implemented using multiple languages or API mapping datasets), which are not widely available. In contrast, API documentation is usually available, but we have observed that much information in API documentation has been underexploited. Therefore, we develop a deep-dive approach to extensively explore API documentation to create improved API mapping methods. Our documentation exploration approach involves analyzing the functional description of APIs, and also considers the parameters and return values. The results of this analysis can be used to generate not only one-to-one API mapping, but also compatible API sequences, thereby enabling one-to-many API mapping. In addition, parameter-mapping relationships, which have often been ignored in previous approaches, can be produced. We apply this approach to map APIs from Java to Swift, and the experimental results indicate that our deep-dive analysis of API documentation leads to API mapping results that are superior to those generated by existing approaches.","PeriodicalId":231095,"journal":{"name":"2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127499979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comprehending the degree to which software components support testing is important to accurately schedule testing activities, train developers, and plan effective refactoring actions. Software testability estimates such property by relating code characteristics to the test effort. The main studies of testability reported in the literature investigate the relation between class metrics and test effort in terms of the size and complexity of the associated test suites. They report a moderate correlation of some class metrics to test-effort metrics, but suffer from two main limitations: (i) the results hardly generalize due to the small empirical evidence (datasets with no more than eight software projects); and (ii) mostly ignore the quality of the tests. However, considering the quality of the tests is important. Indeed, a class may have a low test effort because the associated tests are of poor quality, and not because the class is easier to test. In this paper, we propose an approach to measure testability that normalizes the test effort with respect to the test quality, which we quantify in terms of code coverage and mutation score. We present the results of a set of experiments on a dataset of 9,861 JAVA classes, belonging to 1,186 open source projects, with around 1.5 million of lines of code overall. The results confirm that normalizing the test effort with respect to the test quality largely improves the correlation between class metrics and the test effort. Better correlations result in better prediction power and thus better prediction of the test effort.
{"title":"Measuring Software Testability Modulo Test Quality","authors":"Valerio Terragni, P. Salza, M. Pezzè","doi":"10.1145/3387904.3389273","DOIUrl":"https://doi.org/10.1145/3387904.3389273","url":null,"abstract":"Comprehending the degree to which software components support testing is important to accurately schedule testing activities, train developers, and plan effective refactoring actions. Software testability estimates such property by relating code characteristics to the test effort. The main studies of testability reported in the literature investigate the relation between class metrics and test effort in terms of the size and complexity of the associated test suites. They report a moderate correlation of some class metrics to test-effort metrics, but suffer from two main limitations: (i) the results hardly generalize due to the small empirical evidence (datasets with no more than eight software projects); and (ii) mostly ignore the quality of the tests. However, considering the quality of the tests is important. Indeed, a class may have a low test effort because the associated tests are of poor quality, and not because the class is easier to test. In this paper, we propose an approach to measure testability that normalizes the test effort with respect to the test quality, which we quantify in terms of code coverage and mutation score. We present the results of a set of experiments on a dataset of 9,861 JAVA classes, belonging to 1,186 open source projects, with around 1.5 million of lines of code overall. The results confirm that normalizing the test effort with respect to the test quality largely improves the correlation between class metrics and the test effort. Better correlations result in better prediction power and thus better prediction of the test effort.","PeriodicalId":231095,"journal":{"name":"2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125490598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Software systems are continuously modified to implement new features, to fix bugs, and to improve quality attributes. Most of these activities are not atomic changes, but rather the result of several related changes affecting different parts of the code. For this reason, it may happen that developers omit some of the needed changes and, as a consequence, leave a task partially unfinished, introduce technical debt or, in the worst case scenario, inject bugs. Knowing the changes that are mistakenly omitted by developers can help in designing recommender systems able to automatically identify risky situations in which, for example, the developer is likely to be pushing an incomplete change to the software repository. We present a qualitative study investigating“quick remedy commits” performed by developers with the goal of implementing changes omitted in previous commits. With quick remedy commits we refer to commits that (i) quickly follow a commit performed by the same developer in the same repository, and (ii) aim at remedying issues introduced as the result of code changes omitted in the previous commit (e.g., fix references to code components that have been broken as a consequence of a rename refactoring). Through a manual analysis of 500 quick remedy commits, we define a taxonomy categorizing the types of changes that developers tend to omit. The defined taxonomy can guide the development of tools aimed at detecting omitted changes, and possibly autocomplete them.
{"title":"An Empirical Study of Quick Remedy Commits","authors":"Fengcai Wen, Csaba Nagy, Michele Lanza, G. Bavota","doi":"10.1145/3387904.3389266","DOIUrl":"https://doi.org/10.1145/3387904.3389266","url":null,"abstract":"Software systems are continuously modified to implement new features, to fix bugs, and to improve quality attributes. Most of these activities are not atomic changes, but rather the result of several related changes affecting different parts of the code. For this reason, it may happen that developers omit some of the needed changes and, as a consequence, leave a task partially unfinished, introduce technical debt or, in the worst case scenario, inject bugs. Knowing the changes that are mistakenly omitted by developers can help in designing recommender systems able to automatically identify risky situations in which, for example, the developer is likely to be pushing an incomplete change to the software repository. We present a qualitative study investigating“quick remedy commits” performed by developers with the goal of implementing changes omitted in previous commits. With quick remedy commits we refer to commits that (i) quickly follow a commit performed by the same developer in the same repository, and (ii) aim at remedying issues introduced as the result of code changes omitted in the previous commit (e.g., fix references to code components that have been broken as a consequence of a rename refactoring). Through a manual analysis of 500 quick remedy commits, we define a taxonomy categorizing the types of changes that developers tend to omit. The defined taxonomy can guide the development of tools aimed at detecting omitted changes, and possibly autocomplete them.","PeriodicalId":231095,"journal":{"name":"2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125023639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. A. Haryono, Ferdian Thung, Hong Jin Kang, Lucas Serrano, Gilles Muller, J. Lawall, David Lo, Lingxiao Jiang
Due to the deprecation of APIs in the Android operating system, developers have to update usages of the APIs to ensure that their applications work for both the past and current versions of Android. Such updates may be widespread, non-trivial, and time-consuming. Therefore, automation of such updates will be of great benefit to developers. AppEvolve, which is the state-of-the-art tool for automating such updates, relies on having before- and after-update examples to learn from. In this work, we propose an approach named CocciEvolve that performs such updates using only a single after-update example. CocciEvolve learns edits by extracting the relevant update to a block of code from an after-update example. From preliminary experiments, we find that CocciEvolve can successfully perform 96 out of 112 updates, with a success rate of 85%.
{"title":"Automatic Android Deprecated-API Usage Update by learning from Single Updated Example","authors":"S. A. Haryono, Ferdian Thung, Hong Jin Kang, Lucas Serrano, Gilles Muller, J. Lawall, David Lo, Lingxiao Jiang","doi":"10.1145/3387904.3389285","DOIUrl":"https://doi.org/10.1145/3387904.3389285","url":null,"abstract":"Due to the deprecation of APIs in the Android operating system, developers have to update usages of the APIs to ensure that their applications work for both the past and current versions of Android. Such updates may be widespread, non-trivial, and time-consuming. Therefore, automation of such updates will be of great benefit to developers. AppEvolve, which is the state-of-the-art tool for automating such updates, relies on having before- and after-update examples to learn from. In this work, we propose an approach named CocciEvolve that performs such updates using only a single after-update example. CocciEvolve learns edits by extracting the relevant update to a block of code from an after-update example. From preliminary experiments, we find that CocciEvolve can successfully perform 96 out of 112 updates, with a success rate of 85%.","PeriodicalId":231095,"journal":{"name":"2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC)","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123165081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}