Gian Luca Scoccia, Anthony S Peruma, Virginia Pujols, I. Malavolta, Daniel E. Krutz
Permissions are one of the most fundamental components for protecting an Android user's privacy and security. Unfortunately, developers frequently misuse permissions by requiring too many or too few permissions, or by not adhering to permission best practices. These permission-related issues can negatively impact users in a variety of ways, ranging from a poor user experience to severe privacy and security implications. To advance the understanding of permission-related issues during an app's development process, we conducted an empirical study of 574 GitHub repositories of open-source Android apps. We analyzed the occurrences of four types of permission-related issues across the lifetime of the apps. Our findings reveal that (i) permission-related issues are a frequent phenomenon in Android apps, (ii) the majority of issues are fixed within a few days of their introduction, (iii) permission-related issues can frequently linger inside an app for an extended period of time, sometimes several years, before being fixed, and (iv) both project newcomers and regular contributors exhibit the same behaviour in terms of the number of permission-related issues introduced and fixed per commit.
{"title":"Permission Issues in Open-Source Android Apps: An Exploratory Study","authors":"Gian Luca Scoccia, Anthony S Peruma, Virginia Pujols, I. Malavolta, Daniel E. Krutz","doi":"10.1109/SCAM.2019.00034","DOIUrl":"https://doi.org/10.1109/SCAM.2019.00034","url":null,"abstract":"Permissions are one of the most fundamental components for protecting an Android user's privacy and security. Unfortunately, developers frequently misuse permissions by requiring too many or too few permissions, or by not adhering to permission best practices. These permission-related issues can negatively impact users in a variety of ways, ranging from creating a poor user experience to severe privacy and security implications. To advance the understanding permission-related issues during the app's development process, we conducted an empirical study of 574 GitHub repositories of open-source Android apps. We analyzed the occurrences of four types of permission-related issues across the lifetime of the apps. Our findings reveal that (i) permission-related issues are a frequent phenomenon in Android apps, (ii) the majority of issues are fixed within a few days after their introduction, (iii) permission-related issues can frequently linger inside an app for an extended period of time, which can be as high as several years, before being fixed, and (iv) both project newcomers and regular contributors exhibit the same behaviour in terms of number of introduced and fixed permission-related issues per commit.","PeriodicalId":431316,"journal":{"name":"2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114520149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jeffrey J. Yackley, M. Kessentini, G. Bavota, Vahid Alizadeh, Bruce Maxim
Currently, refactoring and regression testing are treated independently by existing studies. However, software developers frequently switch between these two activities, using regression testing to identify unwanted behavior changes introduced while refactoring, and applying refactoring to identified buggy code fragments. Our hypothesis is that tools supporting developers in these two tasks could transfer part of the knowledge extracted while finding refactoring opportunities to the identification of relevant test cases, and vice versa. We propose a simultasking, search-based algorithm that unifies the tasks of refactoring and regression testing, solving them simultaneously and enabling knowledge transfer between them. The salient feature of the proposed algorithm is a unified and generic solution representation scheme for both problems, which serves as a common platform for knowledge transfer between them. We implemented and evaluated the proposed simultasking approach on six open-source systems and one industrial project. Our study features quantitative and qualitative analyses performed with developers, and the results show that the proposed approach provides advantages over mono-task techniques that treat refactoring and regression testing separately.
{"title":"Simultaneous Refactoring and Regression Testing","authors":"Jeffrey J. Yackley, M. Kessentini, G. Bavota, Vahid Alizadeh, Bruce Maxim","doi":"10.1109/SCAM.2019.00032","DOIUrl":"https://doi.org/10.1109/SCAM.2019.00032","url":null,"abstract":"Currently, refactoring and regression testing are treated independently by existing studies. However, software developers frequently switch between these two activities, using regression testing to identify unwanted behavior changes introduced while refactoring and applying refactoring on identified buggy code fragments. Our hypothesis is that the tools to support developers in these two tasks could transfer part of the knowledge extracted from the process of finding refactoring opportunities to identify relevant test cases, and vice-versa. We propose a simultasking, search-based algorithm that unifies the tasks of refactoring and regression testing, hence solving them simultaneously and enabling knowledge transfer between them. The salient feature of the proposed algorithm is a unified and generic solution representation scheme for both problems, which serves as a common platform for knowledge transfer between them. We implemented and evaluated the proposed simultasking approach on six opensource systems and one industrial project. Our study features quantitative and qualitative analysis performed with developers, and the results achieved show that the proposed approach provides advantages over mono-task techniques treating refactoring and regression testing separately.","PeriodicalId":431316,"journal":{"name":"2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127595342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Static Single Assignment (SSA) form is an intermediate representation used for the analysis and optimization of programs in modern compilers. ϕ-function placement is the most computationally expensive part of converting a program into its SSA form. The most widely used ϕ-function placement algorithms are based on computing dominance frontiers. However, such algorithms work under the limiting assumption that all variables are defined at the beginning of the program, which is not the case for local variables. In this paper, we introduce a novel algorithm based on computing reaching definitions, which only assumes that global variables and formal parameters are defined at the beginning of the program. We implemented our algorithm and compared it to a well-known dominance frontiers-based algorithm in the Clang/LLVM compiler framework by performing experiments on a benchmarking suite for Perl. The results of our experiments show that, apart from a few computationally expensive cases, our algorithm is fairly efficient and, most notably, it produces up to 169% (and on average 74%) fewer ϕ-functions than the reference dominance frontiers-based algorithm.
{"title":"Towards Constructing the SSA form using Reaching Definitions Over Dominance Frontiers","authors":"A. Masud, Federico Ciccozzi","doi":"10.1109/SCAM.2019.00012","DOIUrl":"https://doi.org/10.1109/SCAM.2019.00012","url":null,"abstract":"The Static Single Assignment (SSA) form is an intermediate representation used for the analysis and optimization of programs in modern compilers. The ϕ-function placement is the most computationally expensive part of converting any program into its SSA form. The most widely-used ϕ-function placement algorithms are based on computing dominance frontiers. However, this kind of algorithms works under the limiting assumption that all variables are defined at the beginning of the program, which is not the case for local variables. In this paper, we introduce an innovative algorithm based on computing reaching definitions, only assuming that global variables and formal parameters are defined at the beginning of the program. We implemented our algorithm and compared it to a well-known dominance frontiers-based algorithm in the Clang/LLVM compiler framework by performing experiments on a benchmarking suite for Perl. The results of our experiments show that, besides a few computationally expensive cases, our algorithm is fairly efficient, and most notably it produces up to 169% and on an average 74% fewer ϕ-functions than the reference dominance frontiers-based algorithm.","PeriodicalId":431316,"journal":{"name":"2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122219990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Isaac Moreira Medeiros Gomes, Daniel Coutinho, Marcelo Schots
When writing code, developers usually have a preferred or standardized style of their own, known as a coding style. Such code is usually stored in a version control repository, through which collaborative work takes place. However, in such a setting, isolated attempts at standardization can lead to several coding styles coexisting in the same project, causing the opposite effect to that intended. Besides increasing the effort required to understand code, coding style conflicts may also clutter the repository history as developers change existing styles to their usual preferences. To overcome this problem, we propose an approach to support the definition of a repository coding style while allowing developers to use their preferred coding style. To illustrate our approach, we built the RECoSt tool and applied it to real excerpts of a popular open-source project. Our proposed approach is intended to help developers keep their projects' coding style standardized without having to abandon the style they are familiar with.
{"title":"No Accounting for Taste: Supporting Developers' Individual Choices of Coding Styles","authors":"Isaac Moreira Medeiros Gomes, Daniel Coutinho, Marcelo Schots","doi":"10.1109/SCAM.2019.00018","DOIUrl":"https://doi.org/10.1109/SCAM.2019.00018","url":null,"abstract":"When creating their programs, developers usually have a preferred or standardized style of their own to write code, known as coding style. Such code is usually stored in a version control repository, through which collaborative work usually takes place. However, in such a setting, isolated attempts of standardization can lead to several coding styles coexisting in the same project, causing the opposite effect to that intended. Besides increasing the effort required to understand code, coding style conflicts may also clutter repository history as developers change existing styles to their usual preferences. To overcome this problem, we propose an approach to support the definition of a repository coding style while allowing developers to use their preferred coding style. To illustrate our approach, we built the RECoSt tool and applied it using real excerpts of a popular open source project. Our proposed approach intends to help developers keep their projects' coding style standardized without having to abandon the style they are familiar with.","PeriodicalId":431316,"journal":{"name":"2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128492644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anthony S Peruma, Mohamed Wiem Mkaouer, M. J. Decker, Christian D. Newman
Identifier names are the atoms of comprehension; weak identifier names decrease productivity by increasing the chance that developers make mistakes and increasing the time taken to understand chunks of code. Therefore, it is vital to support developers in naming, and renaming, identifiers. In this paper, we study how terms in an identifier change during the application of rename refactorings and contextualize these changes using co-occurring refactorings and commit messages. The goal of this work is to understand how different development activities affect the type of changes applied to names during a rename. Results of this study can help researchers understand more about developers' naming habits and support developers in determining when to rename and what words to use.
{"title":"Contextualizing Rename Decisions using Refactorings and Commit Messages","authors":"Anthony S Peruma, Mohamed Wiem Mkaouer, M. J. Decker, Christian D. Newman","doi":"10.1109/SCAM.2019.00017","DOIUrl":"https://doi.org/10.1109/SCAM.2019.00017","url":null,"abstract":"Identifier names are the atoms of comprehension; weak identifier names decrease productivity by increasing the chance that developers make mistakes and increasing the time taken to understand chunks of code. Therefore, it is vital to support developers in naming, and renaming, identifiers. In this paper, we study how terms in an identifier change during the application of rename refactorings and contextualize these changes using co-occurring refactorings and commit messages. The goal of this work is to understand how different development activities affect the type of changes applied to names during a rename. Results of this study can help researchers understand more about developers' naming habits and support developers in determining when to rename and what words to use.","PeriodicalId":431316,"journal":{"name":"2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125029576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
When we designed the first version of Rascal in 2009, we jokingly promised ourselves to write only a single paper on the language itself and to see it as a vehicle for research from then on. That one paper became the SCAM 2009 article, now honored with the SCAM most influential paper award. Since then, Rascal has evolved significantly and has been successfully applied in research, education, and industry. This extended abstract gives an overview of the impact of Rascal over the last 10 years and looks at current and future developments.
{"title":"Rascal, 10 Years Later","authors":"P. Klint, T. Storm, J. Vinju","doi":"10.1109/SCAM.2019.00023","DOIUrl":"https://doi.org/10.1109/SCAM.2019.00023","url":null,"abstract":"When we designed the first version of Rascal in 2009, we jokingly promised ourselves to only write a single paper on the language itself, and see it as vehicle for research from then on,—that one paper became the SCAM 2009 article, now awarded with the SCAM most influential paper award. Since then, Rascal has evolved significantly, and has been successfully applied in research, education, and industry. This extended abstract gives an overview of the impact of Rascal over the last 10 years, and looks at current and future developments.","PeriodicalId":431316,"journal":{"name":"2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126418276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Since it was first proposed in 1992 under the name of "behavior sampling", the idea of judging whether software systems are functionally equivalent by observing their responses to common stimuli (i.e., tests) has been used for a range of tasks such as software retrieval, functional redundancy measurement, and semantic clone detection. However, its efficacy has only been studied in one small experiment, with limited generalizability, described in the original paper proposing the approach. The results of that experiment suggest that a relatively small number of randomly generated tests (i.e., 4) is sufficient to recognize non-functionally-equivalent software 85% of the time. This number has therefore been adopted as "sufficient" in numerous applications of the approach. In this paper we present a much larger study which suggests that at least 39 randomly generated tests are actually needed to achieve this level of effectiveness, but that far fewer tests are sufficient when they are generated using coverage-based heuristics. Since these results are much more generalizable, they have implications for future applications of behavior sampling for dynamic behavior comparison.
{"title":"On the Efficacy of Dynamic Behavior Comparison for Judging Functional Equivalence","authors":"Marcus Kessel, C. Atkinson","doi":"10.1109/SCAM.2019.00030","DOIUrl":"https://doi.org/10.1109/SCAM.2019.00030","url":null,"abstract":"Since it was first proposed in 1992 under the name of \"behavior sampling\", the idea of judging whether software systems are functionally equivalent by observing their responses to common stimuli (i.e. tests) has been used for a range of tasks such as software retrieval, functional redundancy measurement and semantic clone detection. However, its efficacy has only been studied in one small experiment, with limited generalizability, described in the original paper proposing the approach. The results of that experiment suggest that a relatively small number of randomly generated tests (i.e. 4) is sufficient to recognize non-functional-equivalent software 85% of the time. This number has therefore been adopted as \"sufficient\" in numerous applications of the approach. In this paper we present a much larger study which suggests at least 39 randomly generated tests are actually needed to achieve this level of effectiveness, but that a far fewer number of tests generated using coverage-based heuristics are sufficient. Since these results are much more generalizable, they have implications for future applications of behavioral sampling for dynamic behavior comparison.","PeriodicalId":431316,"journal":{"name":"2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"151 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113989070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Md Masudur Rahman, Saikat Chakraborty, G. Kaiser, Baishakhi Ray
Information Retrieval (IR) plays a pivotal role in diverse Software Engineering (SE) tasks, e.g., bug localization and triaging, bug report routing, code retrieval, and requirements analysis. SE tasks operate on diverse types of documents, including code, text, stack traces, and structured, semi-structured, and unstructured meta-data that often contain specialized vocabularies. As the performance of any IR-based tool critically depends on the underlying document types, and given the diversity of SE corpora, it is essential to understand which models work best for which types of SE documents and tasks. We empirically investigate the interaction between IR models and document types for two representative SE tasks (bug localization and relevant project search), carefully chosen as they require a diverse set of SE artifacts (mixtures of code and text), and confirm that the models' performance varies significantly with the mix of document types. Leveraging this insight, we propose a generalized framework, SRCH, to automatically select the most favorable IR model(s) for a given SE task. We evaluate SRCH w.r.t. these two tasks and confirm its effectiveness. Our preliminary user study shows that SRCH's intelligent adaptation of the IR model(s) to the task at hand not only improves precision and recall for SE tasks but may also improve users' satisfaction.
{"title":"Toward Optimal Selection of Information Retrieval Models for Software Engineering Tasks","authors":"Md Masudur Rahman, Saikat Chakraborty, G. Kaiser, Baishakhi Ray","doi":"10.1109/SCAM.2019.00022","DOIUrl":"https://doi.org/10.1109/SCAM.2019.00022","url":null,"abstract":"Information Retrieval (IR) plays a pivotal role in diverse Software Engineering (SE) tasks, e.g., bug localization and triaging, bug report routing, code retrieval, requirements analysis, etc. SE tasks operate on diverse types of documents including code, text, stack-traces, and structured, semi-structured and unstructured meta-data that often contain specialized vocabularies. As the performance of any IR-based tool critically depends on the underlying document types, and given the diversity of SE corpora, it is essential to understand which models work best for which types of SE documents and tasks. We empirically investigate the interaction between IR models and document types for two representative SE tasks (bug localization and relevant project search), carefully chosen as they require a diverse set of SE artifacts (mixtures of code and text), and confirm that the models' performance varies significantly with mix of document types. Leveraging this insight, we propose a generalized framework, SRCH, to automatically select the most favorable IR model(s) for a given SE task. We evaluate SRCH w.r.t. these two tasks and confirm its effectiveness. Our preliminary user study shows that SRCH's intelligent adaption of the IR model(s) to the task at hand not only improves precision and recall for SE tasks but may also improve users' satisfaction.","PeriodicalId":431316,"journal":{"name":"2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133203980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Salvatore Geremia, G. Bavota, R. Oliveto, Michele Lanza, M. D. Penta
Stack Overflow is the most popular question and answer website on computer programming, with more than 2.5M users, 16M questions, and a new answer posted, on average, every five seconds. This wide availability of data has led researchers to develop techniques to mine Stack Overflow posts, with the aim of finding and recommending posts containing information useful to developers. However, and not surprisingly, not every Stack Overflow post is useful from a developer's perspective. We empirically investigate the characteristics of "useful" Stack Overflow posts. The underlying assumption of our study is that posts that were used (i.e., referenced in source code) by developers in the past are likely to be useful. We refer to these posts as leveraged posts. We study the characteristics of leveraged posts as opposed to non-leveraged ones, focusing on community aspects (e.g., the reputation of the user who authored the post), the quality of the included code snippets (e.g., complexity), and the quality of the post's textual content (e.g., readability). Then, we use these features to build a prediction model to automatically identify posts that are likely to be leveraged by developers. Results of the study indicate that post meta-data (e.g., the number of comments received by the answer) is particularly useful for predicting whether a post has been leveraged, whereas code readability appears to be less useful. A classifier can identify leveraged posts with a precision of 65% and recall of 49%, and non-leveraged ones with a precision of 95% and recall of 97%. This opens the road towards an automatic identification of "high-quality content" in Stack Overflow.
{"title":"Characterizing Leveraged Stack Overflow Posts","authors":"Salvatore Geremia, G. Bavota, R. Oliveto, Michele Lanza, M. D. Penta","doi":"10.1109/SCAM.2019.00025","DOIUrl":"https://doi.org/10.1109/SCAM.2019.00025","url":null,"abstract":"Stack Overflow is the most popular question and answer website on computer programming with more than 2.5M users, 16M questions, and a new answer posted, on average, every five seconds. This wide availability of data led researchers to develop techniques to mine Stack Overflow posts. The aim is to find and recommend posts with information useful to developers. However, and not surprisingly, not every Stack Overflow post is useful from a developer's perspective. We empirically investigate what the characteristics of \"useful\" Stack Overflow posts are. The underlying assumption of our study is that posts that were used (referenced in the source code) in the past by developers are likely to be useful. We refer to these posts as leveraged posts. We study the characteristics of leveraged posts as opposed to the non-leveraged ones, focusing on community aspects (e.g., the reputation of the user who authored the post), the quality of the included code snippets (e.g., complexity), and the quality of the post's textual content (e.g., readability). Then, we use these features to build a prediction model to automatically identify posts that are likely to be leveraged by developers. Results of the study indicate that post meta-data (e.g., the number of comments received by the answer) is particularly useful to predict whether it has been leveraged or not, whereas code readability appears to be less useful. A classifier can classify leveraged posts with a precision of 65% and recall of 49% and non-leveraged ones with a precision of 95% and recall of 97%. This opens the road towards an automatic identification of \"high-quality content\" in Stack Overflow.","PeriodicalId":431316,"journal":{"name":"2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120958780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
B. Lin, Csaba Nagy, G. Bavota, Andrian Marcus, Michele Lanza
Meaningful, expressive identifiers in source code can enhance readability and reduce comprehension effort. Over the past years, researchers have devoted considerable effort to understanding and improving the naming quality of identifiers in source code. However, little attention has been given to test code, an important resource during program comprehension activities. To better grasp identifier quality in test code, we conducted a survey involving manually written and automatically generated test cases from ten open-source software projects. The survey results indicate that test cases contain low-quality identifiers, including the manually written ones, and that the quality of identifiers is lower in test code than in production code. We also investigated the use of three state-of-the-art rename refactoring recommenders for improving test code identifiers. The analysis highlights their limitations when applied to test code and supports mapping out a research agenda for future work in the area.
{"title":"On the Quality of Identifiers in Test Code","authors":"B. Lin, Csaba Nagy, G. Bavota, Andrian Marcus, Michele Lanza","doi":"10.1109/SCAM.2019.00031","DOIUrl":"https://doi.org/10.1109/SCAM.2019.00031","url":null,"abstract":"Meaningful, expressive identifiers in source code can enhance the readability and reduce comprehension efforts. Over the past years, researchers have devoted considerable effort to understanding and improving the naming quality of identifiers in source code. However, little attention has been given to test code, an important resource during program comprehension activities. To better grasp identifier quality in test code, we conducted a survey involving manually written and automatically generated test cases from ten open source software projects. The survey results indicate that test cases contain low quality identifiers, including the manually written ones, and that the quality of identifiers is lower in test code than in production code. We also investigated the use of three state-of-the-art rename refactoring recommenders for improving test code identifiers. The analysis highlights their limitations when applied to test code and supports mapping out a research agenda for future work in the area.","PeriodicalId":431316,"journal":{"name":"2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"338 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115473051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}