2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM): Latest Papers

Permission Issues in Open-Source Android Apps: An Exploratory Study
Gian Luca Scoccia, Anthony S Peruma, Virginia Pujols, I. Malavolta, Daniel E. Krutz
Permissions are one of the most fundamental components for protecting an Android user's privacy and security. Unfortunately, developers frequently misuse permissions by requiring too many or too few permissions, or by not adhering to permission best practices. These permission-related issues can negatively impact users in a variety of ways, ranging from creating a poor user experience to severe privacy and security implications. To advance the understanding of permission-related issues during the app's development process, we conducted an empirical study of 574 GitHub repositories of open-source Android apps. We analyzed the occurrences of four types of permission-related issues across the lifetime of the apps. Our findings reveal that (i) permission-related issues are a frequent phenomenon in Android apps, (ii) the majority of issues are fixed within a few days after their introduction, (iii) permission-related issues can frequently linger inside an app for an extended period of time, which can be as long as several years, before being fixed, and (iv) both project newcomers and regular contributors exhibit the same behaviour in terms of number of introduced and fixed permission-related issues per commit.
DOI: https://doi.org/10.1109/SCAM.2019.00034 (published 2019-09-01)
Citations: 11
Simultaneous Refactoring and Regression Testing
Jeffrey J. Yackley, M. Kessentini, G. Bavota, Vahid Alizadeh, Bruce Maxim
Currently, refactoring and regression testing are treated independently by existing studies. However, software developers frequently switch between these two activities, using regression testing to identify unwanted behavior changes introduced while refactoring and applying refactoring to identified buggy code fragments. Our hypothesis is that the tools to support developers in these two tasks could transfer part of the knowledge extracted from the process of finding refactoring opportunities to identify relevant test cases, and vice versa. We propose a simultasking, search-based algorithm that unifies the tasks of refactoring and regression testing, hence solving them simultaneously and enabling knowledge transfer between them. The salient feature of the proposed algorithm is a unified and generic solution representation scheme for both problems, which serves as a common platform for knowledge transfer between them. We implemented and evaluated the proposed simultasking approach on six open-source systems and one industrial project. Our study features quantitative and qualitative analysis performed with developers, and the results achieved show that the proposed approach provides advantages over mono-task techniques treating refactoring and regression testing separately.
DOI: https://doi.org/10.1109/SCAM.2019.00032 (published 2019-09-01)
Citations: 1
Towards Constructing the SSA form using Reaching Definitions Over Dominance Frontiers
A. Masud, Federico Ciccozzi
The Static Single Assignment (SSA) form is an intermediate representation used for the analysis and optimization of programs in modern compilers. ϕ-function placement is the most computationally expensive part of converting any program into its SSA form. The most widely used ϕ-function placement algorithms are based on computing dominance frontiers. However, algorithms of this kind work under the limiting assumption that all variables are defined at the beginning of the program, which is not the case for local variables. In this paper, we introduce an innovative algorithm based on computing reaching definitions, only assuming that global variables and formal parameters are defined at the beginning of the program. We implemented our algorithm and compared it to a well-known dominance frontiers-based algorithm in the Clang/LLVM compiler framework by performing experiments on a benchmarking suite for Perl. The results of our experiments show that, apart from a few computationally expensive cases, our algorithm is fairly efficient, and most notably it produces up to 169% and on average 74% fewer ϕ-functions than the reference dominance frontiers-based algorithm.
DOI: https://doi.org/10.1109/SCAM.2019.00012 (published 2019-09-01)
Citations: 5
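The SSA entry above contrasts its approach with ϕ-function placement based on dominance frontiers. As a rough illustration of that baseline only (not the reaching-definitions algorithm the paper proposes, and not the Clang/LLVM implementation), the sketch below computes dominance frontiers and iterates them for a single variable. The CFG encoding, the function names, and the assumptions that every block is reachable from the entry and that the entry has no predecessors are all hypothetical choices made for this example.

```python
from collections import defaultdict


def dominator_info(succ, entry):
    """Return (idom, preds) for a CFG given as {block: [successor blocks]}.

    Assumes every block is reachable from `entry` and that `entry`
    has no predecessors (illustrative simplifications).
    """
    nodes = set(succ) | {s for targets in succ.values() for s in targets}
    preds = defaultdict(set)
    for block, targets in succ.items():
        for target in targets:
            preds[target].add(block)

    # Iterative data-flow solution: dom(n) = {n} | intersection of dom(p).
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True

    # The immediate dominator is the strict dominator deepest in the tree,
    # i.e. the one with the largest dominator set of its own.
    idom = {
        n: max(dom[n] - {n}, key=lambda d: len(dom[d]))
        for n in nodes - {entry}
    }
    return idom, preds


def dominance_frontiers(succ, entry):
    """Dominance frontier of each block, walking up from join-point preds."""
    idom, preds = dominator_info(succ, entry)
    df = defaultdict(set)
    for block, ps in preds.items():
        if len(ps) >= 2:  # only join points appear in a frontier
            for p in ps:
                runner = p
                while runner != idom[block]:
                    df[runner].add(block)
                    runner = idom[runner]
    return df


def phi_blocks(succ, entry, def_blocks):
    """Blocks that receive a phi-function for one variable (iterated frontier)."""
    df = dominance_frontiers(succ, entry)
    placed, seen, work = set(), set(def_blocks), list(def_blocks)
    while work:
        block = work.pop()
        for frontier_block in df[block]:
            if frontier_block not in placed:
                placed.add(frontier_block)
                if frontier_block not in seen:  # a phi is itself a definition
                    seen.add(frontier_block)
                    work.append(frontier_block)
    return placed


diamond = {"entry": ["a", "b"], "a": ["join"], "b": ["join"], "join": []}
print(phi_blocks(diamond, "entry", {"a", "b"}))
```

In this sketch a variable assigned only in block `a` still receives a ϕ-function at `join`, which appears to be the kind of placement the abstract attributes to treating every variable as defined at program entry.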
No Accounting for Taste: Supporting Developers' Individual Choices of Coding Styles
Isaac Moreira Medeiros Gomes, Daniel Coutinho, Marcelo Schots
When creating their programs, developers usually have a preferred or standardized style of their own to write code, known as coding style. Such code is usually stored in a version control repository, through which collaborative work usually takes place. However, in such a setting, isolated attempts at standardization can lead to several coding styles coexisting in the same project, causing the opposite effect to that intended. Besides increasing the effort required to understand code, coding style conflicts may also clutter repository history as developers change existing styles to their usual preferences. To overcome this problem, we propose an approach to support the definition of a repository coding style while allowing developers to use their preferred coding style. To illustrate our approach, we built the RECoSt tool and applied it using real excerpts of a popular open-source project. Our proposed approach intends to help developers keep their projects' coding style standardized without having to abandon the style they are familiar with.
DOI: https://doi.org/10.1109/SCAM.2019.00018 (published 2019-09-01)
Citations: 3
Contextualizing Rename Decisions using Refactorings and Commit Messages
Anthony S Peruma, Mohamed Wiem Mkaouer, M. J. Decker, Christian D. Newman
Identifier names are the atoms of comprehension; weak identifier names decrease productivity by increasing the chance that developers make mistakes and increasing the time taken to understand chunks of code. Therefore, it is vital to support developers in naming, and renaming, identifiers. In this paper, we study how terms in an identifier change during the application of rename refactorings and contextualize these changes using co-occurring refactorings and commit messages. The goal of this work is to understand how different development activities affect the type of changes applied to names during a rename. Results of this study can help researchers understand more about developers' naming habits and support developers in determining when to rename and what words to use.
DOI: https://doi.org/10.1109/SCAM.2019.00017 (published 2019-09-01)
Citations: 28
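The rename entry above studies how the terms of an identifier change during a rename. One way to picture that kind of analysis is to split the old and new names into terms and compare them. The sketch below is an assumption-laden illustration: the splitting rule (camelCase, underscores, digit runs), the function names, and the added/removed report are invented here and are not taken from the paper.

```python
import re


def split_terms(identifier):
    """Split an identifier into lowercase terms (camelCase, snake_case, digits)."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [part.lower() for part in parts]


def describe_rename(old_name, new_name):
    """Report which terms a rename removed and which it added."""
    old_terms, new_terms = split_terms(old_name), split_terms(new_name)
    return {
        "removed": [t for t in old_terms if t not in new_terms],
        "added": [t for t in new_terms if t not in old_terms],
    }


print(split_terms("parseHTTPResponse"))
print(describe_rename("getUserName", "fetchUserName"))
```

A real study would also need to handle abbreviations, plurals, and synonyms, which plain string comparison does not capture.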
Rascal, 10 Years Later
P. Klint, T. Storm, J. Vinju
When we designed the first version of Rascal in 2009, we jokingly promised ourselves to only write a single paper on the language itself, and see it as a vehicle for research from then on; that one paper became the SCAM 2009 article, now awarded the SCAM most influential paper award. Since then, Rascal has evolved significantly, and has been successfully applied in research, education, and industry. This extended abstract gives an overview of the impact of Rascal over the last 10 years, and looks at current and future developments.
DOI: https://doi.org/10.1109/SCAM.2019.00023 (published 2019-09-01)
Citations: 3
On the Efficacy of Dynamic Behavior Comparison for Judging Functional Equivalence
Marcus Kessel, C. Atkinson
Since it was first proposed in 1992 under the name of "behavior sampling", the idea of judging whether software systems are functionally equivalent by observing their responses to common stimuli (i.e. tests) has been used for a range of tasks such as software retrieval, functional redundancy measurement and semantic clone detection. However, its efficacy has only been studied in one small experiment, with limited generalizability, described in the original paper proposing the approach. The results of that experiment suggest that a relatively small number of randomly generated tests (i.e. 4) is sufficient to recognize functionally non-equivalent software 85% of the time. This number has therefore been adopted as "sufficient" in numerous applications of the approach. In this paper we present a much larger study which suggests that at least 39 randomly generated tests are actually needed to achieve this level of effectiveness, but that far fewer tests generated using coverage-based heuristics are sufficient. Since these results are much more generalizable, they have implications for future applications of behavior sampling for dynamic behavior comparison.
DOI: https://doi.org/10.1109/SCAM.2019.00030 (published 2019-09-01)
Citations: 6
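The behavior-sampling entry above judges functional equivalence by feeding two implementations the same tests and comparing their responses. A minimal sketch of that idea follows, assuming single-argument integer functions and uniformly random inputs; the function name, the input range, and the seed are hypothetical, and the coverage-based test generation mentioned in the abstract is not modelled.

```python
import random


def behaviorally_equivalent(f, g, n_tests, seed=0, lo=-100, hi=100):
    """Return False as soon as one random input distinguishes f from g.

    A True result only means no difference was observed in n_tests samples;
    it is evidence of equivalence, not a proof.
    """
    rng = random.Random(seed)  # seeded so that a run can be repeated
    for _ in range(n_tests):
        x = rng.randint(lo, hi)
        if f(x) != g(x):
            return False
    return True


def manual_abs(x):
    return x if x >= 0 else -x


print(behaviorally_equivalent(abs, manual_abs, n_tests=4))
print(behaviorally_equivalent(lambda x: x, lambda x: x + 1, n_tests=4))
```

Two functions that differ on only a small part of the input range can easily survive a handful of random tests, which is consistent with the abstract's finding that more random tests are needed than the commonly adopted number.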
Toward Optimal Selection of Information Retrieval Models for Software Engineering Tasks
Md Masudur Rahman, Saikat Chakraborty, G. Kaiser, Baishakhi Ray
Information Retrieval (IR) plays a pivotal role in diverse Software Engineering (SE) tasks, e.g., bug localization and triaging, bug report routing, code retrieval, requirements analysis, etc. SE tasks operate on diverse types of documents including code, text, stack-traces, and structured, semi-structured and unstructured meta-data that often contain specialized vocabularies. As the performance of any IR-based tool critically depends on the underlying document types, and given the diversity of SE corpora, it is essential to understand which models work best for which types of SE documents and tasks. We empirically investigate the interaction between IR models and document types for two representative SE tasks (bug localization and relevant project search), carefully chosen as they require a diverse set of SE artifacts (mixtures of code and text), and confirm that the models' performance varies significantly with the mix of document types. Leveraging this insight, we propose a generalized framework, SRCH, to automatically select the most favorable IR model(s) for a given SE task. We evaluate SRCH with respect to these two tasks and confirm its effectiveness. Our preliminary user study shows that SRCH's intelligent adaptation of the IR model(s) to the task at hand not only improves precision and recall for SE tasks but may also improve users' satisfaction.
DOI: https://doi.org/10.1109/SCAM.2019.00022 (published 2019-09-01)
Citations: 4
Characterizing Leveraged Stack Overflow Posts
Salvatore Geremia, G. Bavota, R. Oliveto, Michele Lanza, M. D. Penta
Stack Overflow is the most popular question and answer website on computer programming with more than 2.5M users, 16M questions, and a new answer posted, on average, every five seconds. This wide availability of data led researchers to develop techniques to mine Stack Overflow posts. The aim is to find and recommend posts with information useful to developers. However, and not surprisingly, not every Stack Overflow post is useful from a developer's perspective. We empirically investigate what the characteristics of "useful" Stack Overflow posts are. The underlying assumption of our study is that posts that were used (referenced in the source code) in the past by developers are likely to be useful. We refer to these posts as leveraged posts. We study the characteristics of leveraged posts as opposed to the non-leveraged ones, focusing on community aspects (e.g., the reputation of the user who authored the post), the quality of the included code snippets (e.g., complexity), and the quality of the post's textual content (e.g., readability). Then, we use these features to build a prediction model to automatically identify posts that are likely to be leveraged by developers. Results of the study indicate that post meta-data (e.g., the number of comments received by the answer) is particularly useful to predict whether it has been leveraged or not, whereas code readability appears to be less useful. A classifier can classify leveraged posts with a precision of 65% and recall of 49% and non-leveraged ones with a precision of 95% and recall of 97%. This opens the road towards an automatic identification of "high-quality content" in Stack Overflow.
DOI: https://doi.org/10.1109/SCAM.2019.00025 (published 2019-09-01)
Citations: 2
On the Quality of Identifiers in Test Code
B. Lin, Csaba Nagy, G. Bavota, Andrian Marcus, Michele Lanza
Meaningful, expressive identifiers in source code can enhance the readability and reduce comprehension efforts. Over the past years, researchers have devoted considerable effort to understanding and improving the naming quality of identifiers in source code. However, little attention has been given to test code, an important resource during program comprehension activities. To better grasp identifier quality in test code, we conducted a survey involving manually written and automatically generated test cases from ten open source software projects. The survey results indicate that test cases contain low quality identifiers, including the manually written ones, and that the quality of identifiers is lower in test code than in production code. We also investigated the use of three state-of-the-art rename refactoring recommenders for improving test code identifiers. The analysis highlights their limitations when applied to test code and supports mapping out a research agenda for future work in the area.
DOI: 10.1109/SCAM.2019.00031 (https://doi.org/10.1109/SCAM.2019.00031), published 2019-09-01
Citations: 5