2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR): Latest Publications

Assessing Diffusion and Perception of Test Smells in Scala Projects

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)

Pub Date : 2019-05-26 DOI: 10.1109/MSR.2019.00072

Jonas De Bleser, Dario Di Nucci, Coen De Roover

Test smells are, analogously to code smells, defined as the characteristics exhibited by poorly designed unit tests. Their negative impact on test effectiveness, understanding, and maintenance has been demonstrated by several empirical studies. However, the scope of these studies has been limited mostly to Java in combination with the JUnit testing framework. Results for other language and framework combinations are, despite their prevalence in practice, few and far between, which might skew our understanding of test smells. The combination of Scala and ScalaTest, for instance, offers more comprehensive means for defining and reusing test fixtures, thereby possibly reducing the diffusion and perception of fixture-related test smells. This paper therefore reports on two empirical studies conducted for this combination. In the first study, we analyse the tests of 164 open-source Scala projects hosted on GitHub for the diffusion of test smells. This required transposing the original smell definitions to this new context and implementing a tool (SOCRATES) for their automated detection. In the second study, we assess how 14 Scala developers perceive test smells and whether they are able to identify them. In this context, our results show (i) that test smells have a low diffusion across test classes, (ii) that the most frequently occurring test smells are LazyTest, EagerTest, and AssertionRoulette, and (iii) that many developers were able to perceive but not to identify the smells.

Pages: 457-467

Citations: 24
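
The smells named in this abstract can be illustrated with a small detector. This is a minimal sketch, not SOCRATES: it works on a hypothetical, already-extracted model of each test (the `Assertion` and `ExtractedTest` classes are assumptions), and it uses simplified readings of the classic definitions: AssertionRoulette as a test with several assertions, at least one lacking an explanatory message; EagerTest as a test calling more than one production method; LazyTest as several tests calling the same production method.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Assertion:
    has_message: bool  # True if the assertion carries an explanatory clue/message


@dataclass
class ExtractedTest:
    # Hypothetical, pre-extracted view of one test; a real tool would build it from the AST.
    name: str
    assertions: list = field(default_factory=list)      # list of Assertion
    production_calls: set = field(default_factory=set)  # production methods invoked


def is_assertion_roulette(test: ExtractedTest) -> bool:
    """Several assertions, at least one without an explanatory message."""
    return len(test.assertions) > 1 and any(not a.has_message for a in test.assertions)


def is_eager_test(test: ExtractedTest) -> bool:
    """The test exercises more than one production method."""
    return len(test.production_calls) > 1


def lazy_production_methods(tests) -> set:
    """Production methods exercised by more than one test of the same test class."""
    counts = Counter(m for t in tests for m in t.production_calls)
    return {m for m, n in counts.items() if n > 1}


def detect_per_test(tests) -> dict:
    """Map each test name to the per-test smells it exhibits."""
    report = {}
    for t in tests:
        smells = []
        if is_assertion_roulette(t):
            smells.append("AssertionRoulette")
        if is_eager_test(t):
            smells.append("EagerTest")
        report[t.name] = smells
    return report
```

LazyTest is computed over a whole test class rather than per test, because it is a relation between tests.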

Analyzing Comment-Induced Updates on Stack Overflow

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)

Pub Date : 2019-05-26 DOI: 10.1109/MSR.2019.00044

Abhishek Soni, Sarah Nadi

Stack Overflow is home to a large number of technical questions and answers. Answers also receive comments from the community and other users about their validity. Such comments may point to flaws in the posted answer or indicate deprecated code that is no longer valid due to API changes. In this paper, we use the SOTorrent dataset to explore how comments affect answer updates on Stack Overflow. Our results show that a large number of answers on Stack Overflow are not updated, even when they receive comments that warrant an update. Our results can be used to build recommender systems that automatically identify answers that require updating, or even automatically update answers as needed.

Citations: 9

STRAIT: A Tool for Automated Software Reliability Growth Analysis

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)

Pub Date : 2019-05-26 DOI: 10.1109/MSR.2019.00025

Stanislav Chren, Radoslav Micko, Barbora Buhnova, B. Rossi

Reliability is an essential attribute of mission- and safety-critical systems. Software Reliability Growth Models (SRGMs) are regression-based models that use historical failure data to predict reliability-related parameters. At the moment, no dedicated tool covers the whole process of preparing SRGM data from issue repositories and applying the models, which discourages replication and reuse in other projects. In this paper, we introduce STRAIT, a free and open-source tool for automated software reliability growth analysis that uses data from issue repositories. STRAIT downloads, filters, and processes data from the provided issue repositories for use in multiple SRGMs, and suggests the best-fitting SRGM, using multiple data snapshots to account for software evolution. The tool is designed to be highly extensible with additional issue repositories, SRGMs, and data filtering and processing options. Quality engineers can use STRAIT to evaluate their software systems. The research community can use STRAIT for empirical studies that involve evaluating new SRGMs or comparing multiple SRGMs.

Pages: 105-110

Citations: 2
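
To make "suggesting the best-fitting SRGM" concrete, the sketch below fits two classic models, Goel-Okumoto, m(t) = a(1 - e^(-bt)), and the delayed S-shaped model, m(t) = a(1 - (1 + bt)e^(-bt)), to cumulative failure counts and keeps the one with the lower sum of squared errors. The choice of these two models, plain least squares, and the grid of candidate `b` values are assumptions for illustration, not necessarily what STRAIT does. Because `a` enters both models linearly, its least-squares value has a closed form for each candidate `b`.

```python
import math

# Each model is m(t) = a * shape(t, b); a is linear, so it has a closed-form LS estimate.
MODELS = {
    "goel_okumoto": lambda t, b: 1.0 - math.exp(-b * t),
    "delayed_s_shaped": lambda t, b: 1.0 - (1.0 + b * t) * math.exp(-b * t),
}


def fit_model(shape, times, cum_failures, b_grid):
    """Grid search over b; for each b, a = sum(y*f) / sum(f*f). Returns (a, b, sse)."""
    best = None
    for b in b_grid:
        f = [shape(t, b) for t in times]
        denom = sum(x * x for x in f)
        if denom == 0.0:
            continue
        a = sum(y * x for y, x in zip(cum_failures, f)) / denom
        sse = sum((y - a * x) ** 2 for y, x in zip(cum_failures, f))
        if best is None or sse < best[2]:
            best = (a, b, sse)
    return best


def best_fitting_model(times, cum_failures, b_grid=None):
    """Fit every model and return (name, (a, b, sse)) of the lowest-error one."""
    if b_grid is None:
        b_grid = [i / 1000 for i in range(1, 2001)]  # candidate b values in (0, 2]
    fits = {name: fit_model(shape, times, cum_failures, b_grid)
            for name, shape in MODELS.items()}
    name = min(fits, key=lambda n: fits[n][2])
    return name, fits[name]
```

With issue-tracker data, `times` would be elapsed time units and `cum_failures` the cumulative count of failure-related issues; how issues are filtered into "failures" is itself an assumption each study must make.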

SCOR: Source Code Retrieval with Semantics and Order

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)

Pub Date : 2019-05-26 DOI: 10.1109/MSR.2019.00012

Shayan A. Akbar, A. Kak

Word embeddings produced by the word2vec algorithm provide a strong mechanism for discovering relationships between words based on the degree to which they are contextually related to one another. By themselves, however, algorithms like word2vec provide no mechanism for imposing ordering constraints on the embedded word representations. Our main goal in this paper is to exploit the semantic word vectors obtained from word2vec in a way that allows ordering constraints to be invoked on them when a sequence of words in a query is compared with a sequence of words in a file for source code retrieval. These ordering constraints employ the logic of Markov Random Fields (MRF), a framework previously used to enhance the precision of source code retrieval engines based on the Bag-of-Words (BoW) assumption. The work we present here demonstrates that combining word2vec with MRF can improve retrieval accuracy by between 6% and 30% over the best results obtainable with more traditional applications of MRF to representations based on term and term-term frequencies. The improvement was 30% for the Java AspectJ repository using only the titles of the bug reports provided by iBUGS, and 6% for the Eclipse repository using both the titles and the descriptions of the bug reports provided by BUGLinks.

Pages: 1-12

Citations: 23
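
The idea of imposing ordering constraints on embedding-based matches can be illustrated with a toy scorer that mixes an unordered term feature with an ordered-pair feature, in the spirit of an MRF sequential dependence model. This is not SCOR's formulation: the two features, the weights `lam_t` and `lam_o`, and the three-dimensional vectors are hypothetical stand-ins for trained word2vec embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def term_score(query, doc, emb):
    """Unordered evidence: each query word's best semantic match anywhere in the file."""
    return sum(max((cosine(emb[q], emb[d]) for d in doc), default=0.0) for q in query)


def ordered_score(query, doc, emb):
    """Ordered evidence: each adjacent query pair vs. adjacent file pairs, same order."""
    total = 0.0
    for q1, q2 in zip(query, query[1:]):
        total += max(
            (min(cosine(emb[q1], emb[d1]), cosine(emb[q2], emb[d2]))
             for d1, d2 in zip(doc, doc[1:])),
            default=0.0,
        )
    return total


def score(query, doc, emb, lam_t=0.8, lam_o=0.2):
    """Weighted mix of unordered and ordered features (hypothetical weights)."""
    return lam_t * term_score(query, doc, emb) + lam_o * ordered_score(query, doc, emb)
```

For the query `["open", "file"]`, the files `["open", "file", "now"]` and `["file", "open", "now"]` contain the same words, so only the ordered feature separates them and the first file ranks higher.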

Mining Rule Violations in JavaScript Code Snippets

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)

Pub Date : 2019-05-26 DOI: 10.1109/MSR.2019.00039

Uriel Campos, Guilherme Smethurst, João Pedro Moraes, R. Bonifácio, G. Pinto

Programming code snippets readily available on platforms such as StackOverflow are undoubtedly useful for software engineers. Unfortunately, these code snippets might contain issues such as deprecated, misused, or even buggy code. These issues could go unnoticed if developers lack the knowledge, time, or tool support to catch them. In this work, we expand the understanding of such issues (the so-called "violations") hidden in code snippets written in JavaScript, the programming language with the highest number of questions on StackOverflow. To characterize the violations, we extracted 336k code snippets from answers to JavaScript questions on StackOverflow and statically analyzed them using ESLinter, a JavaScript linter. We found that every JavaScript code snippet we analyzed contains at least one rule violation. On average, the studied code snippets have 11 violations, but we found snippets with more than 200. Rules related to stylistic issues are by far the most violated (82.9% of the violations fall into this category). Possible errors, which developers might be more interested in, represent only 0.1% of the violations. Finally, we found that a small fraction of the code snippets flagged with possible errors are reused in actual GitHub software projects; one single code snippet with possible errors was reused 1,261 times.

Pages: 195-199

Citations: 18
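
A per-snippet and per-category tally like the one described can be computed from ESLint's `json` formatter output (for example `eslint -f json -o report.json <dir>`), which lists one entry per linted file with its `messages`, each carrying a `ruleId`. This is a sketch under assumptions: each snippet is saved as its own file, and `RULE_CATEGORY` is a small hypothetical mapping, not the classification used in the study.

```python
import json
from collections import Counter

# Illustrative subset only; category names follow ESLint's older documentation groups.
RULE_CATEGORY = {
    "semi": "Stylistic Issues",
    "quotes": "Stylistic Issues",
    "indent": "Stylistic Issues",
    "no-dupe-keys": "Possible Errors",
    "no-extra-semi": "Possible Errors",
    "no-undef": "Variables",
    "no-unused-vars": "Variables",
}


def summarize_violations(eslint_json: str) -> dict:
    """Summarize an ESLint JSON report in which each linted file is one snippet."""
    results = json.loads(eslint_json)
    per_snippet = [len(r["messages"]) for r in results]
    # ruleId is null for parse errors; those and unmapped rules fall into "Other".
    by_category = Counter(
        RULE_CATEGORY.get(m.get("ruleId"), "Other")
        for r in results
        for m in r["messages"]
    )
    total = sum(by_category.values())
    return {
        "snippets": len(per_snippet),
        "clean_snippets": sum(1 for n in per_snippet if n == 0),
        "mean_violations": total / len(per_snippet) if per_snippet else 0.0,
        "max_violations": max(per_snippet, default=0),
        "share_pct": {c: 100.0 * n / total for c, n in by_category.items()},
    }
```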

Snakes in Paradise?: Insecure Python-Related Coding Practices in Stack Overflow

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)

Pub Date : 2019-05-26 DOI: 10.1109/MSR.2019.00040

A. Rahman, Effat Farhana, Nasif Imtiaz

Although Stack Overflow (SO) is the most popular question and answer website for software developers, answers posted on it may contain Python-related insecure coding practices. A systematic analysis of how frequently insecure coding practices appear in SO answers can help the SO community assess the prevalence of insecure Python code blocks on SO. An insecure coding practice is the recurrent use of insecure coding patterns in Python. We conduct an empirical study using 529,054 code blocks collected from 44,966 Python-related answers posted on SO. We observe that 7.1% of these 44,966 answers include at least one insecure coding practice. The most frequently occurring insecure coding practice is code injection. We observe that 9.8% of the 7,444 accepted answers include at least one insecure code block. We also find that user reputation is not related to the presence of insecure code blocks, suggesting that both high- and low-reputation users are likely to introduce insecure code blocks.

Citations: 21
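
Code injection, the most frequent practice reported above, can be approximated with Python's standard `ast` module. This is a deliberately small sketch, not the study's detection method: it flags only `eval`, `exec`, `os.system`, and `subprocess.*` calls with a literal `shell=True`, and it skips snippets that do not parse, which is common for SO fragments.

```python
import ast

DANGEROUS_CALLS = {"eval", "exec", "os.system"}


def _call_name(node: ast.Call) -> str:
    """Return a dotted name such as 'os.system' for simple call targets, else ''."""
    parts = []
    target = node.func
    while isinstance(target, ast.Attribute):
        parts.append(target.attr)
        target = target.value
    if isinstance(target, ast.Name):
        parts.append(target.id)
        return ".".join(reversed(parts))
    return ""


def find_injection_risks(source: str):
    """Return sorted (line, description) pairs for injection-prone calls."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return []  # unparsable fragment: nothing to report
    findings = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        name = _call_name(node)
        if name in DANGEROUS_CALLS:
            findings.append((node.lineno, name))
        elif name.startswith("subprocess.") and any(
            kw.arg == "shell"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is True
            for kw in node.keywords
        ):
            findings.append((node.lineno, name + "(shell=True)"))
    return sorted(findings)
```

A syntactic check like this over-reports (e.g. `eval` on a constant string) and under-reports (aliased imports such as `from os import system`), so it is a starting point for manual review rather than a verdict.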

GreenSource: A Large-Scale Collection of Android Code, Tests and Energy Metrics

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)

Pub Date : 2019-05-26 DOI: 10.1109/MSR.2019.00035

Rui Rua, Marco Couto, J. Saraiva

This paper presents the GreenSource infrastructure: a large body of open-source code, executable Android applications, and a curated dataset of energy code metrics. The dataset contains energy metrics obtained both by statically analyzing the applications' source code and by executing the applications with available test inputs. To automate the execution of the applications, we developed the AnaDroid tool, which instruments the application code, then compiles and executes it with test inputs on any Android device while collecting energy metrics. GreenSource includes all Android applications contained in the MUSE Java source code repository, and AnaDroid implements all of the energy-greedy Android features described in the literature. GreenSource aims to characterize energy consumption in the Android ecosystem, providing both Android developers and researchers with a setting in which to reason about energy-efficient Android software development.

Citations: 10

The Rise of Android Code Smells: Who is to Blame?

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)

Pub Date : 2019-05-26 DOI: 10.1109/MSR.2019.00071

Sarra Habchi, Naouel Moha, Romain Rouvoy

The rise of mobile apps as new software systems has led to new development requirements regarding performance. Development practices that do not respect these requirements can seriously hinder app performance and impair the user experience; such practices qualify as code smells. Mobile code smells are generally associated with inexperienced developers who lack knowledge of the framework guidelines. However, this assumption remains unverified, and there is no evidence about the role developers play in the accrual of mobile code smells. In this paper, we therefore study developers' contributions related to Android code smells. To support this study, we propose Sniffer, an open-source toolkit that mines Git repositories to extract developers' contributions as code smell histories. Using Sniffer, we analysed 255k commits from the change histories of 324 Android apps. We found that the ownership of code smells is spread across developers regardless of their seniority. There are no distinct groups of code smell introducers and removers: the developers who introduce code smells are mostly the same ones who remove them.

Citations: 20
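
Attributing smell introductions and removals to developers, as described above, can be sketched by diffing the set of smell instances before and after each commit. This is a simplification with stated assumptions: a linear commit history, smell instances identified by stable hypothetical IDs, and smell detection already done by some analyzer. Real Git histories with branches and merges need more care than this sketch takes.

```python
from collections import defaultdict


def smell_contributions(commits):
    """Count smell introductions and removals per author.

    commits: chronological list of (author, smells_after_commit), where
    smells_after_commit is the set of smell-instance IDs present after that commit.
    Assumes a linear history (no branches or merges).
    """
    introduced = defaultdict(int)
    removed = defaultdict(int)
    previous = set()
    for author, smells in commits:
        introduced[author] += len(smells - previous)  # new instances in this commit
        removed[author] += len(previous - smells)     # instances that disappeared
        previous = set(smells)
    return dict(introduced), dict(removed)


def introducers_who_also_remove(introduced, removed):
    """Authors with at least one introduction and at least one removal."""
    return {a for a, n in introduced.items() if n > 0 and removed.get(a, 0) > 0}
```

A large overlap between the two groups returned by `introducers_who_also_remove` is the kind of signal behind the finding that introducers and removers are mostly the same developers.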

Impact of Stack Overflow Code Snippets on Software Cohesion: A Preliminary Study

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)

Pub Date : 2019-05-26 DOI: 10.1109/MSR.2019.00050

Mashal Ahmad, M. Cinnéide

Developers frequently copy code snippets from publicly available resources such as Stack Overflow (SO). While this may provide a 'quick fix' for a development problem, little is known about how these copied code snippets affect the code quality of the recipient application, or how the quality of the recipient classes subsequently evolves over the course of the project. This affects whether such code copying should be encouraged, and how classes that receive such code snippets should be monitored during evolution. To investigate this issue, we used instances from the SOTorrent database where Java snippets had been copied from Stack Overflow into GitHub projects. In each case, we measured the quality of the recipient class just before the snippet was added, immediately after it was added, and at a later stage of the project. Our goal was to determine whether adding the snippet caused quality to improve or deteriorate, and what the long-term implications were for the quality of the recipient class. Code quality was measured using two cohesion metrics: Low-level Similarity-based Class Cohesion (LSCC) and Class Cohesion (CC). In a random sample of 378 classes that received code snippets copied from Stack Overflow to GitHub, we found that in almost 70% of the cases where the copied snippet affected cohesion, the effect was to reduce the cohesion of the recipient class. Furthermore, this deterioration in cohesion tends to persist in the subsequent evolution of the recipient class: in over 70% of cases, the recipient class never fully regained the cohesion it lost when receiving the snippet. These results suggest that when copying code snippets from external repositories, more attention should be paid to integrating the code with the recipient class.

Pages: 250-254

Citations: 6
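
LSCC, one of the two metrics used above, is computed from which attributes each method references. The sketch follows our reading of the definition by Al Dallal and Briand: with k methods, l attributes, and x_i the number of methods referencing attribute i, LSCC is 0 when l = 0 and k > 1, 1 when k = 1 or (l > 0 and k = 0), and otherwise the sum of x_i(x_i - 1) divided by l * k * (k - 1). The input format and the handling of the empty class (k = 0, l = 0) are assumptions.

```python
def lscc(method_attrs: dict, attributes: set) -> float:
    """Low-level Similarity-based Class Cohesion (sketch).

    method_attrs: method name -> set of attribute names the method references.
    attributes: all attribute names declared in the class.
    """
    n_methods = len(method_attrs)  # k
    n_attrs = len(attributes)      # l
    if n_attrs == 0 and n_methods > 1:
        return 0.0
    if n_methods <= 1:
        # k == 1, or k == 0 with l > 0, is defined as 1; k == 0 with l == 0 is
        # not covered by the definition and is treated as 1 here (assumption).
        return 1.0
    total = 0
    for attr in attributes:
        x = sum(1 for used in method_attrs.values() if attr in used)  # methods touching attr
        total += x * (x - 1)
    return total / (n_attrs * n_methods * (n_methods - 1))
```

For example, a class whose methods use `{a}`, `{a, b}`, and `{b}` has LSCC 1/3; adding a pasted method that touches only a new attribute `c` drops it to 1/9, the kind of cohesion loss the study measures.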

Extracting API Tips from Developer Question and Answer Websites

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)

Pub Date : 2019-05-26 DOI: 10.1109/MSR.2019.00058

Shaohua Wang, Nhathai Phan, Yan Wang, Yong Zhao

The success of question and answer (Q&A) websites attracts massive amounts of user-generated content about using and learning APIs, which easily leads to information overload: many API questions have a large number of answers containing both useful and irrelevant information, and developers cannot consume them all. In this work, we develop DeepTip, a novel deep learning-based approach that uses different Convolutional Neural Network architectures to extract short, practical, and useful tips from developer answers. Our extensive empirical experiments show that DeepTip can extract useful tips from a large corpus of answers with high precision (avg. 0.854) and coverage (0.94), and that it outperforms two state-of-the-art baselines by up to 56.7% and 162%, respectively, in terms of precision. Furthermore, a qualitative user study with real Stack Overflow users confirms that tip extraction is useful and that our approach generates high-quality tips.

Citations: 23
