2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER): Latest Papers

How do developers discuss rationale?

2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)

Pub Date: 2018-03-20, DOI: 10.1109/SANER.2018.8330223

Rana Alkadhi, Manuel Nonnenmacher, Emitzá Guzmán, B. Brügge

Developers make various decisions during software development. The rationale behind these decisions is of great importance during the evolution of long-lived software systems. However, current practices for documenting rationale often fall short, and rationale remains hidden in the heads of developers or embedded in development artifacts. Capturing rationale is even more challenging in OSS projects, in which developers are geographically distributed and rely mostly on written communication channels to support and coordinate their activities. In this paper, we present an empirical study to understand how OSS developers discuss rationale in IRC channels, and we explore the possibility of automatically extracting rationale elements by analyzing the IRC messages of development teams. To achieve this, we manually analyzed 7,500 messages from three large OSS projects and identified all fine-grained elements of rationale. We evaluated various machine learning algorithms for automatically detecting and classifying rationale in IRC messages. Our results show that 1) rationale is discussed on average in 25% of IRC messages, 2) code committers contributed on average 54% of the discussed rationale, and 3) machine learning algorithms can detect rationale with 0.76 precision and 0.79 recall, and can classify messages into finer-grained rationale elements with an average of 0.45 precision and 0.43 recall.
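The detection results are reported as precision and recall. As a rough illustration of what those two numbers measure, the sketch below computes them for a binary "message contains rationale" label. The five toy labels are hypothetical, and the F1 value printed at the end is only derived from the reported 0.76/0.79 pair; it is not a figure reported by the authors.

```python
def precision_recall(gold: list[bool], predicted: list[bool]) -> tuple[float, float]:
    """Precision and recall for a binary 'message contains rationale' label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted labels must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)      # correctly flagged
    fp = sum(1 for g, p in zip(gold, predicted) if p and not g)  # flagged, but no rationale
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)  # rationale that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical annotations for five IRC messages (True = contains rationale).
gold = [True, True, True, False, False]
predicted = [True, True, False, True, False]
p, r = precision_recall(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f1(p, r):.2f}")

# F1 implied by the reported detection scores (derived here, not reported).
print(f"implied F1 = {f1(0.76, 0.79):.2f}")
```

The reported pair corresponds to an F1 of roughly 0.77. For the fine-grained classification, the "average" precision and recall would typically be computed per rationale element and then averaged, although the abstract does not say which averaging scheme was used.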

Pages: 357-369

Citations: 44

Design patterns impact on software quality: Where are the theories?

2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)

Pub Date: 2018-03-20, DOI: 10.1109/SANER.2018.8330193

Foutse Khomh, Yann-Gaël Guéhéneuc

Software engineers are creatures of habit. During software development, they follow the same patterns again and again when architecting, designing, and implementing programs. Alexander introduced such patterns in architecture in 1974 and, 20 years later, they made their way into software development thanks to the work of Gamma et al. Software design patterns were promoted to make the design of programs more "flexible, modular, reusable, and understandable". However, ten years later, these patterns, their roles, and their impact on software quality were still not fully understood. We then set out to study the impact of design patterns on different quality attributes and published a paper entitled "Do Design Patterns Impact Software Quality Positively?" in the proceedings of the 12th European Conference on Software Maintenance and Reengineering (CSMR) in 2008. Ten years later, this paper received the Most Influential Paper award at the 25th International Conference on Software Analysis, Evolution, and Reengineering (SANER) in 2018. In this retrospective paper for the award, we report and reflect on our own and others' studies on the impact of design patterns, discussing some of the key findings reported about them. We also take a step back from these studies and re-examine the role that design patterns should play in software development. Finally, we outline some avenues for future research on design patterns, e.g., identifying the patterns actually used by developers, building theories that explain the impact of patterns, or using patterns to raise the abstraction level of programming languages.

Pages: 15-25

Citations: 9

Efficient features for function matching between binary executables

2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)

Pub Date: 2018-03-20, DOI: 10.1109/SANER.2018.8330221

Chariton Karamitas, A. Kehagias

Binary diffing is the process of reverse engineering two programs, when their source code is not available, to study their syntactic and semantic differences. For large programs, binary diffing can be performed by function matching, which in turn reduces to a graph isomorphism problem between the compared programs' CFGs (Control Flow Graphs) and/or CGs (Call Graphs). In this paper, we provide a set of carefully chosen features, extracted from a binary's CG and CFG, that BinDiff algorithm variants can use to, first, build a set of initial exact matches with minimal false positives (by scanning for unique perfect matches) and, second, propagate approximate matching information using, for example, a nearest-neighbor scheme. Furthermore, we investigate the benefits of applying Markov lumping techniques to function CFGs (to our knowledge, this technique has not been studied before). We evaluate the proposed function features in a series of experiments on various versions of the Linux kernel (Intel64), the OpenSSH server (Intel64), and Firefox's xul.dll (IA-32). We also compare our prototype system to Diaphora, the current state-of-the-art binary diffing software.
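The two-phase matching idea (unique exact matches first, then approximate propagation) can be sketched as follows. This is a minimal illustration under stated assumptions: the four-number feature vector per function is hypothetical and is not the paper's feature set, and the greedy Euclidean nearest-neighbor step is a simple stand-in for the propagation phase; a real diffing tool would likely also use call-graph structure to narrow the candidates.

```python
import math
from collections import Counter

# Hypothetical per-function feature vector:
# (basic blocks, CFG edges, outgoing calls, incoming calls).
Features = tuple[int, int, int, int]


def unique_exact_matches(a: dict[str, Features], b: dict[str, Features]) -> dict[str, str]:
    """Pair functions whose feature vector is identical AND unique in both binaries."""
    count_a = Counter(a.values())
    count_b = Counter(b.values())
    unique_b = {feat: name for name, feat in b.items() if count_b[feat] == 1}
    return {
        name: unique_b[feat]
        for name, feat in a.items()
        if count_a[feat] == 1 and feat in unique_b
    }


def propagate_nearest(a: dict[str, Features], b: dict[str, Features],
                      matches: dict[str, str]) -> dict[str, str]:
    """Greedily match each leftover function in `a` to the closest unused one in `b`."""
    result = dict(matches)
    used = set(matches.values())
    for name_a in sorted(a):
        if name_a in result:
            continue
        candidates = [
            (math.dist(a[name_a], feat_b), name_b)
            for name_b, feat_b in b.items()
            if name_b not in used
        ]
        if not candidates:
            break  # every function in `b` is already matched
        _, best = min(candidates)
        result[name_a] = best
        used.add(best)
    return result


old = {"f1": (3, 4, 1, 0), "f2": (10, 14, 2, 1), "f3": (5, 6, 0, 2)}
new = {"g1": (3, 4, 1, 0), "g2": (10, 14, 2, 1), "g3": (5, 7, 0, 2)}
exact = unique_exact_matches(old, new)
print(exact)
print(propagate_nearest(old, new, exact))
```

Here `f1` and `f2` match exactly in the first phase, and `f3` (one CFG edge different) is picked up by the nearest-neighbor phase. Requiring a feature vector to be unique in both binaries is what keeps false positives low in the first phase: two functions with identical features are left for the approximate phase instead of being paired arbitrarily.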

Pages: 335-345

Citations: 9

Detecting faulty empty cells in spreadsheets

2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)

Pub Date: 2018-03-20, DOI: 10.1109/SANER.2018.8330229

Liang Xu, Shuo Wang, Wensheng Dou, Bo Yang, Chushu Gao, Jun Wei, Tao Huang

Spreadsheets play an important role in various business tasks, such as financial reporting and data analysis. In spreadsheets, empty cells are widely used for different purposes, e.g., to separate different tables or to represent the default value "0". However, a user may unintentionally delete a formula and leave a cell empty. Such ad-hoc modifications may introduce a faulty empty cell that should contain a formula. We observe that the context of an empty cell can help determine whether the empty cell is faulty. For example, is the empty cell next to a cell array in which all cells share the same semantics? Does the empty cell have headers similar to those of other non-empty cells? In this paper, we propose EmptyCheck to detect faulty empty cells in spreadsheets. By analyzing the context of an empty cell, EmptyCheck validates whether the cell belongs to a cell array. If so, the empty cell is faulty, since it does not contain a formula. We evaluate EmptyCheck on 100 randomly sampled EUSES spreadsheets. The experimental results show that EmptyCheck can detect faulty empty cells with high precision (75.00%) and recall (87.04%). Existing techniques can detect only 4.26% of the truly faulty empty cells that EmptyCheck detects.
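To make the "cell array" intuition concrete, here is a minimal sketch of one context check: an empty cell sandwiched between two cells whose formulas are identical in relative (R1C1) notation. It is a simplified heuristic under stated assumptions, not EmptyCheck's actual algorithm (which also considers headers and other context); the column-of-strings representation and the example formulas are hypothetical.

```python
from typing import Optional

# A column of cells: None = empty, "=..." = formula in R1C1 (relative) notation,
# anything else = a constant or a header.
Cell = Optional[str]


def is_formula(cell: Cell) -> bool:
    return cell is not None and cell.startswith("=")


def suspicious_empty_cells(column: list[Cell]) -> list[int]:
    """Indices of empty cells sandwiched between two identical R1C1 formulas.

    In R1C1 notation, cells that compute "the same thing relative to their row"
    have identical text, so an empty cell between two such cells likely held a
    formula that was deleted by accident.
    """
    flagged = []
    for i in range(1, len(column) - 1):
        above, cell, below = column[i - 1], column[i], column[i + 1]
        if cell is None and is_formula(above) and above == below:
            flagged.append(i)
    return flagged


column = [
    "Total",              # header
    "=RC[-2]*RC[-1]",     # product of the two cells to the left in the same row
    "=RC[-2]*RC[-1]",
    None,                 # formula probably deleted by accident
    "=RC[-2]*RC[-1]",
    None,                 # blank separator rows: not between identical formulas
    None,
    "=SUM(R[-6]C:R[-1]C)",
]
print(suspicious_empty_cells(column))
```

Using R1C1 text rather than A1 text is the key design choice: in A1 notation the formulas in consecutive rows (`=B2*C2`, `=B3*C3`) differ textually, whereas in R1C1 they are identical, which makes "shares the same semantics" a simple string comparison.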

Pages: 423-433

Citations: 9

DeepWeak: Reasoning common software weaknesses via knowledge graph embedding

2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)

Pub Date: 2018-03-20, DOI: 10.1109/SANER.2018.8330232

Zhuobing Han, Xiaohong Li, Hongtao Liu, Zhenchang Xing, Zhiyong Feng

Common software weaknesses, such as improper input validation and integer overflow, can harm system security directly or indirectly, causing adverse effects such as denial of service and execution of unauthorized code. The Common Weakness Enumeration (CWE) maintains a standard list and classification of common software weaknesses. Although CWE contains rich information about software weaknesses, including textual descriptions, common sequences, and relations between software weaknesses, its current data representation, i.e., hyperlinked documents, does not support advanced reasoning tasks on software weaknesses, such as predicting missing relations and the common consequences of CWEs. Such reasoning tasks are critical for managing and analyzing large numbers of common software weaknesses and their relations. In this paper, we propose to represent common software weaknesses and their relations as a knowledge graph, and we develop a translation-based, description-embodied knowledge representation learning method that embeds both the software weaknesses and their relations in the knowledge graph into a semantic vector space. The vector representations (i.e., embeddings) of software weaknesses and their relations can be exploited for knowledge acquisition and inference. We conduct extensive experiments to evaluate the performance of the software weakness and relation embeddings in three reasoning tasks: CWE link prediction, CWE triple classification, and common consequence prediction. Our knowledge graph embedding approach outperforms other description- and/or structure-based representation learning methods.
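"Translation-based" embedding refers to the TransE family of models, in which a relation is a vector that translates a head entity onto its tail, so a true triple (h, r, t) should satisfy h + r ≈ t. The sketch below shows plain TransE scoring, its margin loss, and link prediction by ranking. It is an illustration under stated assumptions: it omits the description-embodied part of the paper's method and all training, and the entity names, relation, and 2-D vectors are hypothetical.

```python
import math

Vector = list[float]


def transe_distance(h: Vector, r: Vector, t: Vector) -> float:
    """TransE plausibility: L2 norm of h + r - t (smaller = more plausible triple)."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def margin_loss(pos_distance: float, neg_distance: float, margin: float = 1.0) -> float:
    """Hinge loss that pushes a true triple closer than a corrupted one by `margin`."""
    return max(0.0, margin + pos_distance - neg_distance)


def rank_tails(head: str, relation: Vector, entities: dict[str, Vector]) -> list[str]:
    """Link prediction: rank every candidate tail for (head, relation, ?)."""
    h = entities[head]
    return sorted(entities, key=lambda name: transe_distance(h, relation, entities[name]))


# Hypothetical 2-D embeddings; real models learn these, in far higher dimension.
entities = {
    "weakness_a": [0.0, 0.0],
    "weakness_b": [1.0, 0.0],
    "weakness_c": [0.0, 1.0],
}
parent_of = [1.0, 0.0]  # hypothetical relation vector
print(rank_tails("weakness_a", parent_of, entities))
```

With these toy vectors, `weakness_b` ranks first because `weakness_a + parent_of` lands exactly on it. Triple classification in this family of models is usually done by thresholding the same distance, accepting a triple when it falls below a threshold tuned on validation data.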

Pages: 456-466

Citations: 30

Re-evaluating method-level bug prediction

2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)

Pub Date: 2018-03-20, DOI: 10.1109/SANER.2018.8330264

L. Pascarella, Fabio Palomba, Alberto Bacchelli

Bug prediction aims to support developers in identifying code artifacts that are more likely to be defective. Researchers have proposed prediction models to identify bug-prone methods and provided promising evidence that it is possible to operate at this level of granularity. In particular, models that use a mixture of product and process metrics as independent variables led to the best results. In this study, we first replicate previous research on method-level bug prediction on different systems and timespans. Afterwards, we reflect on the evaluation strategy and propose a more realistic one. Key results of our study show that, when evaluated with the same strategy, the performance of the method-level bug prediction model is similar to what was previously reported, also for different systems and timespans. However, when evaluated with a more realistic strategy, all the models show a dramatic drop in performance, exhibiting results close to those of a random classifier. Our replication and negative results indicate that method-level bug prediction is still an open challenge.
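The abstract does not spell out what the "more realistic" evaluation strategy is. One common contrast in defect prediction, shown here purely as an assumption and not as the authors' procedure, is a random shuffle split versus a release-ordered split: a random split can train on data from releases that come after the methods being tested, which a developer could never do in practice. The `MethodSample` record and its fields are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodSample:
    method_id: str  # hypothetical identifier, e.g. "Parser.parse(String)"
    release: int    # index of the release the metrics were collected from
    buggy: bool     # hypothetical label: was the method later fixed?


def random_split(samples, test_fraction=0.3, seed=0):
    """Shuffle-based split: may train on releases that come after the test data."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def release_split(samples, test_release):
    """Time-aware split: train only on earlier releases, test on `test_release`."""
    train = [s for s in samples if s.release < test_release]
    test = [s for s in samples if s.release == test_release]
    return train, test


# Hypothetical data: 12 methods spread evenly over releases 1-3.
samples = [MethodSample(f"m{i}", release=i % 3 + 1, buggy=(i % 4 == 0)) for i in range(12)]
train, test = release_split(samples, test_release=3)
print(len(train), len(test))
```

Both functions return a (train, test) pair, so the same model-fitting code can be run under either strategy and the resulting scores compared side by side.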

Citations: 28

Keep it simple: Is deep learning good for linguistic smell detection?

2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)

Pub Date: 2018-03-20, DOI: 10.1109/SANER.2018.8330265

Sarah Fakhoury, V. Arnaoudova, Cedric Noiseux, Foutse Khomh, G. Antoniol

Deep neural networks are a popular technique that has been successfully applied to domains such as image processing, sentiment analysis, speech recognition, and computational linguistics. Deep neural networks are machine learning algorithms that, in general, require a labeled set of positive and negative examples, which are used to tune hyper-parameters and adjust model coefficients to learn a prediction function. Recently, deep neural networks have also been successfully applied to certain software engineering problem domains (e.g., bug prediction); however, in other domains (e.g., recovering links between entries in a discussion forum), they have been shown to be outperformed by traditional machine learning approaches. In this paper, we report our experience in building an automatic Linguistic Antipattern Detector (LAPD) using deep neural networks. We manually build and validate an oracle of around 1,700 instances and create binary classification models using traditional machine learning approaches and Convolutional Neural Networks. Our experience is that, considering the size of the oracle, the available hardware and software, and the theory available to interpret results, deep neural networks are outperformed by traditional machine learning algorithms in terms of all the evaluation metrics we used and in terms of resources (time and memory). Therefore, although deep learning is reported to produce results comparable and even superior to those of human experts for certain complex tasks, it does not seem to be a good fit for simple classification tasks like smell detection. Researchers and practitioners should be careful when selecting machine learning models for the problem at hand.
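A linguistic antipattern is an inconsistency between what an identifier's name promises and what the code actually does. The paper learns to detect these from a labeled oracle; to make the concept concrete, here is a hand-written rule-based check for a few name/return-type mismatches. These rules and the `MethodSignature` record are illustrative assumptions, not the LAPD models or their features.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodSignature:
    name: str
    return_type: str  # e.g. "void", "boolean", "int"


def naming_inconsistencies(sig: MethodSignature) -> list[str]:
    """Flag simple mismatches between a method's name and its return type."""
    issues = []
    # A capital letter or underscore after the prefix, so "getaway" is not a getter.
    if re.match(r"get[A-Z_]", sig.name) and sig.return_type == "void":
        issues.append("'get' method does not return anything")
    if re.match(r"is[A-Z_]", sig.name) and sig.return_type not in ("boolean", "Boolean"):
        issues.append("'is' method returns more than a Boolean")
    if re.match(r"set[A-Z_]", sig.name) and sig.return_type != "void":
        issues.append("'set' method returns a value")
    return issues


for sig in [
    MethodSignature("getName", "void"),
    MethodSignature("isEmpty", "boolean"),
    MethodSignature("isValid", "int"),
    MethodSignature("getaway", "void"),
]:
    print(sig.name, naming_inconsistencies(sig))
```

Rules like these only cover the most mechanical cases; the learned classifiers compared in the paper are meant to generalize beyond hand-picked prefixes, which is where the choice between traditional models and CNNs matters.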

Pages: 602-611

Citations: 34

A deep neural network language model with contexts for source code

2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)

Pub Date: 2018-03-20, DOI: 10.1109/SANER.2018.8330220

A. Nguyen, Trong Duc Nguyen, H. Phan, T. Nguyen

Statistical language models (LMs) have been applied in several software engineering applications. However, they have difficulty dealing with ambiguities in the names of program and API elements (classes and method calls). In this paper, inspired by the success of Deep Neural Networks (DNNs) in natural language processing, we present Dnn4C, a DNN language model that complements the local context of lexical code elements with both syntactic and type contexts. We designed a context-incorporating method that uses syntactic and type annotations of source code in order to learn to distinguish lexical tokens in different syntactic and type contexts. Our empirical evaluation on code completion for real-world projects shows that Dnn4C achieves relative improvements in top-1 accuracy of 11.6%, 16.3%, 27.1%, and 44.7% over the state-of-the-art language models for source code used with the same features: RNN LM, DNN LM, SLAMC, and n-gram LM, respectively. In another application, we show that Dnn4C helps improve accuracy over n-gram LM when migrating source code from Java to C# with a machine translation model.
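The n-gram LM is one of the baselines Dnn4C is compared against. The sketch below is a minimal bigram (n = 2) token model used for code completion, together with the top-1 accuracy metric and the relative-improvement arithmetic the results are expressed in. It does not implement Dnn4C, and the token streams are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(token_streams: list[list[str]]) -> dict[str, Counter]:
    """Count how often each token follows each preceding token."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for tokens in token_streams:
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def suggest(model: dict[str, Counter], prev: str):
    """Top-1 completion: the most frequent follower of `prev`, or None if unseen."""
    followers = model.get(prev)
    return followers.most_common(1)[0][0] if followers else None


def top1_accuracy(model: dict[str, Counter], tokens: list[str]) -> float:
    """Fraction of positions where the top-1 suggestion equals the actual next token."""
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return 0.0
    return sum(suggest(model, prev) == nxt for prev, nxt in pairs) / len(pairs)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative (not absolute) gain: 0.447 means new = 1.447 * baseline."""
    return (new - baseline) / baseline


# Hypothetical lexical token streams from a Java-like project.
training = [
    ["for", "(", "int", "i", "=", "0", ";"],
    ["if", "(", "x", ")"],
    ["for", "(", "int", "j", "=", "1", ";"],
]
model = train_bigram(training)
print(suggest(model, "("))  # 'int' (seen twice after "(", vs. 'x' once)
print(top1_accuracy(model, ["for", "(", "int", "k"]))
```

The purely lexical model cannot tell that `int` after `(` in a `for` header and `x` after `(` in an `if` condition occur in different syntactic contexts, which is the kind of ambiguity the abstract says Dnn4C addresses with syntactic and type contexts. Note also that a 44.7% relative improvement over n-gram LM means 1.447 times the baseline's top-1 accuracy, not 44.7 percentage points.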

Pages: 323-334

Citations: 39

A comparison of software engineering domain specific sentiment analysis tools

2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)

Pub Date : 2018-03-20 DOI: 10.1109/SANER.2018.8330245

M. R. Islam, M. Zibran

Sentiment analysis (SA) of software engineering (SE) text has drawn immense interest recently. The poor performance of general-purpose SA tools on SE text has led to the recent emergence of domain-specific SA tools designed especially for SE text. However, these domain-specific tools were each tested on a single dataset, and their performance was compared mainly against general-purpose tools. Thus, two things remain unclear: (i) how well these tools really work on other datasets, and (ii) which tool to choose in which context. To address these concerns, we run three recent domain-specific SA tools on three separate datasets. Using standard accuracy metrics, we compute and compare their accuracy in detecting sentiment in SE text.
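The abstract does not name its "standard accuracy metrics". A common choice for single-label sentiment classification is per-class precision, recall, and F1, together with overall accuracy. The Python sketch below assumes that choice, a positive/negative/neutral label set, and invented gold and predicted labels; the function names are hypothetical and do not come from any of the compared tools.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")  # assumed label set


def per_class_scores(gold, predicted, labels=LABELS):
    """Return {label: (precision, recall, f1)} for single-label classification."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the tool assigned label p incorrectly
            fn[g] += 1  # the tool missed the true label g
    scores = {}
    for label in labels:
        predicted_n = tp[label] + fp[label]
        actual_n = tp[label] + fn[label]
        precision = tp[label] / predicted_n if predicted_n else 0.0
        recall = tp[label] / actual_n if actual_n else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def accuracy(gold, predicted):
    """Fraction of items where the tool's label matches the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def macro_f1(scores):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


if __name__ == "__main__":
    # Hypothetical gold labels and one tool's output for six SE messages.
    gold = ["positive", "negative", "neutral", "neutral", "negative", "positive"]
    tool = ["positive", "negative", "neutral", "negative", "negative", "neutral"]
    scores = per_class_scores(gold, tool)
    for label, (p, r, f) in scores.items():
        print(f"{label:>8}: P={p:.2f} R={r:.2f} F1={f:.2f}")
    print(f"accuracy={accuracy(gold, tool):.2f} macro-F1={macro_f1(scores):.2f}")
```

For these toy labels, accuracy is 4/6 (about 0.67), and the negative class gets precision 0.67, recall 1.00, and F1 0.80. Comparing several tools across several datasets then amounts to computing the same scores for each tool's predictions on each dataset.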

Citations: 21

How do scientists develop scientific software? An external replication

2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER)

Pub Date : 2018-03-20 DOI: 10.1109/SANER.2018.8330263

G. Pinto, I. Wiese, Luiz Felipe Dias

Although the goal of scientists is to do science, not to develop software, many scientists have extended their roles to include software development among their skills. However, since scientists have different backgrounds, it remains unclear how they perceive software engineering practices and how they acquire software engineering knowledge. In this paper, we conducted an external replication of an influential study from 10 years ago about how scientists develop and use scientific software. In particular, we employed the same method (an online questionnaire) with a different population (R developers). We analyzed the more than 1,574 responses received, enriched them with data gathered from the respondents' GitHub repositories, and related our findings to those of the original study. We found that the results were consistent in many ways, including: (1) scientists who develop software work mostly alone, (2) they decide for themselves what to work on next, and (3) most of what they learned came from self-study rather than formal education. However, we also uncovered new facts; for example, some of the "pain points" regarding software development are not related to technical activities (e.g., interruptions, lack of collaborators, and the lack of a reward system play a role). Our replication can help researchers, practitioners, and educators better focus their efforts on topics that are important to the scientific community that develops software.

Pages: 582-591

Citations: 20