{"title":"Proceedings of the 4th ACM SIGSOFT International Workshop on NLP for Software Engineering","authors":"","doi":"10.1145/3283812","DOIUrl":"https://doi.org/10.1145/3283812","url":null,"abstract":"","PeriodicalId":231305,"journal":{"name":"Proceedings of the 4th ACM SIGSOFT International Workshop on NLP for Software Engineering","volume":"2019 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131506791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TestNMT: function-to-test neural machine translation
Robert White, J. Krinke
DOI: 10.1145/3283812.3283823

Test generation can have a large impact on the software engineering process by decreasing the time and effort required to maintain a high level of test coverage, which in turn increases the quality of the resulting software. In this paper, we present TestNMT, an experimental approach to test generation using neural machine translation. TestNMT aims to learn to translate from functions to tests, allowing a developer to generate an approximate test for a given function, which can then be adapted to produce the final desired test. We also present a preliminary quantitative and qualitative evaluation of TestNMT in both cross-project and within-project scenarios. This evaluation shows that TestNMT is potentially useful in the within-project scenario, where it achieves a maximum BLEU score of 21.2 and a maximum ROUGE-L score of 38.67, and is shown to be capable of generating approximate tests that are easy to adapt into working tests.

Mining monitoring concerns implementation in Java-based software systems
G. Cojocar, A. Guran
DOI: 10.1145/3283812.3283821

In this paper we describe a new approach for automatic identification of monitoring concerns implementation in Java-based software systems. We also present the results obtained by using our approach on 21 Java-based systems, ranging from small to very large systems.

Towards understanding code readability and its impact on design quality
Umme Ayda Mannan, Iftekhar Ahmed, A. Sarma
DOI: 10.1145/3283812.3283820

Readability of code is commonly believed to impact the overall quality of software. Poor readability not only hinders developers from understanding what the code is doing but can also cause them to make sub-optimal changes and introduce bugs. Developers recognize this risk and rank readability among their top information needs. Researchers have modeled readability scores; however, thus far, no one has investigated how readability evolves over time and how that impacts the design quality of software. We perform a large-scale study of 49 open source Java projects, spanning 8296 commits and 1766 files. We find that readability is high in open source projects and, unlike design quality, does not fluctuate over a project's lifetime. Readability also has a non-significant correlation of 0.151 (Kendall's τ) with code smell count (an indicator of design quality). Since the current readability measure is unable to capture the increased difficulty of reading code caused by degraded design quality, our results point to the need for better measurement and modeling of code readability.

Two perspectives on software documentation quality in stack overflow
Mathias Ellmann, M. Schnecke
DOI: 10.1145/3283812.3283816

This paper studies software documentation quality on Stack Overflow from two perspectives: that of the questioners, who accept answers, and that of the community, which votes on answers. We show what developers can do to increase the chance that their questions or answers get accepted by the community or by the questioners. We found differing expectations about what information, such as code or images, should be included in a question or an answer. We evaluated six different quality indicators (such as Flesch Reading Ease or the presence of images) that a developer should consider before posting a question or an answer. In addition, we found different quality indicators for different types of questions, in particular error, discrepancy, and how-to questions. Finally, we use a supervised machine-learning algorithm to predict when an answer will be accepted or upvoted.

3CAP: categorizing the cognitive capabilities of Alzheimer's patients in a smart home environment
Kate M. Bowers, Reihaneh H. Hariri, Katey A. Price
DOI: 10.1145/3283812.3283824

Alzheimer's disease is a progressive illness that affects more than 5.5 million people in the United States with no effective cure or treatment. Symptoms of the disease include declines in memory and speech abilities and increases in aggression and insomnia. Recent research suggests that NLP techniques can detect early cognitive decline as well as monitor the rate of decline over time. The processed data can be used in a smart home environment to enhance the level of home care for Alzheimer's patients. This paper proposes early-stage research in software engineering and natural language processing for quantifying and evaluating the patient's cognitive state to determine the required level of support in a smart home.

A fine-grained approach for automated conversion of JUnit assertions to English
Danielle Gonzalez, Suzanne Prentice, Mehdi Mirakhorli
DOI: 10.1145/3283812.3283819

Converting source or unit test code to English has been shown to improve the maintainability, understandability, and analysis of software and tests. Code summarizers identify 'important' statements in the source/tests and convert them to easily understood English sentences using static analysis and NLP techniques. However, current test summarization approaches handle only a subset of the variation and customization allowed in the JUnit assert API (a critical component of test cases), which may affect the accuracy of conversions. In this paper, we present our work towards improving JUnit test summarization with a detailed process for converting a total of 45 unique JUnit assertions to English, including 37 previously unhandled variations of the assertThat method. This process has also been implemented and released as the AssertConvert tool. Initial evaluations have shown that the tool generates English conversions that accurately represent a wide variety of assertion statements and could be used for code summarization or other NLP analyses.

Natural language processing (NLP) applied on issue trackers
Mathias Ellmann
DOI: 10.1145/3283812.3283825

In the domain of software engineering, NLP techniques are needed to find and reuse duplicate or similar development knowledge that is stored in development documentation, such as development tasks. To understand duplicate and similar development documentation, we discuss different NLP techniques: descriptive statistics and topic analysis; similarity algorithms such as N-grams, Jaccard, and LSI; and machine learning algorithms such as decision trees and support vector machines (SVMs). These techniques are used to reach a better understanding of the characteristics, the lexical relations (syntactic and semantic), and the classification and prediction of duplicate development tasks. We found that duplicate tasks share conceptual information and tend to be created by inexperienced developers. By tuning different features with a gradient or a fidelity loss function, a system can identify duplicate tasks with 100% accuracy.

LinkSO: a dataset for learning to retrieve similar question answer pairs on software development forums
Xueqing Liu, Chi Wang, Yue Leng, ChengXiang Zhai
DOI: 10.1145/3283812.3283815

We present LinkSO, a dataset for learning to rank similar questions on Stack Overflow. Stack Overflow contains a massive amount of high-quality, crowd-sourced question links, which provides a great opportunity for evaluating retrieval algorithms for community-based question answering (cQA) archives and for learning to rank over such archives. However, because of missing links, one question is whether question links can be readily used as the relevance judgment for evaluation. We study this question by measuring the closeness between question links and the relevance judgment, and find that their agreement rates range from 80% to 88%. We also conduct an empirical study of the performance of existing work on LinkSO. While existing work focuses on non-learning approaches, our results reveal that learning-based approaches have great potential to further improve retrieval performance.

Learning from code with graphs (keynote)
Marc Brockschmidt
DOI: 10.1145/3283812.3283813

Learning from large corpora of source code ("Big Code") has seen increasing interest over the past few years. A first wave of work has focused on leveraging off-the-shelf methods from other machine learning fields such as natural language processing. While these techniques have succeeded in showing the feasibility of learning from code, and have led to some initial practical solutions, they forgo explicit use of known program semantics. In a range of recent work, we have tried to address this issue by integrating deep learning techniques with program analysis methods on graphs. Graphs are a convenient, general formalism for modeling entities and their relationships, and are seeing increasing interest from machine learning researchers as well. In this talk, I present two applications of graph-based learning to understanding and generating programs and discuss a range of future work building on this success.