Augmenting API Documentation with Insights from Stack Overflow

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE). Pub Date: 2016-05-14. DOI: 10.1145/2884781.2884800

Christoph Treude, M. Robillard

Pages: 392-403

Citations: 235

Abstract

Software developers need access to different kinds of information, which is often dispersed among different documentation sources, such as API documentation or Stack Overflow. We present an approach to automatically augment API documentation with "insight sentences" from Stack Overflow -- sentences that are related to a particular API type and that provide insight not contained in the API documentation of that type. Based on a development set of 1,574 sentences, we compare the performance of two state-of-the-art summarization techniques as well as a pattern-based approach for insight sentence extraction. We then present SISE, a novel machine-learning-based approach that uses as features the sentences themselves, their formatting, their question, their answer, and their authors, as well as part-of-speech tags and the similarity of a sentence to the corresponding API documentation. With SISE, we were able to achieve a precision of 0.64 and a coverage of 0.7 on the development set. In a comparative study with eight software developers, we found that SISE resulted in the highest number of sentences that were considered to add useful information not found in the API documentation. These results indicate that taking into account the metadata available on Stack Overflow as well as part-of-speech tags can significantly improve unsupervised extraction approaches when applied to Stack Overflow data.
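
One of the features named in the abstract is the similarity of a sentence to the corresponding API documentation. The abstract does not say how that similarity is measured, so the sketch below is only an illustration under an assumption: it uses cosine similarity over raw term counts, and the helper names `tokenize` and `cosine_similarity` and the sample texts are hypothetical, not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(sentence, documentation):
    """Cosine similarity between the term-count vectors of two texts.

    Assumption: plain term counts, no stemming or TF-IDF weighting.
    The paper may use a different measure.
    """
    a = Counter(tokenize(sentence))
    b = Counter(tokenize(documentation))
    # Counter returns 0 for terms missing from b, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares no terms with anything
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    doc = "Returns an iterator over the elements in this list."
    candidate = "The iterator of a list fails if the list is modified."
    print(round(cosine_similarity(candidate, doc), 3))
```

A low score suggests that a sentence says something the documentation does not already cover, which fits the abstract's definition of an insight sentence. In SISE this similarity is only one feature among several, alongside formatting, question, answer, author, and part-of-speech information.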


