Augmenting API Documentation with Insights from Stack Overflow

Christoph Treude, M. Robillard
{"title":"Augmenting API Documentation with Insights from Stack Overflow","authors":"Christoph Treude, M. Robillard","doi":"10.1145/2884781.2884800","DOIUrl":null,"url":null,"abstract":"Software developers need access to different kinds of information which is often dispersed among different documentation sources, such as API documentation or Stack Overflow. We present an approach to automatically augment API documentation with \"insight sentences\" from Stack Overflow -- sentences that are related to a particular API type and that provide insight not contained in the API documentation of that type. Based on a development set of 1,574 sentences, we compare the performance of two state-of-the-art summarization techniques as well as a pattern-based approach for insight sentence extraction. We then present SISE, a novel machine learning based approach that uses as features the sentences themselves, their formatting, their question, their answer, and their authors as well as part-of-speech tags and the similarity of a sentence to the corresponding API documentation. With SISE, we were able to achieve a precision of 0.64 and a coverage of 0.7 on the development set. In a comparative study with eight software developers, we found that SISE resulted in the highest number of sentences that were considered to add useful information not found in the API documentation. These results indicate that taking into account the meta data available on Stack Overflow as well as part-of-speech tags can significantly improve unsupervised extraction approaches when applied to Stack Overflow data.","PeriodicalId":6485,"journal":{"name":"2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)","volume":"17 1","pages":"392-403"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"235","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2884781.2884800","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 235

Abstract

Software developers need access to different kinds of information which is often dispersed among different documentation sources, such as API documentation or Stack Overflow. We present an approach to automatically augment API documentation with "insight sentences" from Stack Overflow -- sentences that are related to a particular API type and that provide insight not contained in the API documentation of that type. Based on a development set of 1,574 sentences, we compare the performance of two state-of-the-art summarization techniques as well as a pattern-based approach for insight sentence extraction. We then present SISE, a novel machine learning based approach that uses as features the sentences themselves, their formatting, their question, their answer, and their authors as well as part-of-speech tags and the similarity of a sentence to the corresponding API documentation. With SISE, we were able to achieve a precision of 0.64 and a coverage of 0.7 on the development set. In a comparative study with eight software developers, we found that SISE resulted in the highest number of sentences that were considered to add useful information not found in the API documentation. These results indicate that taking into account the meta data available on Stack Overflow as well as part-of-speech tags can significantly improve unsupervised extraction approaches when applied to Stack Overflow data.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
用堆栈溢出的见解来增强API文档
软件开发人员需要访问不同类型的信息,这些信息通常分散在不同的文档源中,例如API文档或Stack Overflow。我们提出了一种方法,用Stack Overflow中的“洞察力句子”自动增加API文档,这些句子与特定API类型相关,并提供该类型API文档中未包含的洞察力。基于1,574个句子的开发集,我们比较了两种最先进的摘要技术以及基于模式的洞察句子提取方法的性能。然后,我们提出了SISE,这是一种基于机器学习的新方法,它使用句子本身、格式、问题、答案、作者以及词性标签和句子与相应API文档的相似性作为特征。使用SISE,我们能够在开发集上实现0.64的精度和0.7的覆盖率。在与8位软件开发人员的比较研究中,我们发现SISE产生了最多的句子,这些句子被认为添加了API文档中没有的有用信息。这些结果表明,考虑Stack Overflow上可用的元数据以及词性标签可以显着改善应用于Stack Overflow数据的无监督提取方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Scalable Thread Sharing Analysis Overcoming Open Source Project Entry Barriers with a Portal for Newcomers Nomen est Omen: Exploring and Exploiting Similarities between Argument and Parameter Names Reliability of Run-Time Quality-of-Service Evaluation Using Parametric Model Checking On the Techniques We Create, the Tools We Build, and Their Misalignments: A Study of KLEE
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1