Representation Learning for Stack Overflow Posts: How Far are We?

ACM Transactions on Software Engineering and Methodology | IF 6.6 | CAS Tier 2 (Computer Science) | JCR Q1 (Computer Science, Software Engineering) | Pub Date: 2023-12-07 | DOI: 10.1145/3635711
Junda He, Xin Zhou, Bowen Xu, Ting Zhang, Kisub Kim, Zhou Yang, Ferdian Thung, Ivana Clairine Irsan, David Lo
{"title":"Representation Learning for Stack Overflow Posts: How Far are We?","authors":"Junda He, Xin Zhou, Bowen Xu, Ting Zhang, Kisub Kim, Zhou Yang, Ferdian Thung, Ivana Clairine Irsan, David Lo","doi":"10.1145/3635711","DOIUrl":null,"url":null,"abstract":"<p>The tremendous success of Stack Overflow has accumulated an extensive corpus of software engineering knowledge, thus motivating researchers to propose various solutions for analyzing its content. The performance of such solutions hinges significantly on the selection of representation models for Stack Overflow posts. As the volume of literature on Stack Overflow continues to burgeon, it highlights the need for a powerful Stack Overflow post representation model and drives researchers’ interest in developing specialized representation models that can adeptly capture the intricacies of Stack Overflow posts. The state-of-the-art (SOTA) Stack Overflow post representation models are Post2Vec and BERTOverflow, which are built upon neural networks such as convolutional neural network (CNN) and transformer architecture (e.g., BERT). Despite their promising results, these representation methods have not been evaluated in the same experimental setting. To fill the research gap, we first empirically compare the performance of the representation models designed specifically for Stack Overflow posts (Post2Vec and BERTOverflow) in a wide range of related tasks, i.e., tag recommendation, relatedness prediction, and API recommendation. The results show that Post2Vec cannot further improve the state-of-the-art techniques of the considered downstream tasks, and BERTOverflow shows surprisingly poor performance. To find more suitable representation models for the posts, we further explore a diverse set of transformer-based models, including (1) general domain language models (RoBERTa, Longformer, GPT2) and (2) language models built with software engineering-related textual artifacts (CodeBERT, GraphCodeBERT, seBERT, CodeT5, PLBart, and CodeGen). This exploration shows that models like CodeBERT and RoBERTa are suitable for representing Stack Overflow posts. However, it also illustrates the “No Silver Bullet” concept, as none of the models consistently wins against all the others. Inspired by the findings, we propose SOBERT, which employs a simple yet effective strategy to improve the representation models of Stack Overflow posts by continuing the pre-training phase with the textual artifact from Stack Overflow. The overall experimental results demonstrate that SOBERT can consistently outperform the considered models and increase the state-of-the-art performance significantly for all the downstream tasks.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"101 1","pages":""},"PeriodicalIF":6.6000,"publicationDate":"2023-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Software Engineering and Methodology","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3635711","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Citations: 6

Abstract

The tremendous success of Stack Overflow has accumulated an extensive corpus of software engineering knowledge, motivating researchers to propose various solutions for analyzing its content. The performance of such solutions hinges significantly on the choice of representation model for Stack Overflow posts. As the volume of literature on Stack Overflow continues to grow, so does the need for a powerful post representation model, driving researchers' interest in developing specialized models that can adeptly capture the intricacies of Stack Overflow posts. The state-of-the-art (SOTA) Stack Overflow post representation models are Post2Vec and BERTOverflow, which are built upon neural architectures such as convolutional neural networks (CNNs) and transformers (e.g., BERT). Despite their promising results, these representation methods have not been evaluated in the same experimental setting. To fill this research gap, we first empirically compare the performance of the representation models designed specifically for Stack Overflow posts (Post2Vec and BERTOverflow) on a wide range of related tasks: tag recommendation, relatedness prediction, and API recommendation. The results show that Post2Vec cannot further improve the state-of-the-art techniques for the considered downstream tasks, and that BERTOverflow performs surprisingly poorly. To find more suitable representation models for the posts, we further explore a diverse set of transformer-based models, including (1) general-domain language models (RoBERTa, Longformer, GPT2) and (2) language models built with software engineering-related textual artifacts (CodeBERT, GraphCodeBERT, seBERT, CodeT5, PLBart, and CodeGen). This exploration shows that models like CodeBERT and RoBERTa are suitable for representing Stack Overflow posts. However, it also illustrates the "No Silver Bullet" concept, as none of the models consistently wins against all the others. Inspired by these findings, we propose SOBERT, which employs a simple yet effective strategy to improve the representation of Stack Overflow posts: continuing the pre-training phase with textual artifacts from Stack Overflow. The overall experimental results demonstrate that SOBERT consistently outperforms the considered models and significantly improves the state-of-the-art performance on all the downstream tasks.
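The core technique the abstract attributes to SOBERT, continuing a checkpoint's pre-training on Stack Overflow text, can be made concrete with a short sketch. The following is a minimal, illustrative example using the Hugging Face transformers library, not the paper's actual pipeline: CodeBERT as the starting checkpoint, the file name so_posts.txt, and all hyperparameters are assumptions for illustration.

```python
# Minimal sketch of continued masked-language-model (MLM) pre-training on
# Stack Overflow post text, the strategy the abstract describes for SOBERT.
# Base model, corpus file, and hyperparameters are illustrative assumptions.
import torch
from datasets import load_dataset
from transformers import (
    AutoModel,
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

checkpoint = "microsoft/codebert-base"  # assumed starting checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Hypothetical corpus: one Stack Overflow post (title + body) per line.
dataset = load_dataset("text", data_files={"train": "so_posts.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Standard BERT-style masking: 15% of tokens are masked for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="sobert-sketch",
    per_device_train_batch_size=8,
    num_train_epochs=1,  # illustrative; real continued pre-training runs far longer
    learning_rate=5e-5,
)
Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
model.save_pretrained("sobert-sketch")
tokenizer.save_pretrained("sobert-sketch")

# Downstream use: mean-pool the encoder's last hidden state to obtain a
# fixed-size post embedding, e.g. as input to a tag-recommendation classifier
# or a pairwise scorer for relatedness prediction.
encoder = AutoModel.from_pretrained("sobert-sketch")
inputs = tokenizer("How do I parse JSON in Python?", return_tensors="pt")
with torch.no_grad():
    embedding = encoder(**inputs).last_hidden_state.mean(dim=1)  # shape (1, 768)
```

The final lines show the usage pattern implied by the abstract's evaluation: the continued-pretrained encoder produces post embeddings that are then fed to task-specific heads for tag recommendation, relatedness prediction, and API recommendation.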

Source Journal

ACM Transactions on Software Engineering and Methodology (Engineering & Technology / Computer Science: Software Engineering)
CiteScore: 6.30
Self-citation rate: 4.50%
Annual articles: 164
Review time: >12 weeks

About the journal: Designing and building a large, complex software system is a tremendous challenge. ACM Transactions on Software Engineering and Methodology (TOSEM) publishes papers on all aspects of that challenge: specification, design, development and maintenance. It covers tools and methodologies, languages, data structures, and algorithms. TOSEM also reports on successful efforts, noting practical lessons that can be scaled and transferred to other projects, and often looks at applications of innovative technologies. The tone is scholarly but readable; the content is worthy of study; the presentation is effective.
Latest Articles in This Journal

- Effective, Platform-Independent GUI Testing via Image Embedding and Reinforcement Learning
- Bitmap-Based Security Monitoring for Deeply Embedded Systems
- Harmonising Contributions: Exploring Diversity in Software Engineering through CQA Mining on Stack Overflow
- An Empirical Study on the Characteristics of Database Access Bugs in Java Applications
- Self-planning Code Generation with Large Language Models