LinkSO: a dataset for learning to retrieve similar question answer pairs on software development forums

Xueqing Liu, Chi Wang, Yue Leng, ChengXiang Zhai
Proceedings of the 4th ACM SIGSOFT International Workshop on NLP for Software Engineering, published 2018-11-04. DOI: 10.1145/3283812.3283815 (https://doi.org/10.1145/3283812.3283815)
Citations: 11

Abstract

We present LinkSO, a dataset for learning to rank similar questions on Stack Overflow. Stack Overflow contains a large number of high-quality, crowd-sourced question links, which offer a valuable opportunity both for evaluating retrieval algorithms on community-based question answering (cQA) archives and for learning to rank over such archives. However, because some links are missing, it is unclear whether question links can be used directly as relevance judgments for evaluation. We study this by measuring how closely question links agree with relevance judgments, and we find agreement rates ranging from 80% to 88%. We also conduct an empirical study of how existing approaches perform on LinkSO. While existing work focuses on non-learning approaches, our results reveal that learning-based approaches have great potential to further improve retrieval performance.
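One plausible reading of the agreement rate mentioned above is the fraction of question pairs on which the presence or absence of a link matches a human relevance judgment. The abstract does not give the exact definition, so the sketch below is only an illustration under that assumption; the function name `agreement_rate`, the pair representation, and the toy data are all hypothetical and are not taken from LinkSO.

```python
from typing import Dict, Tuple

# Assumed representation: (query question id, candidate question id)
Pair = Tuple[int, int]


def agreement_rate(link_labels: Dict[Pair, bool],
                   judged_labels: Dict[Pair, bool]) -> float:
    """Fraction of jointly labeled pairs where the link label equals the judgment."""
    # Only pairs labeled by both sources can be compared.
    shared = link_labels.keys() & judged_labels.keys()
    if not shared:
        raise ValueError("no pairs are labeled by both sources")
    matches = sum(link_labels[p] == judged_labels[p] for p in shared)
    return matches / len(shared)


# Hypothetical toy data: True means "linked" or "judged relevant".
links = {(1, 2): True, (1, 3): False, (4, 5): True, (4, 6): False, (7, 8): False}
judged = {(1, 2): True, (1, 3): False, (4, 5): True, (4, 6): True, (7, 8): False}

rate = agreement_rate(links, judged)
print(f"agreement rate: {rate:.0%}")
```

In this toy data the pair (4, 6) is judged relevant but has no link, which is the kind of missing link the abstract raises as a concern.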