Predicting Stack Overflow Question Tags: A Multi-Class, Multi-Label Classification

E. M. Kavuk, Ayse Tosun Misirli
{"title":"Predicting Stack Overflow Question Tags: A Multi-Class, Multi-Label Classification","authors":"E. M. Kavuk, Ayse Tosun Misirli","doi":"10.1145/3387940.3391491","DOIUrl":null,"url":null,"abstract":"This work proposes to predict the tags assigned for the posts on Stack Overflow platform. The raw data was obtained from the stackexchange.com including more than 50K posts and their associated tags given by the users. The posts' questions and titles are pre-processed, and the sentences in the posts are further transformed into features via Latent Dirichlet Allocation. The problem is a multi-class and multi-label classification and hence, we propose 1) one-against-all models for 15 most popularly used tags, and 2) a combined multi-tag classifier for finding the top K tags for a single post. Three algorithms are used to train the one-against-all classifiers to decide to what extent a post belongs to a tag. The probabilities of each post belonging to a tag are then combined to give the results of the multi-tag classifier with the best performing algorithm. The performance is compared with a baseline approach (kNN). Our multi-tag classifier achieves 55% recall and 39% F1-score.","PeriodicalId":309659,"journal":{"name":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2020-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3387940.3391491","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

This work proposes to predict the tags assigned for the posts on Stack Overflow platform. The raw data was obtained from the stackexchange.com including more than 50K posts and their associated tags given by the users. The posts' questions and titles are pre-processed, and the sentences in the posts are further transformed into features via Latent Dirichlet Allocation. The problem is a multi-class and multi-label classification and hence, we propose 1) one-against-all models for 15 most popularly used tags, and 2) a combined multi-tag classifier for finding the top K tags for a single post. Three algorithms are used to train the one-against-all classifiers to decide to what extent a post belongs to a tag. The probabilities of each post belonging to a tag are then combined to give the results of the multi-tag classifier with the best performing algorithm. The performance is compared with a baseline approach (kNN). Our multi-tag classifier achieves 55% recall and 39% F1-score.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
预测堆栈溢出问题标签:一个多类别,多标签分类
本研究提出预测Stack Overflow平台上文章的标签分配。原始数据是从stackexchange.com获得的,包括用户给出的5万多篇帖子及其相关标签。对帖子的问题和标题进行预处理,并通过Latent Dirichlet Allocation将帖子中的句子进一步转化为特征。这个问题是一个多类和多标签的分类,因此,我们提出1)针对15个最常用标签的一个对所有模型,以及2)一个组合的多标签分类器,用于为单个帖子找到前K个标签。三种算法被用来训练单对全分类器来决定文章在多大程度上属于一个标签。然后将每个帖子属于一个标签的概率结合起来,给出具有最佳性能算法的多标签分类器的结果。将性能与基线方法(kNN)进行比较。我们的多标签分类器达到了55%的召回率和39%的f1得分。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A Preliminary Systematic Mapping on Software Engineering for Robotic Systems: A Software Quality Perspective Generating API Test Data Using Deep Reinforcement Learning Human Factors in the Study of Automatic Software Repair: Future Directions for Research with Industry Strategies for Crowdworkers to Overcome Barriers in Competition-based Software Crowdsourcing Development Centralized Generic Interfaces in Hardware/Software Co-design for AI Accelerators
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1