Recognizing Gender of Stack Overflow Users

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR). Pub Date: 2016-05-14. DOI: 10.1145/2901739.2901777

B. Lin, Alexander Serebrenik

Citations: 43

Abstract

Software development remains a predominantly male activity, despite coordinated efforts from research, industry, and policy makers. This gender imbalance is most visible in social programming, on platforms such as Stack Overflow. To better understand the reasons behind this disparity, and to offer support for (corrective) decision making, we and others have been engaged in large-scale empirical studies of activity on these online platforms, in which gender is one of the variables of interest. However, since gender is not explicitly recorded, it is typically inferred by automatic "gender guessers", based on cues derived from an individual's online presence, such as their name and profile picture. Compared with self-reporting, which was used in earlier studies, gender guessers scale better, but their accuracy depends on the quantity and quality of the data available in one's online profile. In this paper we evaluate the applicability of different gender-guessing approaches on several datasets derived from Stack Overflow. Our results suggest that approaches combining different data sources perform best.
