{"title":"Recognizing Gender of Stack Overflow Users","authors":"B. Lin, Alexander Serebrenik","doi":"10.1145/2901739.2901777","DOIUrl":null,"url":null,"abstract":"Software development remains a predominantly male activity, despite coordinated efforts from research, industry, and policy makers. This gender imbalance is most visible in social programming, on platforms such as Stack Overflow.To better understand the reasons behind this disparity, and off er support for (corrective) decision making, we and others have been engaged in large-scale empirical studies of activity in these online platforms, in which gender is one of the variables of interest. However, since gender is not explicitly recorded, it is typically inferred by automatic \"gender guessers\", based on cues derived from an individual's online presence, such as their name and profi le picture. As opposed to self-reporting, used in earlier studies, gender guessers scale better, but their accuracy depends on the quantity and quality of data available in one's online pro le.In this paper we evaluate the applicability of different gender guessing approaches on several datasets derived from Stack Overflow. Our results suggest that the approaches combining different data sources perform the best.","PeriodicalId":6621,"journal":{"name":"2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)","volume":"42 1","pages":"425-429"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"43","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2901739.2901777","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 43
Abstract
Software development remains a predominantly male activity, despite coordinated efforts from research, industry, and policy makers. This gender imbalance is most visible in social programming, on platforms such as Stack Overflow.To better understand the reasons behind this disparity, and off er support for (corrective) decision making, we and others have been engaged in large-scale empirical studies of activity in these online platforms, in which gender is one of the variables of interest. However, since gender is not explicitly recorded, it is typically inferred by automatic "gender guessers", based on cues derived from an individual's online presence, such as their name and profi le picture. As opposed to self-reporting, used in earlier studies, gender guessers scale better, but their accuracy depends on the quantity and quality of data available in one's online pro le.In this paper we evaluate the applicability of different gender guessing approaches on several datasets derived from Stack Overflow. Our results suggest that the approaches combining different data sources perform the best.