Binary Classification: An Introductory Machine Learning Tutorial for Social Scientists
Vivian P. Ta, Leonardo Carrico, Arthur Bousquet
Journal of Methods and Measurement in the Social Sciences, published 2021-12-12
DOI: 10.2458/jmmss.5186 (https://doi.org/10.2458/jmmss.5186)
Abstract
A barrier that prevents many social scientists from pursuing big data research is the lack of technical training required to assemble and organize big data. To address this barrier, we provide an introductory tutorial on machine learning for social scientists that demonstrates the basic steps and fundamental concepts involved in binary classification. We first describe the data and libraries required for the analysis. We then demonstrate data cleaning methods, feature engineering, the model-building process, model assessment, and feature importance. Last, we discuss how social scientists can use machine learning to complement inference-based approaches and how it can contribute to a richer understanding of social science.
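The abstract names the stages of a binary classification workflow without showing them. As a rough orientation only, here is a minimal sketch of those same stages in plain Python (standard library only) on a small synthetic dataset. It is not the authors' code, and every concrete choice is an assumption: the hypothetical columns `feature_1`, `feature_2`, `feature_3`, listwise deletion as the cleaning step, z-score standardization as the feature-engineering step, logistic regression fitted by gradient descent as the model, accuracy/precision/recall as the assessment, and permutation importance as the feature-importance measure. The tutorial itself may use different libraries and methods.

```python
import math
import random

def make_toy_data(n=400, seed=0):
    """Hypothetical synthetic rows (x1, x2, x3, label): x1 drives the label, x2 weakly, x3 is noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2, x3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        p = 1.0 / (1.0 + math.exp(-(2.5 * x1 + 0.5 * x2)))
        label = 1 if rng.random() < p else 0
        if rng.random() < 0.05:  # inject ~5% missing values for the cleaning step
            x2 = None
        rows.append((x1, x2, x3, label))
    return rows

def clean(rows):
    """Data cleaning (assumed method): listwise deletion of any row with a missing value."""
    return [r for r in rows if None not in r]

def train_test_split(rows, test_frac=0.25, seed=1):
    """Shuffle a copy of the rows, then hold out the last `test_frac` of them for assessment."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]

def fit_scaler(X):
    """Feature engineering (assumed): column means and standard deviations from training data."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0  # constant column -> 1.0
            for c, m in zip(cols, means)]
    return means, stds

def transform(X, means, stds):
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

def sigmoid(z):
    z = max(-35.0, min(35.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Model building: logistic regression fitted by batch gradient descent on mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b, threshold=0.5):
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0
            for xi in X]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return (tp / (tp + fp) if tp + fp else 0.0, tp / (tp + fn) if tp + fn else 0.0)

def permutation_importance(X, y, w, b, seed=2):
    """Feature importance: drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    base = accuracy(y, predict(X, w, b))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        scores.append(base - accuracy(y, predict(X_perm, w, b)))
    return scores

def run_pipeline():
    train, test = train_test_split(clean(make_toy_data()))
    X_tr, y_tr = [list(r[:3]) for r in train], [r[3] for r in train]
    X_te, y_te = [list(r[:3]) for r in test], [r[3] for r in test]
    means, stds = fit_scaler(X_tr)  # statistics from the training split only
    X_tr, X_te = transform(X_tr, means, stds), transform(X_te, means, stds)
    w, b = fit_logistic(X_tr, y_tr)
    y_hat = predict(X_te, w, b)
    return (accuracy(y_te, y_hat), precision_recall(y_te, y_hat), w,
            permutation_importance(X_te, y_te, w, b))

if __name__ == "__main__":
    acc, (prec, rec), w, imp = run_pipeline()
    print(f"test accuracy={acc:.3f}, precision={prec:.3f}, recall={rec:.3f}")
    for name, wj, ij in zip(["feature_1", "feature_2", "feature_3"], w, imp):
        print(f"{name}: weight={wj:+.2f}, permutation importance={ij:+.3f}")
```

Fitting the scaler on the training split only keeps held-out information out of model building, and computing permutation importance on the held-out rows ties each score to how much test accuracy depends on that feature. In this synthetic setup `feature_1` should dominate, while `feature_3`, which is pure noise, should score near zero.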