{"title":"The social construction of datasets: On the practices, processes, and challenges of dataset creation for machine learning","authors":"Will Orr, Kate Crawford","doi":"10.1177/14614448241251797","DOIUrl":null,"url":null,"abstract":"Despite the critical role that datasets play in how systems make predictions and interpret the world, the dynamics of their construction are not well understood. Drawing on a corpus of interviews with dataset creators, we uncover the messy and contingent realities of dataset preparation. We identify four key challenges in constructing datasets, including balancing the benefits and costs of increasing dataset scale, limited access to resources, a reliance on shortcuts for compiling datasets and evaluating their quality, and ambivalence regarding accountability for a dataset. These themes illustrate the ways in which datasets are not objective or neutral but reflect the personal judgments and trade-offs of their creators within wider institutional dynamics, working within social, technical, and organizational constraints. We underscore the importance of examining the processes of dataset creation to strengthen an understanding of responsible practices for dataset development and care.","PeriodicalId":19149,"journal":{"name":"New Media & Society","volume":"2014 1","pages":""},"PeriodicalIF":4.5000,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"New Media & Society","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1177/14614448241251797","RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMMUNICATION","Score":null,"Total":0}
引用次数: 0
Abstract
Despite the critical role that datasets play in how systems make predictions and interpret the world, the dynamics of their construction are not well understood. Drawing on a corpus of interviews with dataset creators, we uncover the messy and contingent realities of dataset preparation. We identify four key challenges in constructing datasets, including balancing the benefits and costs of increasing dataset scale, limited access to resources, a reliance on shortcuts for compiling datasets and evaluating their quality, and ambivalence regarding accountability for a dataset. These themes illustrate the ways in which datasets are not objective or neutral but reflect the personal judgments and trade-offs of their creators within wider institutional dynamics, working within social, technical, and organizational constraints. We underscore the importance of examining the processes of dataset creation to strengthen an understanding of responsible practices for dataset development and care.
期刊介绍:
New Media & Society engages in critical discussions of the key issues arising from the scale and speed of new media development, drawing on a wide range of disciplinary perspectives and on both theoretical and empirical research. The journal includes contributions on: -the individual and the social, the cultural and the political dimensions of new media -the global and local dimensions of the relationship between media and social change -contemporary as well as historical developments -the implications and impacts of, as well as the determinants and obstacles to, media change the relationship between theory, policy and practice.