The social construction of datasets: On the practices, processes, and challenges of dataset creation for machine learning

New Media & Society, IF 4.5, Zone 1 (Literature), Q1 Communication, Pub Date: 2024-08-30, DOI: 10.1177/14614448241251797
Will Orr, Kate Crawford

Abstract

Despite the critical role that datasets play in how systems make predictions and interpret the world, the dynamics of their construction are not well understood. Drawing on a corpus of interviews with dataset creators, we uncover the messy and contingent realities of dataset preparation. We identify four key challenges in constructing datasets, including balancing the benefits and costs of increasing dataset scale, limited access to resources, a reliance on shortcuts for compiling datasets and evaluating their quality, and ambivalence regarding accountability for a dataset. These themes illustrate the ways in which datasets are not objective or neutral but reflect the personal judgments and trade-offs of their creators within wider institutional dynamics, working within social, technical, and organizational constraints. We underscore the importance of examining the processes of dataset creation to strengthen an understanding of responsible practices for dataset development and care.