Big Data Soc.: Latest Articles

Agreements 'in the wild': Standards and alignment in machine learning benchmark dataset construction
Pub Date : 2024-04-03 DOI: 10.1177/20539517241242457
Isak Engdahl
This article presents an ethnographic case study of a corporate-academic group constructing a benchmark dataset of daily activities for a variety of machine learning and computer vision tasks. Using a socio-technical perspective, the article conceptualizes the dataset as a knowledge object that is stabilized by both practical standards (for daily activities, datafication, annotation and benchmarks) and alignment work – that is, efforts including forging agreements to make these standards effective in practice. By attending to alignment work, the article highlights the informal, communicative and supportive efforts that underlie the success of standards and the smoothing of tensions between actors and factors. Emphasizing these efforts constitutes a contribution in several ways. This article's ethnographic mode of analysis challenges and supplements quantitative metrics on datasets. It advances the field of dataset analysis by offering a detailed empirical examination of the development of a new benchmark dataset as a collective accomplishment. By showing the importance of alignment efforts and their close ties to standards and their limitations, it adds to our understanding of how machine learning datasets are built. And, most importantly, it calls into question a key characterization of the dataset: that it captures unscripted activities occurring naturally ‘in the wild’, as alignment work bleeds into moments of data capture.
Citations: 0
Role-based privacy cynicism and local privacy activism: How data stewards navigate privacy in higher education
Pub Date : 2024-04-03 DOI: 10.1177/20539517241240664
Mihaela Popescu, L. Baruh, Samuel Sudhakar
This study examines the impact of role-based constraints on privacy cynicism within higher education, a workplace increasingly subjected to surveillance. Using a thematic analysis of 15 in-depth interviews conducted between 2017 and 2023 with data stewards in the California State University System, the research explores the reasons behind data stewards’ privacy cynicism, despite their knowledge of privacy and their own ability to protect it. We investigate how academic data custodians navigate four role-based tensions: the conflict between the institutional and personal definitions of privacy; the mutual reinforcement between their privacy-cynical attitudes and their perceptions of student privacy attitudes; the influence of role constraints on data stewards’ privacy-protective behaviors; and the contrast between the negatively valued societal surveillance and the positively valued university surveillance. The findings underscore the significance of considering organizational privacy cultures and role-based expectations in studying privacy cynicism. The study contributes to the theoretical understanding of privacy cynicism and offers practical implications for organizations, emphasizing the importance of aligning organizational definitions of privacy with employees’ understanding. Future research should further explore the mutual reinforcement of privacy cynicism in the relationship between data providers and data consumers (which we call the “spiral of resignation”) and consider the impact of role-based constraints in other organizational contexts.
Citations: 0
Imaginaries of democratization and the value of open environmental data: Analysis of Microsoft's planetary computer
Pub Date : 2024-04-03 DOI: 10.1177/20539517241242448
Przemyslaw Matt Lukacz
The proliferation of environmentally oriented programs within the tech industry, and the industry's coinciding efforts toward data and technology democratization, generate concerns about the status of environmental data within digital economy. While the accumulation of digital personal data has been a cornerstone of domination of the data analytics industry, many believe environmental data to be a source of “untapped potential.” The potential of environmental data, the argument goes, would benefit equally the digital economy, environmental sciences, and academic data and artificial intelligence experts. This article analyzes the proliferation of the rhetoric about open environmental data by focusing on Microsoft's Planetary Computer cloud computing program and computer vision experts who curate and use biodiversity data stored on Microsoft's servers. Through an analytical framework of sociotechnical imaginaries, the article draws connections between visions of future for environmental knowledge production and governance promoted by Microsoft and the work of computer vision experts intending to benefit from the potential of environmental data as machine learning training sets while at the same time helping environmental sciences. Although environmental data on the Planetary Computer is democratized, it nonetheless becomes a valued asset to data economy, but often with unintended consequences, such as enabling citizen science biodiversity data to be used by state surveillance apparatus. The article challenges the view that data's democratization is unproblematically serving environmental sciences by examining the consequences of imaginaries of democratization emerging from the data industry leaders and processes of nonmonetary valuation of environmental data by experts who curate these datasets.
Citations: 0