Pub Date: 2024-04-03, DOI: 10.1177/20539517241242457
Isak Engdahl
This article presents an ethnographic case study of a corporate-academic group constructing a benchmark dataset of daily activities for a variety of machine learning and computer vision tasks. Using a socio-technical perspective, the article conceptualizes the dataset as a knowledge object that is stabilized by both practical standards (for daily activities, datafication, annotation and benchmarks) and alignment work – that is, efforts including forging agreements to make these standards effective in practice. By attending to alignment work, the article highlights the informal, communicative and supportive efforts that underlie the success of standards and the smoothing of tensions between actors and factors. Emphasizing these efforts constitutes a contribution in several ways. This article's ethnographic mode of analysis challenges and supplements quantitative metrics on datasets. It advances the field of dataset analysis by offering a detailed empirical examination of the development of a new benchmark dataset as a collective accomplishment. By showing the importance of alignment efforts and their close ties to standards and their limitations, it adds to our understanding of how machine learning datasets are built. And, most importantly, it calls into question a key characterization of the dataset: that it captures unscripted activities occurring naturally ‘in the wild’, as alignment work bleeds into moments of data capture.
Title: Agreements 'in the wild': Standards and alignment in machine learning benchmark dataset construction
Journal: Big Data Soc. (Journal Article)
Pub Date: 2024-04-03, DOI: 10.1177/20539517241240664
Mihaela Popescu, L. Baruh, Samuel Sudhakar
This study examines the impact of role-based constraints on privacy cynicism within higher education, a workplace increasingly subjected to surveillance. Using a thematic analysis of 15 in-depth interviews conducted between 2017 and 2023 with data stewards in the California State University System, the research explores the reasons behind data stewards’ privacy cynicism, despite their knowledge of privacy and their own ability to protect it. We investigate how academic data custodians navigate four role-based tensions: the conflict between the institutional and personal definitions of privacy; the mutual reinforcement between their privacy-cynical attitudes and their perceptions of student privacy attitudes; the influence of role constraints on data stewards’ privacy-protective behaviors; and the contrast between the negatively valued societal surveillance and the positively valued university surveillance. The findings underscore the significance of considering organizational privacy cultures and role-based expectations in studying privacy cynicism. The study contributes to the theoretical understanding of privacy cynicism and offers practical implications for organizations, emphasizing the importance of aligning organizational definitions of privacy with employees’ understanding. Future research should further explore the mutual reinforcement of privacy cynicism in the relationship between data providers and data consumers (which we call the “spiral of resignation”) and consider the impact of role-based constraints in other organizational contexts.
Title: Role-based privacy cynicism and local privacy activism: How data stewards navigate privacy in higher education
Journal: Big Data Soc. (Journal Article)
Pub Date: 2024-04-03, DOI: 10.1177/20539517241242448
Przemyslaw Matt Lukacz
The proliferation of environmentally oriented programs within the tech industry, and the industry's coinciding efforts toward data and technology democratization, generate concerns about the status of environmental data within the digital economy. While the accumulation of digital personal data has been a cornerstone of the data analytics industry's dominance, many believe environmental data to be a source of “untapped potential.” The potential of environmental data, the argument goes, would equally benefit the digital economy, environmental sciences, and academic data and artificial intelligence experts. This article analyzes the proliferation of the rhetoric about open environmental data by focusing on Microsoft's Planetary Computer cloud computing program and the computer vision experts who curate and use biodiversity data stored on Microsoft's servers. Through an analytical framework of sociotechnical imaginaries, the article draws connections between the visions of the future for environmental knowledge production and governance promoted by Microsoft and the work of computer vision experts who intend to benefit from the potential of environmental data as machine learning training sets while at the same time helping environmental sciences. Although environmental data on the Planetary Computer is democratized, it nonetheless becomes a valued asset to the data economy, often with unintended consequences, such as enabling citizen science biodiversity data to be used by the state surveillance apparatus. The article challenges the view that data's democratization unproblematically serves environmental sciences by examining the consequences of imaginaries of democratization emerging from data industry leaders and the processes of nonmonetary valuation of environmental data by the experts who curate these datasets.
Title: Imaginaries of democratization and the value of open environmental data: Analysis of Microsoft's planetary computer
Journal: Big Data Soc. (Journal Article)