{"title":"Biased Priorities, Biased Outcomes: Three Recommendations for Ethics-oriented Data Annotation Practices","authors":"Gunay Kazimzade, Milagros Miceli","doi":"10.1145/3375627.3375809","DOIUrl":null,"url":null,"abstract":"In this paper, we analyze the relation between data-related biases and practices of data annotation, by placing them in the context of market economy. We understand annotation as a praxis related to the sensemaking of data and investigate annotation practices for vision models by focusing on the values that are prioritized by industrial decision-makers and practitioners. The quality of data is critical for machine learning models as it holds the power to (mis-)represent the population it is intended to analyze. For autonomous systems to be able to make sense of the world, humans first need to make sense of the data these systems will be trained on. This paper addresses this issue, guided by the following research questions: Which goals are prioritized by decision-makers at the data annotation stage? How do these priorities correlate with data-related bias issues? Focusing on work practices and their context, our research goal aims at understanding the logics driving companies and their impact on the performed annotations. The study follows a qualitative design and is based on 24 interviews with relevant actors and extensive participatory observations, including several weeks of fieldwork at two companies dedicated to data annotation for vision models in Buenos Aires, Argentina and Sofia, Bulgaria. The prevalence of market-oriented values over socially responsible approaches is argued based on three corporate priorities that inform work practices in this field and directly shape the annotations performed: profit (short deadlines connected to the strive for profit are prioritized over alternative approaches that could prevent biased outcomes), standardization (the strive for standardized and, in many cases, reductive or biased annotations to make data fit the products and revenue plans of clients), and opacity (related to client's power to impose their criteria on the annotations that are performed. Criteria that most of the times remain opaque due to corporate confidentiality). 
Finally, we introduce three elements, aiming at developing ethics-oriented practices of data annotation, that could help prevent biased outcomes: transparency (regarding the documentation of data transformations, including information on responsibilities and criteria for decision-making.), education (training on the potential harms caused by AI and its ethical implications, that could help data annotators and related roles adopt a more critical approach towards the interpretation and labeling of data), and regulations (clear guidelines for ethical AI developed at the governmental level and applied both in private and public organizations).","PeriodicalId":93612,"journal":{"name":"Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society","volume":"449 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2020-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3375627.3375809","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 15
Abstract
In this paper, we analyze the relation between data-related biases and data annotation practices by placing them in the context of the market economy. We understand annotation as a praxis related to the sensemaking of data and investigate annotation practices for vision models, focusing on the values prioritized by industrial decision-makers and practitioners. The quality of data is critical for machine learning models, as it holds the power to (mis-)represent the population it is intended to analyze. For autonomous systems to be able to make sense of the world, humans first need to make sense of the data these systems will be trained on. This paper addresses this issue, guided by the following research questions: Which goals are prioritized by decision-makers at the data annotation stage? How do these priorities correlate with data-related bias issues? Focusing on work practices and their context, our research aims to understand the logics that drive companies and the impact of those logics on the annotations performed. The study follows a qualitative design and is based on 24 interviews with relevant actors and extensive participant observation, including several weeks of fieldwork at two companies dedicated to data annotation for vision models in Buenos Aires, Argentina, and Sofia, Bulgaria. We argue that market-oriented values prevail over socially responsible approaches, based on three corporate priorities that inform work practices in this field and directly shape the annotations performed: profit (short deadlines driven by the pursuit of profit are prioritized over alternative approaches that could prevent biased outcomes), standardization (the push for standardized and, in many cases, reductive or biased annotations that make data fit clients' products and revenue plans), and opacity (clients' power to impose their criteria on the annotations performed, criteria that most of the time remain opaque due to corporate confidentiality). Finally, we introduce three elements aimed at developing ethics-oriented data annotation practices that could help prevent biased outcomes: transparency (documentation of data transformations, including information on responsibilities and decision-making criteria), education (training on the potential harms of AI and its ethical implications, which could help data annotators and related roles adopt a more critical approach to interpreting and labeling data), and regulations (clear guidelines for ethical AI, developed at the governmental level and applied in both private and public organizations).
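
The transparency element centers on documenting data transformations together with the responsibilities and criteria behind each labeling decision. As a purely illustrative sketch, not drawn from the paper, the following Python dataclass shows one minimal shape such a documentation record could take; the AnnotationRecord type and all of its field names are assumptions introduced here.

# Illustrative only: a minimal record for documenting a single annotation
# decision, in the spirit of the paper's transparency recommendation.
# AnnotationRecord and its fields are assumptions, not a schema from the paper.
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class AnnotationRecord:
    dataset: str                       # dataset the labeled item belongs to
    item_id: str                       # identifier of the annotated image
    label: str                         # label assigned to the item
    annotator_role: str                # who made the decision (annotator, reviewer, client)
    criterion: str                     # the rule or instruction that justified the label
    criterion_source: str              # where the criterion came from (e.g., a client brief)
    reviewed_by: Optional[str] = None  # reviewer responsible for sign-off, if any
    recorded_on: date = field(default_factory=date.today)

# Usage: recording the provenance of a single label.
record = AnnotationRecord(
    dataset="street-scenes-v2",
    item_id="img_00431",
    label="pedestrian",
    annotator_role="annotator",
    criterion="Label any person on the roadway or sidewalk as 'pedestrian'.",
    criterion_source="client labeling brief, section 3",
)
print(record)

A record like this would make an otherwise opaque client criterion explicit and auditable, which speaks to both the transparency recommendation and the opacity finding; whether such records could be shared beyond the annotation company would still depend on confidentiality agreements.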