{"title":"可访问性数据集的数据代表性:一个元分析。","authors":"Rie Kamikubo, Lining Wang, Crystal Marte, Amnah Mahmood, Hernisa Kacorri","doi":"10.1145/3517428.3544826","DOIUrl":null,"url":null,"abstract":"<p><p>As data-driven systems are increasingly deployed at scale, ethical concerns have arisen around unfair and discriminatory outcomes for historically marginalized groups that are underrepresented in training data. In response, work around AI fairness and inclusion has called for datasets that are representative of various demographic groups. In this paper, we contribute an analysis of the representativeness of age, gender, and race & ethnicity in accessibility datasets-datasets sourced from people with disabilities and older adults-that can potentially play an important role in mitigating bias for inclusive AI-infused applications. We examine the current state of representation within datasets sourced by people with disabilities by reviewing publicly-available information of 190 datasets, we call these accessibility datasets. We find that accessibility datasets represent diverse ages, but have gender and race representation gaps. Additionally, we investigate how the sensitive and complex nature of demographic variables makes classification difficult and inconsistent (<i>e.g.</i>, gender, race & ethnicity), with the source of labeling often unknown. By reflecting on the current challenges and opportunities for representation of disabled data contributors, we hope our effort expands the space of possibility for greater inclusion of marginalized communities in AI-infused systems.</p>","PeriodicalId":72321,"journal":{"name":"ASSETS. Annual ACM Conference on Assistive Technologies","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10024595/pdf/nihms-1869788.pdf","citationCount":"8","resultStr":"{\"title\":\"Data Representativeness in Accessibility Datasets: A Meta-Analysis.\",\"authors\":\"Rie Kamikubo, Lining Wang, Crystal Marte, Amnah Mahmood, Hernisa Kacorri\",\"doi\":\"10.1145/3517428.3544826\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>As data-driven systems are increasingly deployed at scale, ethical concerns have arisen around unfair and discriminatory outcomes for historically marginalized groups that are underrepresented in training data. In response, work around AI fairness and inclusion has called for datasets that are representative of various demographic groups. In this paper, we contribute an analysis of the representativeness of age, gender, and race & ethnicity in accessibility datasets-datasets sourced from people with disabilities and older adults-that can potentially play an important role in mitigating bias for inclusive AI-infused applications. We examine the current state of representation within datasets sourced by people with disabilities by reviewing publicly-available information of 190 datasets, we call these accessibility datasets. We find that accessibility datasets represent diverse ages, but have gender and race representation gaps. Additionally, we investigate how the sensitive and complex nature of demographic variables makes classification difficult and inconsistent (<i>e.g.</i>, gender, race & ethnicity), with the source of labeling often unknown. 
By reflecting on the current challenges and opportunities for representation of disabled data contributors, we hope our effort expands the space of possibility for greater inclusion of marginalized communities in AI-infused systems.</p>\",\"PeriodicalId\":72321,\"journal\":{\"name\":\"ASSETS. Annual ACM Conference on Assistive Technologies\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10024595/pdf/nihms-1869788.pdf\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ASSETS. Annual ACM Conference on Assistive Technologies\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3517428.3544826\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ASSETS. Annual ACM Conference on Assistive Technologies","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3517428.3544826","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Data Representativeness in Accessibility Datasets: A Meta-Analysis.
As data-driven systems are increasingly deployed at scale, ethical concerns have arisen around unfair and discriminatory outcomes for historically marginalized groups that are underrepresented in training data. In response, work around AI fairness and inclusion has called for datasets that are representative of various demographic groups. In this paper, we contribute an analysis of the representativeness of age, gender, and race & ethnicity in accessibility datasets (datasets sourced from people with disabilities and older adults), which can potentially play an important role in mitigating bias for inclusive AI-infused applications. We examine the current state of representation within datasets sourced from people with disabilities by reviewing publicly available information for 190 datasets, which we call accessibility datasets. We find that accessibility datasets represent diverse ages but have gaps in gender and race representation. Additionally, we investigate how the sensitive and complex nature of demographic variables (e.g., gender, race & ethnicity) makes classification difficult and inconsistent, with the source of labeling often unknown. By reflecting on the current challenges and opportunities for representation of disabled data contributors, we hope our effort expands the space of possibility for greater inclusion of marginalized communities in AI-infused systems.
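To make the kind of review described above more concrete, the sketch below shows one way to tally how demographic attributes are reported across dataset metadata records. This is purely illustrative and not the authors' analysis pipeline; the field names, categories, and example records are hypothetical assumptions introduced here.

```python
# Illustrative sketch (not from the paper): tallying how often dataset metadata
# reports age, gender, race/ethnicity, and the source of demographic labels.
# Field names and example records are hypothetical.
from collections import Counter

datasets = [
    {"name": "dataset-a", "age": "reported", "gender": "reported",
     "race_ethnicity": "not reported", "label_source": "self-reported"},
    {"name": "dataset-b", "age": "reported", "gender": "not reported",
     "race_ethnicity": "not reported", "label_source": "unknown"},
]

def tally(records, field):
    """Count how many dataset records fall into each reporting category for a field."""
    return Counter(r.get(field, "unknown") for r in records)

for field in ("age", "gender", "race_ethnicity", "label_source"):
    print(field, dict(tally(datasets, field)))
```

A tabulation like this surfaces the kinds of gaps the paper discusses, such as attributes that go unreported or demographic labels whose provenance is unknown.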