数据集伦理:前进需要后退

Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society Pub Date : 2021-07-21 DOI:10.1145/3461702.3462643

Arvind Narayanan

{"title":"数据集伦理:前进需要后退","authors":"Arvind Narayanan","doi":"10.1145/3461702.3462643","DOIUrl":null,"url":null,"abstract":"Machine learning research culture is driven by benchmark datasets to a greater degree than most other research fields. But the centrality of datasets also amplifies the harms associated with data, including privacy violation and underrepresentation or erasure of some populations. This has stirred a much-needed debate on the ethical responsibilities of dataset creators and users. I argue that clarity on this debate requires taking a step back to better understand the benefits of the dataset-driven approach. I show that benchmark datasets play at least six different roles and that the potential harms depend on the roles a dataset plays. By understanding this relationship, we can mitigate the harms while preserving what is scientifically valuable about the prevailing approach.","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"2014 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"The Ethics of Datasets: Moving Forward Requires Stepping Back\",\"authors\":\"Arvind Narayanan\",\"doi\":\"10.1145/3461702.3462643\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Machine learning research culture is driven by benchmark datasets to a greater degree than most other research fields. But the centrality of datasets also amplifies the harms associated with data, including privacy violation and underrepresentation or erasure of some populations. This has stirred a much-needed debate on the ethical responsibilities of dataset creators and users. I argue that clarity on this debate requires taking a step back to better understand the benefits of the dataset-driven approach. I show that benchmark datasets play at least six different roles and that the potential harms depend on the roles a dataset plays. By understanding this relationship, we can mitigate the harms while preserving what is scientifically valuable about the prevailing approach.\",\"PeriodicalId\":197336,\"journal\":{\"name\":\"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society\",\"volume\":\"2014 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3461702.3462643\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3461702.3462643","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

与大多数其他研究领域相比，机器学习研究文化在更大程度上受到基准数据集的驱动。但数据集的中心化也放大了与数据相关的危害，包括侵犯隐私和某些人群的代表性不足或被抹去。这引发了一场关于数据集创建者和用户的道德责任的争论。我认为，要弄清楚这场争论，需要退一步来更好地理解数据集驱动方法的好处。我展示了基准数据集至少扮演六种不同的角色，并且潜在的危害取决于数据集扮演的角色。通过理解这种关系，我们可以减轻危害，同时保留主流方法的科学价值。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

The Ethics of Datasets: Moving Forward Requires Stepping Back

Machine learning research culture is driven by benchmark datasets to a greater degree than most other research fields. But the centrality of datasets also amplifies the harms associated with data, including privacy violation and underrepresentation or erasure of some populations. This has stirred a much-needed debate on the ethical responsibilities of dataset creators and users. I argue that clarity on this debate requires taking a step back to better understand the benefits of the dataset-driven approach. I show that benchmark datasets play at least six different roles and that the potential harms depend on the roles a dataset plays. By understanding this relationship, we can mitigate the harms while preserving what is scientifically valuable about the prevailing approach.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society

自引率

0.00%

发文量