数据集伦理:前进需要后退

Arvind Narayanan
{"title":"数据集伦理:前进需要后退","authors":"Arvind Narayanan","doi":"10.1145/3461702.3462643","DOIUrl":null,"url":null,"abstract":"Machine learning research culture is driven by benchmark datasets to a greater degree than most other research fields. But the centrality of datasets also amplifies the harms associated with data, including privacy violation and underrepresentation or erasure of some populations. This has stirred a much-needed debate on the ethical responsibilities of dataset creators and users. I argue that clarity on this debate requires taking a step back to better understand the benefits of the dataset-driven approach. I show that benchmark datasets play at least six different roles and that the potential harms depend on the roles a dataset plays. By understanding this relationship, we can mitigate the harms while preserving what is scientifically valuable about the prevailing approach.","PeriodicalId":197336,"journal":{"name":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","volume":"2014 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"The Ethics of Datasets: Moving Forward Requires Stepping Back\",\"authors\":\"Arvind Narayanan\",\"doi\":\"10.1145/3461702.3462643\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Machine learning research culture is driven by benchmark datasets to a greater degree than most other research fields. But the centrality of datasets also amplifies the harms associated with data, including privacy violation and underrepresentation or erasure of some populations. This has stirred a much-needed debate on the ethical responsibilities of dataset creators and users. I argue that clarity on this debate requires taking a step back to better understand the benefits of the dataset-driven approach. I show that benchmark datasets play at least six different roles and that the potential harms depend on the roles a dataset plays. By understanding this relationship, we can mitigate the harms while preserving what is scientifically valuable about the prevailing approach.\",\"PeriodicalId\":197336,\"journal\":{\"name\":\"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society\",\"volume\":\"2014 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-07-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3461702.3462643\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3461702.3462643","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

与大多数其他研究领域相比,机器学习研究文化在更大程度上受到基准数据集的驱动。但数据集的中心化也放大了与数据相关的危害,包括侵犯隐私和某些人群的代表性不足或被抹去。这引发了一场关于数据集创建者和用户的道德责任的争论。我认为,要弄清楚这场争论,需要退一步来更好地理解数据集驱动方法的好处。我展示了基准数据集至少扮演六种不同的角色,并且潜在的危害取决于数据集扮演的角色。通过理解这种关系,我们可以减轻危害,同时保留主流方法的科学价值。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
The Ethics of Datasets: Moving Forward Requires Stepping Back
Machine learning research culture is driven by benchmark datasets to a greater degree than most other research fields. But the centrality of datasets also amplifies the harms associated with data, including privacy violation and underrepresentation or erasure of some populations. This has stirred a much-needed debate on the ethical responsibilities of dataset creators and users. I argue that clarity on this debate requires taking a step back to better understand the benefits of the dataset-driven approach. I show that benchmark datasets play at least six different roles and that the potential harms depend on the roles a dataset plays. By understanding this relationship, we can mitigate the harms while preserving what is scientifically valuable about the prevailing approach.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Beyond Reasonable Doubt: Improving Fairness in Budget-Constrained Decision Making using Confidence Thresholds Measuring Automated Influence: Between Empirical Evidence and Ethical Values Artificial Intelligence and the Purpose of Social Systems Ethically Compliant Planning within Moral Communities Co-design and Ethical Artificial Intelligence for Health: Myths and Misconceptions
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1