Bias in human data: A feedback from social sciences

Savaş Takan, Duygu Ergün, Sinem Getir Yaman, Onur Kılınççeker
{"title":"Bias in human data: A feedback from social sciences","authors":"Savaş Takan, Duygu Ergün, Sinem Getir Yaman, Onur Kılınççeker","doi":"10.1002/widm.1498","DOIUrl":null,"url":null,"abstract":"Abstract The fairness of human‐related software has become critical with its widespread use in our daily lives, where life‐changing decisions are made. However, with the use of these systems, many erroneous results emerged. Technologies have started to be developed to tackle unexpected results. As for the solution to the issue, companies generally focus on algorithm‐oriented errors. The utilized solutions usually only work in some algorithms. Because the cause of the problem is not just the algorithm; it is also the data itself. For instance, deep learning cannot establish the cause–effect relationship quickly. In addition, the boundaries between statistical or heuristic algorithms are unclear. The algorithm's fairness may vary depending on the data related to context. From this point of view, our article focuses on how the data should be, which is not a matter of statistics. In this direction, the picture in question has been revealed through a scenario specific to “vulnerable and disadvantaged” groups, which is one of the most fundamental problems today. With the joint contribution of computer science and social sciences, it aims to predict the possible social dangers that may arise from artificial intelligence algorithms using the clues obtained in this study. To highlight the potential social and mass problems caused by data, Gerbner's “cultivation theory” is reinterpreted. To this end, we conduct an experimental evaluation on popular algorithms and their data sets, such as Word2Vec, GloVe, and ELMO. The article stresses the importance of a holistic approach combining the algorithm, data, and an interdisciplinary assessment. This article is categorized under: Algorithmic Development > Statistics","PeriodicalId":500599,"journal":{"name":"WIREs Data Mining and Knowledge Discovery","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"WIREs Data Mining and Knowledge Discovery","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1002/widm.1498","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The fairness of human-related software has become critical as it is widely used in daily life to make life-changing decisions. With the use of these systems, however, many erroneous results have emerged, and technologies are now being developed to address them. Companies generally focus on algorithm-oriented errors, but the resulting solutions typically work only for particular algorithms, because the cause of the problem is not just the algorithm; it is also the data itself. Deep learning, for instance, cannot readily establish cause-effect relationships, the boundaries between statistical and heuristic algorithms are unclear, and an algorithm's fairness may vary with the context of the data. From this point of view, our article focuses on what the data should look like, which is not purely a matter of statistics. The picture in question is revealed through a scenario specific to "vulnerable and disadvantaged" groups, one of the most fundamental problems today. Drawing on the joint contribution of computer science and the social sciences, the article aims to anticipate the social dangers that may arise from artificial intelligence algorithms, using the clues obtained in this study. To highlight the potential social and mass-scale problems caused by data, Gerbner's "cultivation theory" is reinterpreted. To this end, we conduct an experimental evaluation on popular algorithms and their data sets, such as Word2Vec, GloVe, and ELMo. The article stresses the importance of a holistic approach combining the algorithm, the data, and an interdisciplinary assessment. This article is categorized under: Algorithmic Development > Statistics
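To give a concrete sense of the kind of data-driven bias probe the abstract alludes to, the sketch below measures gender associations in pretrained GloVe word vectors via cosine similarity, in the spirit of common word-embedding association tests. This is an illustrative minimal example, not the authors' exact experimental protocol; it assumes the gensim library and its downloadable "glove-wiki-gigaword-100" vectors are available.

```python
# Minimal sketch of an embedding-bias probe (assumed setup, not the paper's protocol):
# compare how strongly occupation words associate with male vs. female attribute terms
# in pretrained GloVe vectors, using cosine similarity.
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-100")  # pretrained GloVe KeyedVectors

male_terms = ["he", "man", "male"]
female_terms = ["she", "woman", "female"]
occupations = ["doctor", "nurse", "engineer", "teacher", "programmer"]

def mean_similarity(word, attribute_terms):
    """Average cosine similarity between a word and a set of attribute terms."""
    return sum(model.similarity(word, a) for a in attribute_terms) / len(attribute_terms)

for occ in occupations:
    bias = mean_similarity(occ, male_terms) - mean_similarity(occ, female_terms)
    # Positive values lean toward the male terms, negative toward the female terms.
    print(f"{occ:12s} gender-association score: {bias:+.3f}")
```

Scores far from zero for occupation words are a signal that the association comes from the training corpus itself, which is the article's point that fairness problems originate in the data as much as in the algorithm.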
