{"title":"偏见的文化差异?预训练德语和法语词嵌入的起源和性别偏见","authors":"Mascha Kurpicz-Briki","doi":"10.24451/ARBOR.11922","DOIUrl":null,"url":null,"abstract":"Smart applications often rely on training data in form of text. If there is a bias in that training data, the decision of the applications might not be fair. Common training data has been shown to be biased towards different groups of minorities. However, there is no generic algorithm to determine the fairness of training data. One existing approach is to measure gender bias using word embeddings. Most research in this field has been dedicated to the English language. In this work, we identified that there is a bias towards gender and origin in both German and French word embeddings. In particular, we found that real-world bias and stereotypes from the 18th century are still included in today’s word embeddings. Furthermore, we show that the gender bias in German has a different form from English and there is indication that bias has cultural differences that need to be considered when analyzing texts and word embeddings in different languages.","PeriodicalId":45891,"journal":{"name":"ARBOR-CIENCIA PENSAMIENTO Y CULTURA","volume":"113 1","pages":""},"PeriodicalIF":0.5000,"publicationDate":"2020-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":"{\"title\":\"Cultural Differences in Bias? Origin and Gender Bias in Pre-Trained German and French Word Embeddings\",\"authors\":\"Mascha Kurpicz-Briki\",\"doi\":\"10.24451/ARBOR.11922\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Smart applications often rely on training data in form of text. If there is a bias in that training data, the decision of the applications might not be fair. Common training data has been shown to be biased towards different groups of minorities. However, there is no generic algorithm to determine the fairness of training data. One existing approach is to measure gender bias using word embeddings. Most research in this field has been dedicated to the English language. In this work, we identified that there is a bias towards gender and origin in both German and French word embeddings. In particular, we found that real-world bias and stereotypes from the 18th century are still included in today’s word embeddings. 
Furthermore, we show that the gender bias in German has a different form from English and there is indication that bias has cultural differences that need to be considered when analyzing texts and word embeddings in different languages.\",\"PeriodicalId\":45891,\"journal\":{\"name\":\"ARBOR-CIENCIA PENSAMIENTO Y CULTURA\",\"volume\":\"113 1\",\"pages\":\"\"},\"PeriodicalIF\":0.5000,\"publicationDate\":\"2020-06-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"16\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ARBOR-CIENCIA PENSAMIENTO Y CULTURA\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.24451/ARBOR.11922\",\"RegionNum\":4,\"RegionCategory\":\"社会学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"0\",\"JCRName\":\"HUMANITIES, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ARBOR-CIENCIA PENSAMIENTO Y CULTURA","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.24451/ARBOR.11922","RegionNum":4,"RegionCategory":"社会学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"HUMANITIES, MULTIDISCIPLINARY","Score":null,"Total":0}
Cultural Differences in Bias? Origin and Gender Bias in Pre-Trained German and French Word Embeddings
Smart applications often rely on training data in the form of text. If there is a bias in that training data, the decisions of such applications may not be fair. Common training data has been shown to be biased against different minority groups. However, there is no generic algorithm for determining the fairness of training data. One existing approach is to measure gender bias using word embeddings. Most research in this field has been dedicated to the English language. In this work, we identified a bias with respect to gender and origin in both German and French word embeddings. In particular, we found that real-world biases and stereotypes dating back to the 18th century are still reflected in today's word embeddings. Furthermore, we show that gender bias in German takes a different form than in English, and there is indication that bias differs across cultures, which needs to be considered when analyzing texts and word embeddings in different languages.
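The abstract refers to measuring gender bias with word embeddings. Work in this line (such as the Word Embedding Association Test, WEAT) typically compares cosine similarities between two target word sets and two attribute word sets. The sketch below illustrates that style of measurement; the `emb` lookup and the word lists are placeholders, not the paper's actual test sets or procedure.

```python
# Minimal WEAT-style bias measurement sketch (hypothetical word lists and
# embedding lookup; not the paper's exact method or data).
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B, emb):
    # s(w, A, B): mean similarity of word w to attribute set A
    # minus its mean similarity to attribute set B
    return (np.mean([cosine(emb[w], emb[a]) for a in A])
            - np.mean([cosine(emb[w], emb[b]) for b in B]))

def weat_effect_size(X, Y, A, B, emb):
    # Effect size: difference of mean associations of the two target sets,
    # normalized by the standard deviation over all target words
    sx = [association(x, A, B, emb) for x in X]
    sy = [association(y, A, B, emb) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

# Usage (hypothetical): 'emb' maps words to pre-trained vectors, e.g. fastText;
# X, Y could be male/female terms and A, B career/family attribute terms.
```

A larger effect size indicates a stronger differential association between the target groups and the attributes; applying such a test to German or French embeddings requires translated or language-specific word lists, which is part of what makes cross-lingual comparison non-trivial.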
Journal introduction:
Arbor is a bimonthly journal publishing original articles on science, thought, and culture. By examining a range of topics with a rigorous scientific approach, Arbor aims to serve Spanish society and the scientific community by providing information, updates, reflection, and debate on subjects of current interest. Arbor is among the oldest journals published by CSIC and is open to researchers as well as to creators and managers of culture, both Spanish and foreign.