Five sources of bias in natural language processing
Authors: Dirk Hovy, Shrimai Prabhumoye
Journal: Language and Linguistics Compass, volume 15, issue 8
Published: 2021-08-20
DOI: 10.1111/lnc3.12432
URL: https://onlinelibrary.wiley.com/doi/10.1111/lnc3.12432
Citations: 111
Abstract
Recently, there has been an increased interest in demographically grounded bias in natural language processing (NLP) applications. Much of the recent work has focused on describing bias and providing an overview of bias in a larger context. Here, we provide a simple, actionable summary of this recent work. We outline five sources where bias can occur in NLP systems: (1) the data, (2) the annotation process, (3) the input representations, (4) the models, and finally (5) the research design (or how we conceptualize our research). We explore each of the bias sources in detail in this article, including examples and links to related work, as well as potential counter-measures.
About the journal:
Unique in its range, Language and Linguistics Compass is an online-only journal publishing original, peer-reviewed surveys of current research from across the entire discipline. Language and Linguistics Compass publishes state-of-the-art reviews, supported by a comprehensive bibliography and accessible to an international readership. Language and Linguistics Compass is aimed at senior undergraduates, postgraduates and academics, and will provide a unique reference tool for researching essays, preparing lectures, writing a research proposal, or just keeping up with new developments in a specific area of interest.