Gender bias in legal corpora and debiasing it

IF 2.3 3区 计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Natural Language Engineering Pub Date : 2022-03-30 DOI:10.1017/s1351324922000122
Nurullah Sevim, Furkan Şahinuç, Aykut Koç
{"title":"Gender bias in legal corpora and debiasing it","authors":"Nurullah Sevim, Furkan Şahinuç, Aykut Koç","doi":"10.1017/s1351324922000122","DOIUrl":null,"url":null,"abstract":"\n Word embeddings have become important building blocks that are used profoundly in natural language processing (NLP). Despite their several advantages, word embeddings can unintentionally accommodate some gender- and ethnicity-based biases that are present within the corpora they are trained on. Therefore, ethical concerns have been raised since word embeddings are extensively used in several high-level algorithms. Studying such biases and debiasing them have recently become an important research endeavor. Various studies have been conducted to measure the extent of bias that word embeddings capture and to eradicate them. Concurrently, as another subfield that has started to gain traction recently, the applications of NLP in the field of law have started to increase and develop rapidly. As law has a direct and utmost effect on people’s lives, the issues of bias for NLP applications in legal domain are certainly important. However, to the best of our knowledge, bias issues have not yet been studied in the context of legal corpora. In this article, we approach the gender bias problem from the scope of legal text processing domain. Word embedding models that are trained on corpora composed by legal documents and legislation from different countries have been utilized to measure and eliminate gender bias in legal documents. Several methods have been employed to reveal the degree of gender bias and observe its variations over countries. Moreover, a debiasing method has been used to neutralize unwanted bias. The preservation of semantic coherence of the debiased vector space has also been demonstrated by using high-level tasks. Finally, overall results and their implications have been discussed in the scope of NLP in legal domain.","PeriodicalId":49143,"journal":{"name":"Natural Language Engineering","volume":"1 1","pages":"449-482"},"PeriodicalIF":2.3000,"publicationDate":"2022-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Natural Language Engineering","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1017/s1351324922000122","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 10

Abstract

Word embeddings have become important building blocks that are used profoundly in natural language processing (NLP). Despite their several advantages, word embeddings can unintentionally accommodate some gender- and ethnicity-based biases that are present within the corpora they are trained on. Therefore, ethical concerns have been raised since word embeddings are extensively used in several high-level algorithms. Studying such biases and debiasing them have recently become an important research endeavor. Various studies have been conducted to measure the extent of bias that word embeddings capture and to eradicate them. Concurrently, as another subfield that has started to gain traction recently, the applications of NLP in the field of law have started to increase and develop rapidly. As law has a direct and utmost effect on people’s lives, the issues of bias for NLP applications in legal domain are certainly important. However, to the best of our knowledge, bias issues have not yet been studied in the context of legal corpora. In this article, we approach the gender bias problem from the scope of legal text processing domain. Word embedding models that are trained on corpora composed by legal documents and legislation from different countries have been utilized to measure and eliminate gender bias in legal documents. Several methods have been employed to reveal the degree of gender bias and observe its variations over countries. Moreover, a debiasing method has been used to neutralize unwanted bias. The preservation of semantic coherence of the debiased vector space has also been demonstrated by using high-level tasks. Finally, overall results and their implications have been discussed in the scope of NLP in legal domain.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
法律语料库中的性别偏见及其消除
词嵌入已经成为自然语言处理(NLP)中广泛使用的重要组成部分。尽管词嵌入有一些优点,但它们可能会无意中适应一些基于性别和种族的偏见,这些偏见存在于它们所训练的语料库中。因此,由于词嵌入在一些高级算法中广泛使用,伦理问题已经提出。研究这些偏见并消除它们最近成为一项重要的研究工作。已经进行了各种各样的研究来测量词嵌入捕获的偏见程度并消除它们。与此同时,作为近年来兴起的另一个分支领域,自然语言处理在法律领域的应用也开始增多和迅速发展。由于法律对人们的生活有直接和最大的影响,自然语言处理在法律领域应用的偏见问题当然很重要。然而,据我们所知,偏见问题尚未在法律语料库的背景下进行研究。在本文中,我们从法律文本处理领域的角度来探讨性别偏见问题。利用对各国法律文件和立法组成的语料库进行训练的词嵌入模型来衡量和消除法律文件中的性别偏见。已经采用了几种方法来揭示性别偏见的程度,并观察其在各国之间的差异。此外,还采用了一种消除偏置的方法来消除不必要的偏置。通过使用高级任务也证明了去偏向量空间的语义一致性的保存。最后,在法律领域的自然语言处理范围内讨论了总体结果及其影响。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Natural Language Engineering
Natural Language Engineering COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-
CiteScore
5.90
自引率
12.00%
发文量
60
审稿时长
>12 weeks
期刊介绍: Natural Language Engineering meets the needs of professionals and researchers working in all areas of computerised language processing, whether from the perspective of theoretical or descriptive linguistics, lexicology, computer science or engineering. Its aim is to bridge the gap between traditional computational linguistics research and the implementation of practical applications with potential real-world use. As well as publishing research articles on a broad range of topics - from text analysis, machine translation, information retrieval and speech analysis and generation to integrated systems and multi modal interfaces - it also publishes special issues on specific areas and technologies within these topics, an industry watch column and book reviews.
期刊最新文献
Start-up activity in the LLM ecosystem Anisotropic span embeddings and the negative impact of higher-order inference for coreference resolution: An empirical analysis Automated annotation of parallel bible corpora with cross-lingual semantic concordance How do control tokens affect natural language generation tasks like text simplification Emerging trends: When can users trust GPT, and when should they intervene?
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1