Towards comparable ratings: Exploring bias in German physician reviews

IF 2.7 3区 计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Data & Knowledge Engineering Pub Date : 2023-11-01 DOI:10.1016/j.datak.2023.102235
Joschka Kersting, Falk Maoro, Michaela Geierhos
{"title":"Towards comparable ratings: Exploring bias in German physician reviews","authors":"Joschka Kersting,&nbsp;Falk Maoro,&nbsp;Michaela Geierhos","doi":"10.1016/j.datak.2023.102235","DOIUrl":null,"url":null,"abstract":"<div><p>In this study, we evaluate the impact of gender-biased data from German-language physician reviews on the fairness of fine-tuned language models. For two different downstream tasks, we use data reported to be gender biased and aggregate it with annotations. First, we propose a new approach to aspect-based sentiment analysis that allows identifying, extracting, and classifying implicit and explicit aspect phrases and their polarity within a single model. The second task we present is grade prediction, where we predict the overall grade of a review on the basis of the review text. For both tasks, we train numerous transformer models and evaluate their performance. The aggregation of sensitive attributes, such as a physician’s gender and migration background, with individual text reviews allows us to measure the performance of the models with respect to these sensitive groups. These group-wise performance measures act as extrinsic bias measures for our downstream tasks. In addition, we translate several gender-specific templates of the intrinsic bias metrics into the German language and evaluate our fine-tuned models. Based on this set of tasks, fine-tuned models, and intrinsic and extrinsic bias measures, we perform correlation analyses between intrinsic and extrinsic bias measures. In terms of sensitive groups and effect sizes, our bias measure results show different directions. Furthermore, correlations between measures of intrinsic and extrinsic bias can be observed in different directions. This leads us to conclude that gender-biased data does not inherently lead to biased models. Other variables, such as template dependency for intrinsic measures and label distribution in the data, must be taken into account as they strongly influence the metric results. Therefore, we suggest that metrics and templates should be chosen according to the given task and the biases to be assessed.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"148 ","pages":"Article 102235"},"PeriodicalIF":2.7000,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0169023X23000952/pdfft?md5=035f0e2eec55531089e125433a25b2bc&pid=1-s2.0-S0169023X23000952-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data & Knowledge Engineering","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0169023X23000952","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

In this study, we evaluate the impact of gender-biased data from German-language physician reviews on the fairness of fine-tuned language models. For two different downstream tasks, we use data reported to be gender biased and aggregate it with annotations. First, we propose a new approach to aspect-based sentiment analysis that allows identifying, extracting, and classifying implicit and explicit aspect phrases and their polarity within a single model. The second task we present is grade prediction, where we predict the overall grade of a review on the basis of the review text. For both tasks, we train numerous transformer models and evaluate their performance. The aggregation of sensitive attributes, such as a physician’s gender and migration background, with individual text reviews allows us to measure the performance of the models with respect to these sensitive groups. These group-wise performance measures act as extrinsic bias measures for our downstream tasks. In addition, we translate several gender-specific templates of the intrinsic bias metrics into the German language and evaluate our fine-tuned models. Based on this set of tasks, fine-tuned models, and intrinsic and extrinsic bias measures, we perform correlation analyses between intrinsic and extrinsic bias measures. In terms of sensitive groups and effect sizes, our bias measure results show different directions. Furthermore, correlations between measures of intrinsic and extrinsic bias can be observed in different directions. This leads us to conclude that gender-biased data does not inherently lead to biased models. Other variables, such as template dependency for intrinsic measures and label distribution in the data, must be taken into account as they strongly influence the metric results. Therefore, we suggest that metrics and templates should be chosen according to the given task and the biases to be assessed.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
迈向可比评级:探索德国医师评价的偏倚
在这项研究中,我们评估了来自德语医师评论的性别偏见数据对微调语言模型公平性的影响。对于两个不同的下游任务,我们使用报告的有性别偏见的数据,并将其与注释一起汇总。首先,我们提出了一种基于方面的情感分析的新方法,该方法允许在单个模型中识别、提取和分类隐式和显式方面短语及其极性。我们提出的第二个任务是成绩预测,我们根据复习文本预测复习的总体成绩。对于这两项任务,我们训练了许多变压器模型并评估了它们的性能。敏感属性的聚合,例如医生的性别和迁移背景,以及单独的文本审查,使我们能够根据这些敏感组度量模型的性能。这些团队绩效指标作为我们下游任务的外在偏差指标。此外,我们将几个性别特定的内在偏见指标模板翻译成德语,并评估我们的微调模型。基于这组任务、微调模型以及内在和外在偏差测量,我们对内在和外在偏差测量进行了相关性分析。在敏感群体和效应大小方面,我们的偏倚测量结果显示出不同的方向。此外,内在偏差和外在偏差之间的相关性可以在不同的方向上观察到。这使我们得出结论,性别偏见的数据并不必然导致有偏见的模型。其他变量,如固有度量的模板依赖关系和数据中的标签分布,必须考虑在内,因为它们强烈影响度量结果。因此,我们建议指标和模板应该根据给定的任务和要评估的偏差来选择。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Data & Knowledge Engineering
Data & Knowledge Engineering 工程技术-计算机:人工智能
CiteScore
5.00
自引率
0.00%
发文量
66
审稿时长
6 months
期刊介绍: Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between these two related fields of interest. DKE reaches a world-wide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of these systems.
期刊最新文献
Goal modelling in aeronautics: Practical applications for aircraft and manufacturing designs Ethical reasoning methods for ICT: What they are and when to use them SSQTKG: A Subgraph-based Semantic Query Approach for Temporal Knowledge Graph NoSQL document data migration strategy in the context of schema evolution VarClaMM: A reference meta-model to understand DNA variant classification
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1