Estimating Wage Disparities Using Foundation Models

Keyon Vafa, Susan Athey, David M. Blei
arXiv - ECON - Econometrics. Published 2024-09-15. DOI: arxiv-2409.09894 (https://doi.org/arxiv-2409.09894)

Abstract

One thread of empirical work in social science focuses on decomposing group differences in outcomes into unexplained components and components explained by observable factors. In this paper, we study gender wage decompositions, which require estimating the portion of the gender wage gap explained by career histories of workers. Classical methods for decomposing the wage gap employ simple predictive models of wages which condition on a small set of simple summaries of labor history. The problem is that these predictive models cannot take advantage of the full complexity of a worker's history, and the resulting decompositions thus suffer from omitted variable bias (OVB), where covariates that are correlated with both gender and wages are not included in the model. Here we explore an alternative methodology for wage gap decomposition that employs powerful foundation models, such as large language models, as the predictive engine. Foundation models excel at making accurate predictions from complex, high-dimensional inputs. We use a custom-built foundation model, designed to predict wages from full labor histories, to decompose the gender wage gap. We prove that the way such models are usually trained might still lead to OVB, but develop fine-tuning algorithms that empirically mitigate this issue. Our model captures a richer representation of career history than simple models and predicts wages more accurately. In detail, we first provide a novel set of conditions under which an estimator of the wage gap based on a fine-tuned foundation model is $\sqrt{n}$-consistent. Building on the theory, we then propose methods for fine-tuning foundation models that minimize OVB. Using data from the Panel Study of Income Dynamics, we find that history explains more of the gender wage gap than standard econometric models can measure, and we identify elements of history that are important for reducing OVB.
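The omitted variable bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' foundation model and does not use the Panel Study of Income Dynamics. It uses synthetic data and ordinary least squares, and every variable name and coefficient (`experience`, `part_time_share`, the true unexplained gap of -0.10) is an assumption chosen for illustration. It compares the estimated unexplained gap when the model conditions only on a coarse summary of history with the estimate when a richer history feature, correlated with both gender and wages, is also included.

```python
import numpy as np

# Synthetic illustration only: all names and coefficients are assumptions,
# not values from the paper or from PSID data.
rng = np.random.default_rng(0)
n = 50_000

female = rng.integers(0, 2, size=n).astype(float)
# Coarse history summary, independent of gender in this toy setup.
experience = rng.normal(10.0, 3.0, size=n)
# Richer history feature, correlated with gender AND with wages.
part_time_share = 0.2 + 0.3 * female + rng.normal(0.0, 0.1, size=n)

TRUE_UNEXPLAINED_GAP = -0.10
log_wage = (
    2.0
    + 0.03 * experience
    - 0.5 * part_time_share
    + TRUE_UNEXPLAINED_GAP * female
    + rng.normal(0.0, 0.3, size=n)
)


def unexplained_gap(log_wage, female, covariates):
    """OLS coefficient on the gender indicator, given the listed covariates."""
    X = np.column_stack([np.ones_like(female), female, *covariates])
    coef, *_ = np.linalg.lstsq(X, log_wage, rcond=None)
    return coef[1]


# "Classical" model: conditions only on the coarse summary, so the effect of
# part_time_share is absorbed into the gender coefficient (OVB).
gap_short = unexplained_gap(log_wage, female, [experience])

# Richer model: also conditions on the history feature that was omitted.
gap_full = unexplained_gap(log_wage, female, [experience, part_time_share])

print(f"unexplained gap, coarse history: {gap_short:.3f}")
print(f"unexplained gap, richer history: {gap_full:.3f}")
```

In this toy setup the coarse model attributes roughly -0.5 × 0.3 = -0.15 of additional log-wage gap to gender, because the omitted feature differs by gender and lowers wages. Conditioning on the richer history moves that portion from the unexplained to the explained component, which is the direction of the effect the abstract reports.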