Estimating Wage Disparities Using Foundation Models
Keyon Vafa, Susan Athey, David M. Blei
arXiv - ECON - Econometrics, 2024-09-15, arXiv:2409.09894
Abstract
One thread of empirical work in social science focuses on decomposing group
differences in outcomes into unexplained components and components explained by
observable factors. In this paper, we study gender wage decompositions, which
require estimating the portion of the gender wage gap explained by career
histories of workers. Classical methods for decomposing the wage gap employ
simple predictive models of wages that condition on a small set of simple
summaries of labor history. The problem is that these predictive models cannot
take advantage of the full complexity of a worker's history, and the resulting
decompositions thus suffer from omitted variable bias (OVB), where covariates
that are correlated with both gender and wages are not included in the model.
Here we explore an alternative methodology for wage gap decomposition that
employs powerful foundation models, such as large language models, as the
predictive engine. Foundation models excel at making accurate predictions from
complex, high-dimensional inputs. We use a custom-built foundation model,
designed to predict wages from full labor histories, to decompose the gender
wage gap. We prove that the way such models are usually trained might still
lead to OVB, but we develop fine-tuning algorithms that empirically mitigate this
issue. Our model captures a richer representation of career history than simple
models and predicts wages more accurately. Specifically, we first provide a novel
set of conditions under which an estimator of the wage gap based on a
fine-tuned foundation model is $\sqrt{n}$-consistent. Building on the theory,
we then propose methods for fine-tuning foundation models that minimize OVB.
Using data from the Panel Study of Income Dynamics, we find that history
explains more of the gender wage gap than standard econometric models can
measure, and we identify elements of history that are important for reducing
OVB.
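
To make the omitted variable bias described above concrete, here is a minimal simulation sketch. It is not the authors' method and uses no foundation model: it assumes a simple pooled OLS regression in which the unexplained gap is the coefficient on a gender indicator. The data are synthetic, and every name and parameter value (`experience`, `interruption`, the coefficients) is a hypothetical choice made for illustration only.

```python
import numpy as np


def unexplained_gap(log_wage, female, covariates):
    """Coefficient on the gender indicator in an OLS fit of log wage
    on an intercept, the indicator, and the given covariates."""
    n = len(log_wage)
    design = np.column_stack([np.ones(n), female, covariates])
    coef, *_ = np.linalg.lstsq(design, log_wage, rcond=None)
    return coef[1]


rng = np.random.default_rng(0)
n = 200_000

# Synthetic population (all values are illustrative assumptions).
female = rng.integers(0, 2, size=n).astype(float)

# A coarse summary of history that classical models would include.
experience = rng.normal(10.0, 3.0, size=n) - 1.0 * female

# A detail of history that a coarse summary misses; it is correlated
# with gender and, below, with wages.
interruption = rng.binomial(1, 0.1 + 0.3 * female).astype(float)

true_unexplained = -0.05
log_wage = (
    2.0
    + 0.03 * experience
    - 0.20 * interruption
    + true_unexplained * female
    + rng.normal(0.0, 0.3, size=n)
)

# Raw gap in mean log wages between the two groups.
raw_gap = log_wage[female == 1].mean() - log_wage[female == 0].mean()

# Short model: conditions only on the coarse summary.
gap_short = unexplained_gap(log_wage, female, experience)

# Full model: also conditions on the omitted history feature.
gap_full = unexplained_gap(
    log_wage, female, np.column_stack([experience, interruption])
)

print(f"raw gap:                  {raw_gap:.3f}")
print(f"unexplained, short model: {gap_short:.3f}")
print(f"unexplained, full model:  {gap_full:.3f}")
```

Under these assumptions the short model attributes part of the effect of `interruption` to gender, so its unexplained gap is larger in magnitude (about -0.11) than the value built into the simulation (-0.05), which the full model recovers approximately. This mirrors the abstract's point that a richer representation of history can explain more of the gap.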