Unlinked Monotone Regression

J. Mach. Learn. Res., pages 172:1-172:60. Pub Date: 2020-07-02. DOI: 10.3929/ETHZ-B-000501663

F. Balabdaoui, Charles R. Doss, C. Durot


Citation count: 9

Abstract

We consider so-called univariate unlinked (sometimes "decoupled" or "shuffled") regression when the unknown regression curve is monotone. In standard monotone regression, one observes a pair $(X,Y)$ where a response $Y$ is linked to a covariate $X$ through the model $Y = m_0(X) + \epsilon$, with $m_0$ the (unknown) monotone regression function and $\epsilon$ the unobserved error (assumed to be independent of $X$). In the unlinked regression setting, one only gets to observe a vector of realizations of the response $Y$ and a vector of realizations of the covariate $X$, where now $Y \stackrel{d}{=} m_0(X) + \epsilon$. There is no (observed) pairing of $X$ and $Y$. Despite this, it is still possible to derive a consistent nonparametric estimator of $m_0$ under the assumption that $m_0$ is monotone and that the distribution of the noise $\epsilon$ is known. In this paper, we establish an upper bound on the rate of convergence of such an estimator under minimal assumptions on the distribution of the covariate $X$. We discuss extensions to the case in which the distribution of the noise is unknown. We develop a gradient-descent-based algorithm for computing the estimator, and we demonstrate its use on synthetic data. Finally, we apply our method (in a fully data-driven way, without knowledge of the error distribution) to longitudinal data from the US Consumer Expenditure Survey.
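To make the unlinked setting concrete, the following is a minimal sketch of one plausible way to fit a monotone $m$ by gradient descent when the noise distribution is known. It is not claimed to be the authors' algorithm. The assumptions are ours: logistic noise with known scale, a squared-distance contrast between the empirical CDF of $Y$ and the model-implied CDF $n^{-1}\sum_i G(y - m(X_i))$, a pool-adjacent-violators projection to enforce monotonicity, and a heuristic step size. All function names are hypothetical.

```python
import numpy as np

def logistic_cdf(z, scale):
    # CDF of the (assumed) logistic noise, written with tanh for numerical stability
    return 0.5 * (1.0 + np.tanh(z / (2.0 * scale)))

def project_monotone(v):
    # Pool-adjacent-violators: Euclidean projection onto nondecreasing vectors
    blocks = []  # each block stores [sum, count]
    for value in v:
        blocks.append([value, 1])
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])

def loss_and_grad(theta, y_grid, Fy, scale):
    # theta[i] plays the role of m at the i-th smallest covariate value
    z = y_grid[:, None] - theta[None, :]
    G = logistic_cdf(z, scale)
    resid = Fy - G.mean(axis=1)          # empirical CDF of Y minus model CDF
    dens = G * (1.0 - G) / scale         # logistic density evaluated at z
    grad = 2.0 * (resid[:, None] * dens).mean(axis=0) / theta.size
    return np.mean(resid ** 2), grad

def fit_unlinked_monotone(x, y, scale, n_iter=500):
    n = x.size
    y_grid = np.sort(y)
    Fy = np.arange(1, y.size + 1) / y.size   # empirical CDF at the order statistics
    # Start from noiseless quantile matching: m(x) = F_Y^{-1}(F_X(x))
    theta = np.quantile(y, (np.arange(n) + 0.5) / n)
    step = 2.0 * scale ** 2 * n              # heuristic; gradient entries are O(1/n)
    losses = []
    for _ in range(n_iter):
        loss, grad = loss_and_grad(theta, y_grid, Fy, scale)
        losses.append(loss)
        theta = project_monotone(theta - step * grad)
    losses.append(loss_and_grad(theta, y_grid, Fy, scale)[0])
    return np.sort(x), theta, np.array(losses)

# Synthetic unlinked data: the Y sample is built from fresh covariate draws,
# so no pairing between x and y exists.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0.0, 1.0, n)
y = 4.0 * rng.uniform(0.0, 1.0, n) ** 2 + rng.logistic(0.0, 0.5, n)
x_sorted, m_hat, losses = fit_unlinked_monotone(x, y, scale=0.5)
```

Here `m_hat[i]` is the fitted value at `x_sorted[i]`. Only the ranks of the covariate sample enter this sketch, which reflects that monotonicity and the two marginal distributions are the only information linking $X$ and $Y$.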
