{"title":"Unlinked Monotone Regression","authors":"F. Balabdaoui, Charles R. Doss, C. Durot","doi":"10.3929/ETHZ-B-000501663","DOIUrl":null,"url":null,"abstract":"We consider so-called univariate unlinked (sometimes \"decoupled,\" or \"shuffled\") regression when the unknown regression curve is monotone. In standard monotone regression, one observes a pair $(X,Y)$ where a response $Y$ is linked to a covariate $X$ through the model $Y= m_0(X) + \\epsilon$, with $m_0$ the (unknown) monotone regression function and $\\epsilon$ the unobserved error (assumed to be independent of $X$). In the unlinked regression setting one gets only to observe a vector of realizations from both the response $Y$ and from the covariate $X$ where now $Y \\stackrel{d}{=} m_0(X) + \\epsilon$. There is no (observed) pairing of $X$ and $Y$. Despite this, it is actually still possible to derive a consistent non-parametric estimator of $m_0$ under the assumption of monotonicity of $m_0$ and knowledge of the distribution of the noise $\\epsilon$. In this paper, we establish an upper bound on the rate of convergence of such an estimator under minimal assumption on the distribution of the covariate $X$. We discuss extensions to the case in which the distribution of the noise is unknown. We develop a gradient-descent-based algorithm for its computation, and we demonstrate its use on synthetic data. Finally, we apply our method (in a fully data driven way, without knowledge of the error distribution) on longitudinal data from the US Consumer Expenditure Survey.","PeriodicalId":14794,"journal":{"name":"J. Mach. Learn. Res.","volume":"28 6 1","pages":"172:1-172:60"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"J. Mach. Learn. Res.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3929/ETHZ-B-000501663","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9
Abstract
We consider so-called univariate unlinked (sometimes "decoupled," or "shuffled") regression when the unknown regression curve is monotone. In standard monotone regression, one observes a pair $(X,Y)$ where a response $Y$ is linked to a covariate $X$ through the model $Y= m_0(X) + \epsilon$, with $m_0$ the (unknown) monotone regression function and $\epsilon$ the unobserved error (assumed to be independent of $X$). In the unlinked regression setting one gets only to observe a vector of realizations from both the response $Y$ and from the covariate $X$ where now $Y \stackrel{d}{=} m_0(X) + \epsilon$. There is no (observed) pairing of $X$ and $Y$. Despite this, it is actually still possible to derive a consistent non-parametric estimator of $m_0$ under the assumption of monotonicity of $m_0$ and knowledge of the distribution of the noise $\epsilon$. In this paper, we establish an upper bound on the rate of convergence of such an estimator under minimal assumption on the distribution of the covariate $X$. We discuss extensions to the case in which the distribution of the noise is unknown. We develop a gradient-descent-based algorithm for its computation, and we demonstrate its use on synthetic data. Finally, we apply our method (in a fully data driven way, without knowledge of the error distribution) on longitudinal data from the US Consumer Expenditure Survey.