{"title":"Estimation and inference in sparse multivariate regression and conditional Gaussian graphical models under an unbalanced distributed setting","authors":"Ensiyeh Nezakati, Eugen Pircalabelu","doi":"10.1214/23-ejs2193","DOIUrl":"https://doi.org/10.1214/23-ejs2193","url":null,"abstract":"","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140523728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shogo H. Nakakita, Pierre Alquier, Masaaki Imaizumi
{"title":"Dimension-free bounds for sums of dependent matrices and operators with heavy-tailed distributions","authors":"Shogo H. Nakakita, Pierre Alquier, Masaaki Imaizumi","doi":"10.1214/24-ejs2224","DOIUrl":"https://doi.org/10.1214/24-ejs2224","url":null,"abstract":"","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140515766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alina Braun, Michael Kohler, Jeongik Cho, A. Krzyżak
{"title":"Analysis of the rate of convergence of two regression estimates defined by neural features which are easy to implement","authors":"Alina Braun, Michael Kohler, Jeongik Cho, A. Krzyżak","doi":"10.1214/23-ejs2207","DOIUrl":"https://doi.org/10.1214/23-ejs2207","url":null,"abstract":"","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140518472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sparse-limit approximation for t-statistics","authors":"Micól Tresoldi, Daniel Xiang, Peter McCullagh","doi":"10.1214/24-ejs2238","DOIUrl":"https://doi.org/10.1214/24-ejs2238","url":null,"abstract":"","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140527192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-01-01; Epub Date: 2024-08-27; DOI: 10.1214/24-ejs2275
Bohao Tang, Sandipan Pramanik, Yi Zhao, Brian Caffo, Abhirup Datta
In this manuscript, we study scalar-on-distribution regression; that is, instances where subject-specific distributions or densities are the covariates, related to a scalar outcome via a regression model. In practice, only repeated measures are observed from those covariate distributions, and common approaches first use these to estimate subject-specific density functions, which are then used as covariates in standard scalar-on-function regression. We propose a simple and direct method for linear scalar-on-distribution regression that circumvents the intermediate step of estimating subject-specific covariate densities. We show that one can directly use the observed repeated measures as covariates and endow the regression function with a Gaussian process prior to obtain closed-form or conjugate Bayesian inference. Our method subsumes the standard Bayesian non-parametric regression using Gaussian processes as a special case, corresponding to covariates being Dirac distributions. The model is also invariant to any transformation or ordering of the repeated measures. Theoretically, we show that, despite only using the observed repeated measures from the true density-valued covariates that generated the data, the method can achieve an optimal estimation error bound for the regression function. The theory extends beyond i.i.d. settings to accommodate certain forms of within-subject dependence among the repeated measures. To our knowledge, this is the first theoretical study on Bayesian regression using distribution-valued covariates. We propose numerous extensions, including a scalable implementation using low-rank Gaussian processes and a generalization to non-linear scalar-on-distribution regression. Through simulation studies, we demonstrate that our method performs substantially better than approaches that require an intermediate density estimation step, especially with a small number of repeated measures per subject. We apply our method to study the association of age with activity counts.
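The idea of using the repeated measures directly under a Gaussian process prior can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes, for illustration only, a model in which each outcome is the average of an unknown function beta over that subject's repeated measures plus Gaussian noise, with a squared-exponential kernel; the function names, `noise_var`, and `length_scale` are all hypothetical. Under those assumptions the outcomes are jointly Gaussian, and the covariance between two subjects is the kernel averaged over all pairs of their repeated measures, so the posterior mean of beta is available in closed form.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two 1-D arrays of points (assumed choice)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def averaged_gram(samples_a, samples_b, length_scale=1.0):
    """Entry (i, j) averages k(x, x') over all pairs of repeated measures
    from subject i in samples_a and subject j in samples_b."""
    gram = np.empty((len(samples_a), len(samples_b)))
    for i, xa in enumerate(samples_a):
        for j, xb in enumerate(samples_b):
            gram[i, j] = rbf_kernel(xa, xb, length_scale).mean()
    return gram

def posterior_mean_beta(samples, y, grid, noise_var=0.1, length_scale=1.0):
    """Closed-form posterior mean of beta on `grid` under the assumed model
    y_i = mean_j beta(x_ij) + Gaussian noise with variance noise_var."""
    K = averaged_gram(samples, samples, length_scale)
    # Covariance between beta(grid point) and each subject's averaged functional.
    cross = np.stack(
        [rbf_kernel(grid, x, length_scale).mean(axis=1) for x in samples], axis=1
    )
    alpha = np.linalg.solve(K + noise_var * np.eye(len(y)), y)
    return cross @ alpha
```

In this sketch, a subject with a single repeated measure contributes an ordinary kernel evaluation, which mirrors the special case of Dirac covariates mentioned in the abstract, and reordering a subject's measures leaves the averaged Gram matrix unchanged.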
{"title":"Direct Bayesian linear regression for distribution-valued covariates.","authors":"Bohao Tang, Sandipan Pramanik, Yi Zhao, Brian Caffo, Abhirup Datta","doi":"10.1214/24-ejs2275","DOIUrl":"10.1214/24-ejs2275","url":null,"abstract":"<p>In this manuscript, we study scalar-on-distribution regression; that is, instances where subject-specific distributions or densities are the covariates, related to a scalar outcome via a regression model. In practice, only repeated measures are observed from those covariate distributions and common approaches first use these to estimate subject-specific density functions, which are then used as covariates in standard scalar-on-function regression. We propose a simple and direct method for linear scalar-on-distribution regression that circumvents the intermediate step of estimating subject-specific covariate densities. We show that one can directly use the observed repeated measures as covariates and endow the regression function with a Gaussian process prior to obtain a closed form or conjugate Bayesian inference. Our method subsumes the standard Bayesian non-parametric regression using Gaussian processes as a special case, corresponding to covariates being Dirac-distributions. The model is also invariant to any transformation or ordering of the repeated measures. Theoretically, we show that, despite only using the observed repeated measures from the true density-valued covariates that generated the data, the method can achieve an optimal estimation error bound of the regression function. The theory extends beyond i.i.d. settings to accommodate certain forms of within-subject dependence among the repeated measures. To our knowledge, this is the first theoretical study on Bayesian regression using distribution-valued covariates. We propose numerous extensions including a scalable implementation using low-rank Gaussian processes and a generalization to non-linear scalar-on-distribution regression. Through simulation studies, we demonstrate that our method performs substantially better than approaches that require an intermediate density estimation step especially with a small number of repeated measures per subject. We apply our method to study association of age with activity counts.</p>","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11466299/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142401736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Penalized estimation of panel count data using generalized estimating equation","authors":"Minggen Lu","doi":"10.1214/24-ejs2239","DOIUrl":"https://doi.org/10.1214/24-ejs2239","url":null,"abstract":"","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140523297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Limit theorems for entropic optimal transport maps and Sinkhorn divergence","authors":"Ziv Goldfeld, Kengo Kato, Gabriel Rioux, R. Sadhu","doi":"10.1214/24-ejs2217","DOIUrl":"https://doi.org/10.1214/24-ejs2217","url":null,"abstract":"","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140524800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A tradeoff between false discovery and true positive proportions for sparse high-dimensional logistic regression","authors":"Jing Zhou, G. Claeskens","doi":"10.1214/23-ejs2204","DOIUrl":"https://doi.org/10.1214/23-ejs2204","url":null,"abstract":"","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140516066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}