Power Enhancement for Dimension Detection of Gaussian Signals
Gaspard Bernard and Thomas Verdebout
DOI: 10.5705/ss.202022.0315

In this section, our objective is to provide Monte Carlo simulation results that corroborate the conclusions drawn from Proposition 1 and from Section 4. The first simulation exercise illustrates Proposition 1. We generated $M = 10{,}000$ independent samples of i.i.d. observations $X^{(b)}_1, \dots, X^{(b)}_{10{,}000}$, for $b = 0, 1/4, 1/2, 1$. The $X^{(b)}_i$'s are i.i.d. with a common $(p = 8)$-dimensional Gaussian distribution with mean zero and covariance matrix
Bandwidth Selection for Large Covariance and Precision Matrices
Xuehu Zhu, Jian Guo, Xu Guo, Lixing Zhu*3,4 and Jiasen Zheng
1. School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China
2. Academy of Mathematics and Systems Science, Chinese Academy of Sciences
3. Center for Statistics and Data Science, Beijing Normal University, Zhuhai, China
4. Department of Mathematics, Hong Kong Baptist University, Hong Kong
5. Center for Statistical Science, Tsinghua University, Beijing, China
DOI: 10.5705/ss.202022.0337
Feature-Weighted Elastic Net: Using "Features of Features" for Better Prediction
J. Kenneth Tay, Nima Aghaeepour, Trevor Hastie and Robert Tibshirani
DOI: 10.5705/ss.202020.0226
In some supervised learning settings, the practitioner might have additional information on the features used for prediction. We propose a new method that leverages this additional information for better prediction. The method, which we call the feature-weighted elastic net ("fwelnet"), uses these "features of features" to adapt the relative penalties on the feature coefficients in the elastic net penalty. In our simulations, fwelnet outperforms the lasso in terms of test mean squared error and usually gives an improvement in true positive rate or false positive rate for feature selection. We also apply this method to early prediction of preeclampsia, where fwelnet outperforms the lasso in terms of 10-fold cross-validated area under the curve (0.86 vs. 0.80). Finally, we provide a connection between fwelnet and the group lasso, and suggest how fwelnet might be used for multi-task learning.
{"title":"Feature-weighted elastic net: using \"features of features\" for better prediction.","authors":"J Kenneth Tay, Nima Aghaeepour, Trevor Hastie, Robert Tibshirani","doi":"10.5705/ss.202020.0226","DOIUrl":"10.5705/ss.202020.0226","url":null,"abstract":"<p><p>In some supervised learning settings, the practitioner might have additional information on the features used for prediction. We propose a new method which leverages this additional information for better prediction. The method, which we call the <i>feature-weighted elastic net</i> (\"fwelnet\"), uses these \"features of features\" to adapt the relative penalties on the feature coefficients in the elastic net penalty. In our simulations, fwelnet outperforms the lasso in terms of test mean squared error and usually gives an improvement in true positive rate or false positive rate for feature selection. We also apply this method to early prediction of preeclampsia, where fwelnet outperforms the lasso in terms of 10-fold cross-validated area under the curve (0.86 vs. 0.80). We also provide a connection between fwelnet and the group lasso and suggest how fwelnet might be used for multi-task learning.</p>","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10129060/pdf/nihms-1843572.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9807052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the Consistency of the Least Squares Estimator in Models Sampled at Random Times Driven by Long Memory Noise: The Renewal Case
Héctor Araya, Natalia Bahamonde, Lisandro Fermín, Tania Roa and Soledad Torres
DOI: 10.5705/ss.202020.0457

In this study, we prove the strong consistency of the least squares estimator in a randomly sampled linear regression model with long-memory noise, where the random sampling times are given by an independent renewal process. Additionally, we illustrate how to work with a random number of observations up to time T = 1. A simulation study illustrates the behavior of the different terms, as well as the performance of the estimator for various values of the Hurst parameter H.
Distributed Mean Dimension Reduction Through Semi-parametric Approaches
Zhengtian Zhu, Wang-li Xu and Liping Zhu
DOI: 10.5705/ss.202022.0157

In this article, we recast semi-parametric mean dimension reduction approaches in a least squares framework, which turns the problem of recovering the central mean subspace into a series of problems of estimating slopes in linear regressions. This framework also makes it straightforward to incorporate penalties that produce sparse solutions. We further adapt these approaches to distributed settings in which massive data are scattered across locations and cannot be aggregated or processed on a single machine. We propose three communication-efficient distributed algorithms: the first yields a dense solution, the second produces a sparse estimate, and the third provides an orthonormal basis. The distributed algorithms substantially reduce the computational complexity of their pooled counterparts and attain oracle rates after a finite number of iterations. Extensive numerical studies demonstrate the finite-sample performance of the distributed estimates and compare them with the pooled algorithms.
Optimal Subsampling for Multinomial Logistic Models With Big Data
Zhiqiang Ye, Jun Yu and Mingyao Ai
DOI: 10.5705/ss.202022.0277

This section presents the explicit forms of the $\pi_{ij}(\beta)$'s and their derivatives, which are key ingredients both in the search for the maximum likelihood estimator and in the theoretical proofs. The categorical probability $\pi_{ij}(\beta)$ for Models (2.1)-(2.4) can be calculated directly, and the first derivative of $\pi_{ij}(\beta)$ with respect to $\beta$ can be obtained through
$$\frac{\partial \pi_{ij}(\beta)}{\partial \beta} = \pi_{ij}(\beta)\,\frac{\partial \log \pi_{ij}(\beta)}{\partial \beta}. \tag{S1.1}$$