{"title":"Robust consistent estimators for ROC curves with covariates","authors":"Ana M. Bianco, G. Boente, W. González-Manteiga","doi":"10.1214/22-ejs2042","DOIUrl":"https://doi.org/10.1214/22-ejs2042","url":null,"abstract":"","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42756461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dependency structure in recommender systems has been widely adopted in recent years to improve prediction accuracy. In this paper, we propose an innovative tensor-based recommender system, namely, the Tensor Factorization with Dependency (TFD). The proposed method utilizes shared factors to characterize the dependency between different modes, in addition to pairwise additive tensor factorization to integrate information among multiple modes. One advantage of the proposed method is that it provides flexibility for different dependency structures by incorporating shared latent factors. In addition, the proposed method unifies both binary and ordinal ratings in recommender systems. We achieve scalable computation for scarce tensors with high missing rates. In theory, we show the asymptotic consistency of estimators with various loss functions for both binary and ordinal data. Our numerical studies demonstrate that the proposed method outperforms the existing methods, especially on prediction accuracy.
{"title":"Tensor factorization recommender systems with dependency","authors":"Jiuchen Zhang, Yubai Yuan, Annie Qu","doi":"10.1214/22-ejs1978","DOIUrl":"https://doi.org/10.1214/22-ejs1978","url":null,"abstract":"Dependency structure in recommender systems has been widely adopted in recent years to improve prediction accuracy. In this paper, we propose an innovative tensor-based recommender system, namely, the Tensor Factorization with Dependency (TFD). The proposed method utilizes shared factors to characterize the dependency between different modes, in addition to pairwise additive tensor factorization to integrate information among multiple modes. One advantage of the proposed method is that it provides flexibility for different dependency structures by incorporating shared latent factors. In addition, the proposed method unifies both binary and ordinal ratings in recommender systems. We achieve scalable computation for scarce tensors with high missing rates. In theory, we show the asymptotic consistency of estimators with various loss functions for both binary and ordinal data. Our numerical studies demonstrate that the proposed method outperforms the existing methods, especially on prediction accuracy.","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42174710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yiwei Fan, Xiaoling Lu, Junlong Zhao, H. Fu, Yufeng Liu
Precision medicine is an increasingly important area of research. Due to the heterogeneity of individual characteristics, patients may respond differently to treatments. One of the most important goals for precision medicine is to develop individualized treatment rules (ITRs) involving patients’ characteristics directly. As an interesting topic in clinical research, many statistical methods have been developed in recent years to find optimal ITRs. For binary treatments, outcome weighted learning (OWL) was proposed to find a decision function of patient characteristics maximizing the expected clinical outcome. Treatments with hierarchical structure are commonly seen in practice. In hierarchical scenarios, how to estimate ITRs is still unclear. We propose a new framework named hierarchical outcome-weighted angle-based learning (HOAL) to estimate ITRs for treatments with hierarchical structure. Statistical properties including Fisher consistency and convergence rates of the proposed method are presented. Simulations and an application to a type 2 diabetes study under linear and nonlinear learning show the highly competitive performance of our proposed procedure in both numerical accuracy and computational efficiency.
{"title":"Estimating individualized treatment rules for treatments with hierarchical structure","authors":"Yiwei Fan, Xiaoling Lu, Junlong Zhao, H. Fu, Yufeng Liu","doi":"10.1214/21-ejs1948","DOIUrl":"https://doi.org/10.1214/21-ejs1948","url":null,"abstract":"Precision medicine is an increasingly important area of research. Due to the heterogeneity of individual characteristics, patients may respond differently to treatments. One of the most important goals for precision medicine is to develop individualized treatment rules (ITRs) involving patients’ characteristics directly. As an interesting topic in clinical research, many statistical methods have been developed in recent years to find optimal ITRs. For binary treatments, outcome weighted learning (OWL) was proposed to find a decision function of patient characteristics maximizing the expected clinical outcome. Treatments with hierarchical structure are commonly seen in practice. In hierarchical scenarios, how to estimate ITRs is still unclear. We propose a new framework named hierarchical outcome-weighted angle-based learning (HOAL) to estimate ITRs for treatments with hierarchical structure. Statistical properties including Fisher consistency and convergence rates of the proposed method are presented. Simulations and an application to a type 2 diabetes study under linear and nonlinear learning show the highly competitive performance of our proposed procedure in both numerical accuracy and computational efficiency.","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43790857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Casting vector time series: algorithms for forecasting, imputation, and signal extraction","authors":"T. McElroy","doi":"10.1214/22-ejs2068","DOIUrl":"https://doi.org/10.1214/22-ejs2068","url":null,"abstract":"","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47473755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Screening is an important technique for analyzing high-dimensional data. Most screening tools have been developed for vectors and are marginal in the sense that each variable is evaluated individually at a time. Many multi-dimensional arrays (tensors) are generated nowadays. In addition to being high-dimensional, these data further have the tensor structure that should be exploited for more efficient analysis. Variables adjacent to each other in a tensor tend to be important or unimportant at the same time. Such information is ignored by marginal screening methods. In this article, we propose a general framework for tensor screening called smoothed tensor screening (STS). STS combines the strength of current marginal screening methods with tensor structural information by aggregating the information of its adjacent variables when evaluating one variable. STS is widely applicable since the statistical utility used in screening can be chosen based on the underlying model or data type of the responses and predictors. Moreover, we establish the SURE screening property for STS under mild conditions. Numerical studies demonstrate that STS has better performance than marginal screening methods. MSC2020 subject classifications: 62P10, 62F07.
{"title":"A general framework for tensor screening through smoothing","authors":"Keqian Min, Qing Mai","doi":"10.1214/21-ejs1954","DOIUrl":"https://doi.org/10.1214/21-ejs1954","url":null,"abstract":"Screening is an important technique for analyzing high-dimensional data. Most screening tools have been developed for vectors and are marginal in the sense that each variable is evaluated individually at a time. Many multi-dimensional arrays (tensors) are generated nowadays. In addition to being high-dimensional, these data further have the tensor structure that should be exploited for more efficient analysis. Variables adjacent to each other in a tensor tend to be important or unimportant at the same time. Such information is ignored by marginal screening methods. In this article, we propose a general framework for tensor screening called smoothed tensor screening (STS). STS combines the strength of current marginal screening methods with tensor structural information by aggregating the information of its adjacent variables when evaluating one variable. STS is widely applicable since the statistical utility used in screening can be chosen based on the underlying model or data type of the responses and predictors. Moreover, we establish the SURE screening property for STS under mild conditions. Numerical studies demonstrate that STS has better performance than marginal screening methods. MSC2020 subject classifications: 62P10, 62F07.","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":"1 1","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41439926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-01-01 | Epub Date: 2022-03-22 | DOI: 10.1214/22-EJS1982
Kunhui Zhang, Abolfazl Safikhani, Alex Tank, Ali Shojaie
Thanks to their simplicity and interpretable structure, autoregressive processes are widely used to model time series data. However, many real time series data sets exhibit non-linear patterns, requiring nonlinear modeling. The threshold Auto-Regressive (TAR) process provides a family of non-linear auto-regressive time series models in which the process dynamics are specific step functions of a thresholding variable. While estimation and inference for low-dimensional TAR models have been investigated, high-dimensional TAR models have received less attention. In this article, we develop a new framework for estimating high-dimensional TAR models, and propose two different sparsity-inducing penalties. The first penalty corresponds to a natural extension of classical TAR model to high-dimensional settings, where the same threshold is enforced for all model parameters. Our second penalty develops a more flexible TAR model, where different thresholds are allowed for different auto-regressive coefficients. We show that both penalized estimation strategies can be utilized in a three-step procedure that consistently learns both the thresholds and the corresponding auto-regressive coefficients. However, our theoretical and empirical investigations show that the direct extension of the TAR model is not appropriate for high-dimensional settings and is better suited for moderate dimensions. In contrast, the more flexible extension of the TAR model leads to consistent estimation and superior empirical performance in high dimensions.
{"title":"Penalized estimation of threshold auto-regressive models with many components and thresholds.","authors":"Kunhui Zhang, Abolfazl Safikhani, Alex Tank, Ali Shojaie","doi":"10.1214/22-EJS1982","DOIUrl":"10.1214/22-EJS1982","url":null,"abstract":"Thanks to their simplicity and interpretable structure, autoregressive processes are widely used to model time series data. However, many real time series data sets exhibit non-linear patterns, requiring nonlinear modeling. The threshold Auto-Regressive (TAR) process provides a family of non-linear auto-regressive time series models in which the process dynamics are specific step functions of a thresholding variable. While estimation and inference for low-dimensional TAR models have been investigated, high-dimensional TAR models have received less attention. In this article, we develop a new framework for estimating high-dimensional TAR models, and propose two different sparsity-inducing penalties. The first penalty corresponds to a natural extension of classical TAR model to high-dimensional settings, where the same threshold is enforced for all model parameters. Our second penalty develops a more flexible TAR model, where different thresholds are allowed for different auto-regressive coefficients. We show that both penalized estimation strategies can be utilized in a three-step procedure that consistently learns both the thresholds and the corresponding auto-regressive coefficients. However, our theoretical and empirical investigations show that the direct extension of the TAR model is not appropriate for high-dimensional settings and is better suited for moderate dimensions. In contrast, the more flexible extension of the TAR model leads to consistent estimation and superior empirical performance in high dimensions.","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":"16 1","pages":"1891-1951"},"PeriodicalIF":1.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10088520/pdf/nihms-1885625.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9851486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
For ultrahigh-dimensional data, variable screening is an important step to reduce the scale of the problem, hence, to improve the estimation accuracy and efficiency. In this paper, we propose a new dependence measure which is called the log odds ratio statistic to be used under the sufficient variable screening framework. The sufficient variable screening approach ensures the sufficiency of the selected input features in modeling the regression function and is an enhancement of existing marginal screening methods. In addition, we propose an ensemble variable screening approach to combine the proposed fused log odds ratio filter with the fused Kolmogorov filter to achieve supreme performance by taking advantage of both filters. We establish the sure screening properties of the fused log odds ratio filter for both marginal variable screening and sufficient variable screening. Extensive simulations and a real data analysis are provided to demonstrate the usefulness of the proposed log odds ratio filter and the sufficient variable screening procedure.
{"title":"On sufficient variable screening using log odds ratio filter","authors":"Baoying Yang, Wenbo Wu, Xiangrong Yin","doi":"10.1214/21-ejs1951","DOIUrl":"https://doi.org/10.1214/21-ejs1951","url":null,"abstract":"For ultrahigh-dimensional data, variable screening is an important step to reduce the scale of the problem, hence, to improve the estimation accuracy and efficiency. In this paper, we propose a new dependence measure which is called the log odds ratio statistic to be used under the sufficient variable screening framework. The sufficient variable screening approach ensures the sufficiency of the selected input features in modeling the regression function and is an enhancement of existing marginal screening methods. In addition, we propose an ensemble variable screening approach to combine the proposed fused log odds ratio filter with the fused Kolmogorov filter to achieve supreme performance by taking advantage of both filters. We establish the sure screening properties of the fused log odds ratio filter for both marginal variable screening and sufficient variable screening. Extensive simulations and a real data analysis are provided to demonstrate the usefulness of the proposed log odds ratio filter and the sufficient variable screening procedure.","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47203671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Monte Carlo Markov chains constrained on graphs for a target with disconnected support","authors":"R. Cerqueti, Emilio De Santis","doi":"10.1214/22-ejs2043","DOIUrl":"https://doi.org/10.1214/22-ejs2043","url":null,"abstract":"","PeriodicalId":49272,"journal":{"name":"Electronic Journal of Statistics","volume":" ","pages":""},"PeriodicalIF":1.1,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44387163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}