The goal of nonparametric regression is to recover an underlying regression function from noisy observations, under the assumption that the regression function belongs to a prespecified infinite-dimensional function space. In the online setting, where observations arrive in a stream, it is generally computationally infeasible to refit the whole model repeatedly. To date, no method is both computationally efficient and statistically rate-optimal. In this paper, we propose an estimator for online nonparametric regression. Notably, our estimator is an empirical risk minimizer in a deterministic linear space, which is quite different from existing methods based on random features and functional stochastic gradients. Our theoretical analysis shows that this estimator attains a rate-optimal generalization error when the regression function lies in a reproducing kernel Hilbert space. We also show, theoretically and empirically, that the computational cost of our estimator is much lower than that of other rate-optimal estimators proposed for this online setting.
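The abstract does not spell out the estimator, but the key computational point, that empirical risk minimization in a fixed, deterministic linear space can be maintained cheaply online, can be illustrated with a small sketch. Below, a truncated Fourier basis plays the role of the deterministic linear space, and the least-squares fit is updated per observation by a rank-one (Sherman-Morrison) recursion; the basis, its size, and the ridge term are illustrative assumptions on my part, not details from the paper.

```python
# Illustrative sketch (not the paper's estimator): least-squares regression in a
# fixed finite-dimensional linear space, updated online by recursive least squares.
import numpy as np

J = 12                   # number of Fourier frequencies (assumed, for illustration)
dim = 2 * J + 1          # intercept + J cosines + J sines
ridge = 1e-3             # small ridge term so the recursion starts well-posed

def basis(x):
    """Truncated Fourier basis on [0, 1]; any fixed deterministic basis works."""
    j = np.arange(1, J + 1)
    return np.concatenate(([1.0], np.cos(2 * np.pi * j * x), np.sin(2 * np.pi * j * x)))

P = np.eye(dim) / ridge  # running approximation of (Phi^T Phi + ridge * I)^{-1}
theta = np.zeros(dim)    # current coefficient estimate

def update(x, y):
    """Fold one observation (x, y) into the fit with a rank-one update: O(dim^2)."""
    global P, theta
    phi = basis(x)
    Pphi = P @ phi
    P -= np.outer(Pphi, Pphi) / (1.0 + phi @ Pphi)  # Sherman-Morrison downdate
    theta = theta + (P @ phi) * (y - phi @ theta)   # gain times prediction residual

def predict(x):
    return basis(x) @ theta

# Stream of noisy observations of f(x) = sin(2*pi*x)
rng = np.random.default_rng(0)
for _ in range(2000):
    x = rng.uniform()
    update(x, np.sin(2 * np.pi * x) + 0.1 * rng.normal())
print(predict(0.25))     # should be close to f(0.25) = 1
```

The point of the sketch is the cost profile: each arriving observation triggers a fixed O(dim^2) update rather than a refit over all past data, which is the kind of behavior a computationally efficient online estimator needs; the paper's actual estimator and its rate-optimality analysis are, of course, more involved.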
In some supervised learning settings, the practitioner may have additional information about the features used for prediction. We propose a new method that leverages this additional information for better prediction. The method, which we call the feature-weighted elastic net ("fwelnet"), uses these "features of features" to adapt the relative penalties on the feature coefficients in the elastic net penalty. In our simulations, fwelnet outperforms the lasso in terms of test mean squared error, and usually improves the true positive rate or false positive rate of feature selection. We also apply the method to early prediction of preeclampsia, where fwelnet outperforms the lasso in terms of 10-fold cross-validated area under the curve (0.86 vs. 0.80). Finally, we draw a connection between fwelnet and the group lasso, and suggest how fwelnet might be used for multi-task learning.
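To make the "features of features" idea concrete, here is a minimal sketch of how side information Z (one row of meta-features per feature) might be turned into per-feature penalty factors inside an elastic net. The softmax-style weighting, the fixed theta, and all tuning values below are assumptions for illustration; fwelnet's actual weighting function and its procedure for fitting theta are specified in the paper.

```python
# Hedged sketch of a feature-weighted elastic net: per-feature penalty weights
# are derived from meta-features Z (p x q), then used in a weighted
# coordinate-descent solver for the elastic net objective.
import numpy as np

def penalty_weights(Z, theta):
    """Softmax-style scores (my reconstruction, not necessarily fwelnet's exact
    form): a larger z_j @ theta yields a lighter penalty on beta_j; all w_j > 0."""
    s = np.exp(Z @ theta)
    return s.sum() / (len(s) * s)

def soft(x, t):
    """Soft-thresholding operator."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def weighted_enet(X, y, w, lam, alpha=0.5, n_iter=200):
    """Coordinate descent for
    (1/2n)||y - X b||^2 + lam * sum_j w_j * (alpha |b_j| + (1-alpha)/2 b_j^2)."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.copy()                          # residual y - X @ beta
    col_ss = (X ** 2).sum(axis=0) / n     # (1/n) ||X_j||^2 per column
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ r / n + col_ss[j] * beta[j]   # partial-residual stat
            new = soft(rho, lam * alpha * w[j]) / (col_ss[j] + lam * (1 - alpha) * w[j])
            r += X[:, j] * (beta[j] - new)                # keep residual in sync
            beta[j] = new
    return beta

# Toy usage: 3 true signals; theta is fixed here, whereas fwelnet estimates it.
rng = np.random.default_rng(1)
n, p, q = 100, 20, 2
Z = rng.normal(size=(p, q))               # "features of features"
theta = np.array([1.0, 0.0])              # hypothetical meta-feature coefficients
X = rng.normal(size=(n, p))
beta_true = np.zeros(p); beta_true[:3] = 2.0
y = X @ beta_true + 0.5 * rng.normal(size=n)
beta_hat = weighted_enet(X, y, penalty_weights(Z, theta), lam=0.1)
```

The design point the sketch isolates is that informative meta-features can lower the penalty on features believed a priori to be relevant, so they survive shrinkage more easily, while uninformative meta-features leave the weights roughly uniform and recover ordinary elastic net behavior.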

