
Statistics and Computing: Latest Articles

Estimation of regime-switching diffusions via Fourier transforms
IF 2.2 Zone 2 (Mathematics) Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-03-05 DOI: 10.1007/s11222-024-10397-6
Thomas Lux

In this article, an algorithm for maximum-likelihood estimation of regime-switching diffusions is proposed. The proposed approach uses a Fourier transform to numerically solve the system of Fokker–Planck or forward Kolmogorov equations for the temporal evolution of the state densities. Monte Carlo simulations confirm the theoretically expected consistency of this approach for moderate sample sizes and its practical feasibility for certain regime-switching diffusions used in economics and biology with moderate numbers of states and parameters. An application to animal movement data serves as an illustration of the proposed algorithm.
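To illustrate why a Fourier transform is convenient for such systems, consider a simplified case that is an assumption of this note and not necessarily the paper's setting: regime \(j\) has constant drift \(\mu_j\) and volatility \(\sigma_j\), and regimes switch according to a continuous-time Markov chain with hypothetical generator entries \(q_{ij}\) (rate of moving from regime \(i\) to regime \(j\), rows summing to zero). The symbols \(p_j\), \(\hat p_j\), \(\mu_j\), \(\sigma_j\), \(q_{ij}\) and \(k\) are introduced only for this sketch.

```latex
% Forward (Fokker-Planck) system for the joint density p_j(x,t) of state x and regime j,
% assuming constant coefficients within each regime (illustrative assumption)
\[
\frac{\partial p_j}{\partial t}
  = -\mu_j \frac{\partial p_j}{\partial x}
    + \frac{\sigma_j^2}{2}\,\frac{\partial^2 p_j}{\partial x^2}
    + \sum_{i} q_{ij}\, p_i .
\]
% Fourier transform in x, with \hat p_j(k,t) = \int e^{-\mathrm{i}kx}\, p_j(x,t)\,dx,
% turns each x-derivative into multiplication by \mathrm{i}k
\[
\frac{\partial \hat p_j}{\partial t}
  = \Bigl(-\mathrm{i}\,\mu_j k - \tfrac{1}{2}\sigma_j^2 k^2\Bigr)\hat p_j
    + \sum_{i} q_{ij}\, \hat p_i .
\]
```

In this simplified case the partial differential equations reduce, for each frequency \(k\), to a small linear system of ordinary differential equations in \(t\). Models with state-dependent drift or volatility lead to more involved transformed equations and are not covered by this sketch.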

Citations: 0
High-dimensional sparse single–index regression via Hilbert–Schmidt independence criterion
IF 2.2 Zone 2 (Mathematics) Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-02-27 DOI: 10.1007/s11222-024-10399-4
Xin Chen, Chang Deng, Shuaida He, Runxiong Wu, Jia Zhang

The Hilbert–Schmidt Independence Criterion (HSIC) has recently been introduced to the field of single-index models to estimate the directions. Compared with other well-established methods, the HSIC-based method requires relatively weak conditions. However, its performance has not yet been studied in the prevalent high-dimensional scenarios, where the number of covariates can be much larger than the sample size. In this article, based on HSIC, we propose to estimate the possibly sparse directions in the high-dimensional single-index models through a parameter reformulation. Our approach estimates the subspace of the direction directly and performs variable selection simultaneously. Due to the non-convexity of the objective function and the complexity of the constraints, a majorize-minimize algorithm together with the linearized alternating direction method of multipliers is developed to solve the optimization problem. Since it does not involve the inverse of the covariance matrix, the algorithm can naturally handle large p, small n scenarios. Through extensive simulation studies and a real data analysis, we show that our proposal is efficient and effective in the high-dimensional settings. The Matlab codes for this method are available online.
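As a rough illustration of the dependence measure the abstract builds on, the following is a minimal sketch of the standard biased empirical HSIC with Gaussian kernels for scalar data, normalised by n^2 (some references use (n-1)^2). It is not the authors' sparse high-dimensional procedure; the function names, the fixed bandwidth and the toy data in `demo` are assumptions made for this example.

```python
import math
import random

def gaussian_gram(xs, bandwidth):
    """Gram matrix K[i][j] = exp(-(x_i - x_j)^2 / (2 * bandwidth^2)) for scalar data."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in xs] for a in xs]

def center(K):
    """Double-center a Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + tot for j in range(n)] for i in range(n)]

def hsic(xs, ys, bandwidth=1.0):
    """Biased empirical HSIC, trace(K H L H) / n^2 (bandwidth is an illustrative choice)."""
    n = len(xs)
    Kc = center(gaussian_gram(xs, bandwidth))
    L = gaussian_gram(ys, bandwidth)
    # trace(H K H L) equals trace(K H L H) by cyclicity of the trace
    return sum(Kc[i][j] * L[j][i] for i in range(n) for j in range(n)) / n ** 2

def demo(n=150, seed=0):
    """Toy single-index data: y depends on x only through the first coordinate."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]  # projection onto the true direction (1, 0)
    x2 = [rng.gauss(0, 1) for _ in range(n)]  # projection onto the orthogonal direction (0, 1)
    y = [math.sin(a) + 0.1 * rng.gauss(0, 1) for a in x1]
    return hsic(x1, y), hsic(x2, y)
```

Calling `demo()` returns the HSIC between the response and the projection onto the true direction, followed by the HSIC for an irrelevant direction. A direction estimator in this spirit would search for the projection that makes the first quantity large.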

Citations: 0
Improvements on scalable stochastic Bayesian inference methods for multivariate Hawkes process
IF 2.2 Zone 2 (Mathematics) Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-02-27 DOI: 10.1007/s11222-024-10392-x
Alex Ziyu Jiang, Abel Rodriguez

Multivariate Hawkes Processes (MHPs) are a class of point processes that can account for complex temporal dynamics among event sequences. In this work, we study the accuracy and computational efficiency of three classes of algorithms which, while widely used in the context of Bayesian inference, have rarely been applied in the context of MHPs: stochastic gradient expectation-maximization, stochastic gradient variational inference and stochastic gradient Langevin Monte Carlo. An important contribution of this paper is a novel approximation to the likelihood function that allows us to retain the computational advantages associated with conjugate settings while reducing approximation errors associated with the boundary effects. The comparisons are based on various simulated scenarios as well as an application to the study of risk dynamics in the Standard & Poor’s 500 intraday index prices among its 11 sectors.

Citations: 0
Maximum likelihood estimation of log-concave densities on tree space
IF 2.2 Zone 2 (Mathematics) Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-02-23 DOI: 10.1007/s11222-024-10400-0
Yuki Takazawa, Tomonari Sei

Phylogenetic trees are key data objects in biology, and the method of phylogenetic reconstruction has been highly developed. The space of phylogenetic trees is a nonpositively curved metric space. Recently, statistical methods to analyze samples of trees on this space are being developed utilizing this property. Meanwhile, in Euclidean space, the log-concave maximum likelihood method has emerged as a new nonparametric method for probability density estimation. In this paper, we derive a sufficient condition for the existence and uniqueness of the log-concave maximum likelihood estimator on tree space. We also propose an estimation algorithm for one and two dimensions. Since various factors affect the inferred trees, it is difficult to specify the distribution of a sample of trees. The class of log-concave densities is nonparametric, and yet the estimation can be conducted by the maximum likelihood method without selecting hyperparameters. We compare the estimation performance with a previously developed kernel density estimator numerically. In our examples where the true density is log-concave, we demonstrate that our estimator has a smaller integrated squared error when the sample size is large. We also conduct numerical experiments of clustering using the Expectation-Maximization algorithm and compare the results with k-means++ clustering using Fréchet mean.

Citations: 0
Do applied statisticians prefer more randomness or less? Bootstrap or Jackknife?
IF 2.2 Zone 2 (Mathematics) Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-02-22 DOI: 10.1007/s11222-024-10388-7
Yannis G. Yatracos

Bootstrap and Jackknife estimates, \(T_{n,B}^*\) and \(T_{n,J}\) respectively, of a population parameter \(\theta\) are both used in statistical computations; n is the sample size, B is the number of Bootstrap samples. For any \(n_0\) and \(B_0\), Bootstrap samples do not add new information about \(\theta\), being observations from the original sample, and when \(B_0<\infty\), \(T_{n_0,B_0}^*\) also includes resampling variability, an additional source of uncertainty not affecting \(T_{n_0,J}\). These are neglected in theoretical papers with results for the utopian \(T_{n,\infty}^*\) that do not hold for \(B<\infty\). The consequence is that \(T^*_{n_0,B_0}\) is expected to have larger mean squared error (MSE) than \(T_{n_0,J}\), namely \(T_{n_0,B_0}^*\) is inadmissible. The amount of inadmissibility may be very large when populations’ parameters, e.g. the variance, are unbounded and/or with big data. A palliating remedy is increasing B, the larger the better, but the MSEs ordering remains unchanged for \(B<\infty\). This is confirmed theoretically when \(\theta\) is the mean of a population, and is observed in the estimated total MSE for linear regression coefficients. In the latter, the chance the estimated total MSE with \(T_{n,B}^*\) improves that with \(T_{n,J}\) decreases to 0 as B increases.
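The claim for the population mean can be explored with a small simulation. The sketch below is an illustration under stated assumptions, not the paper's experiment: the Jackknife estimate is taken to be the average of the n leave-one-out means (which coincides with the sample mean), the Bootstrap estimate is the average of B resample means, and the data are Gaussian. All function names and default values are hypothetical.

```python
import random

def jackknife_mean(sample):
    """Jackknife estimate of the mean: average of the n leave-one-out means."""
    n = len(sample)
    total = sum(sample)
    loo_means = [(total - x) / (n - 1) for x in sample]
    return sum(loo_means) / n

def bootstrap_mean(sample, B, rng):
    """Bootstrap estimate of the mean: average of B resample means."""
    n = len(sample)
    means = []
    for _ in range(B):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    return sum(means) / B

def mse_comparison(theta=0.0, sigma=1.0, n=20, B=5, reps=2000, seed=0):
    """Monte Carlo estimates of the MSE of both estimators of theta (Gaussian data assumed)."""
    rng = random.Random(seed)
    se_boot = 0.0
    se_jack = 0.0
    for _ in range(reps):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        se_boot += (bootstrap_mean(sample, B, rng) - theta) ** 2
        se_jack += (jackknife_mean(sample) - theta) ** 2
    return se_boot / reps, se_jack / reps
```

`mse_comparison()` returns the estimated MSE of the Bootstrap estimate followed by that of the Jackknife estimate. For a small, finite B the first is expected to be the larger of the two, because the Bootstrap average carries resampling noise on top of the sampling error of the mean; increasing `B` should narrow the gap.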

Citations: 0
Forward stability and model path selection
IF 2.2 Zone 2 (Mathematics) Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-02-20 DOI: 10.1007/s11222-024-10395-8
Nicholas Kissel, Lucas Mentch

Most scientific publications follow the familiar recipe of (i) obtain data, (ii) fit a model, and (iii) comment on the scientific relevance of the effects of particular covariates in that model. This approach, however, ignores the fact that there may exist a multitude of similarly-accurate models in which the implied effects of individual covariates may be vastly different. This problem of finding an entire collection of plausible models has also received relatively little attention in the statistics community, with nearly all of the proposed methodologies being narrowly tailored to a particular model class and/or requiring an exhaustive search over all possible models, making them largely infeasible in the current big data era. This work develops the idea of forward stability and proposes a novel, computationally-efficient approach to finding collections of accurate models we refer to as model path selection (MPS). MPS builds up a plausible model collection via a forward selection approach and is entirely agnostic to the model class and loss function employed. The resulting model collection can be displayed in a simple and intuitive graphical fashion, easily allowing practitioners to visualize whether some covariates can be swapped for others with minimal loss.

Citations: 0
The minimum covariance determinant estimator for interval-valued data
IF 2.2 Zone 2 (Mathematics) Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-02-17 DOI: 10.1007/s11222-024-10386-9
Wan Tian, Zhongfeng Qin

Effective estimation of covariance matrices is crucial for statistical analyses and applications. In this paper, we focus on the robust estimation of covariance matrix for interval-valued data in low and moderately high dimensions. In the low-dimensional scenario, we extend the Minimum Covariance Determinant (MCD) estimator to interval-valued data. We derive an iterative algorithm for computing this estimator, demonstrate its convergence, and theoretically establish that it retains the high breakdown-point property of the MCD estimator. Further, we propose a projection-based estimator and a regularization-based estimator to extend the MCD estimator to moderately high-dimensional settings, respectively. We propose efficient iterative algorithms for solving these two estimators and demonstrate their convergence properties. We conduct extensive simulation studies and real data analysis to validate the finite sample properties of these proposed estimators.

Citations: 0
Clustering longitudinal ordinal data via finite mixture of matrix-variate distributions
IF 2.2 Zone 2 (Mathematics) Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-02-17 DOI: 10.1007/s11222-024-10390-z
Francesco Amato, Julien Jacques, Isabelle Prim-Allaz

In social sciences, studies are often based on questionnaires asking participants to express ordered responses several times over a study period. We present a model-based clustering algorithm for such longitudinal ordinal data. Assuming that an ordinal variable is the discretization of an underlying latent continuous variable, the model relies on a mixture of matrix-variate normal distributions, accounting simultaneously for within- and between-time dependence structures. The model is thus able to concurrently model the heterogeneity, the association among the responses and the temporal dependence structure. An EM algorithm is developed and presented for parameter estimation, and approaches to deal with some arising computational challenges are outlined. An evaluation of the model through synthetic data shows its estimation abilities and its advantages when compared to competitors. A real-world application concerning changes in eating behaviors during the Covid-19 pandemic period in France is presented.

Citations: 0
Enmsp: an elastic-net multi-step screening procedure for high-dimensional regression
IF 2.2 Zone 2 (Mathematics) Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-02-16 DOI: 10.1007/s11222-024-10394-9
Yushan Xue, Jie Ren, Bin Yang

To improve the estimation efficiency of high-dimensional regression problems, penalized regularization is routinely used. However, accurately estimating the model remains challenging, particularly in the presence of correlated effects, wherein irrelevant covariates exhibit strong correlation with relevant ones. This situation, referred to as correlated data, poses additional complexities for model estimation. In this paper, we propose the elastic-net multi-step screening procedure (EnMSP), an iterative algorithm designed to recover sparse linear models in the context of correlated data. EnMSP uses a small repeated penalty strategy to identify truly relevant covariates in a few iterations. Specifically, in each iteration, EnMSP enhances the adaptive lasso method by adding a weighted \(\ell_2\) penalty, which improves the selection of relevant covariates. The method is shown to select the true model and achieve the \(\ell_2\)-norm error bound under certain conditions. The effectiveness of EnMSP is demonstrated through numerical comparisons and applications in financial data.

Citations: 0
Bayesian parameter inference for partially observed stochastic Volterra equations
IF 2.2 Zone 2, Mathematics Q2 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-02-14 DOI: 10.1007/s11222-024-10389-6
Ajay Jasra, Hamza Ruzayqat, Amin Wu

In this article we consider Bayesian parameter inference for a type of partially observed stochastic Volterra equation (SVE). SVEs are found in many areas such as physics and mathematical finance. In the latter field they can be used to represent long memory in unobserved volatility processes. In many cases of practical interest, SVEs must be time-discretized and then parameter inference is based upon the posterior associated to this time-discretized process. Based upon recent studies on time-discretization of SVEs (e.g. Richard et al. in Stoch Proc Appl 141:109–138, 2021) we use Euler–Maruyama methods for the afore-mentioned discretization. We then show how multilevel Markov chain Monte Carlo (MCMC) methods (Jasra et al. in SIAM J Sci Comp 40:A887–A902, 2018) can be applied in this context. In the examples we study, we give a proof that shows that the cost to achieve a mean square error (MSE) of \(\mathcal{O}(\epsilon^2)\), \(\epsilon>0\), is \(\mathcal{O}(\epsilon^{-\tfrac{4}{2H+1}})\), where \(H\) is the Hurst parameter. If one uses a single level MCMC method then the cost is \(\mathcal{O}(\epsilon^{-\tfrac{2(2H+3)}{2H+1}})\) to achieve the same MSE. We illustrate these results in the context of state-space and stochastic volatility models, with the latter applied to real data.
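
To make the time-discretization step concrete, the following is a minimal sketch of a left-point Euler–Maruyama scheme for a scalar SVE. It is not taken from the paper: the abstract does not specify the equation, so the fractional kernel \(K(u) = u^{H-1/2}/\Gamma(H+1/2)\), the function name `euler_maruyama_sve`, and all parameters are assumptions chosen for illustration.

```python
import math
import random


def euler_maruyama_sve(x0, drift, diffusion, hurst, horizon, n_steps, rng):
    """Left-point Euler-Maruyama path for a scalar stochastic Volterra equation.

    Assumed (hypothetical) model, not specified in the abstract:
        X_t = x0 + int_0^t K(t-s) b(X_s) ds + int_0^t K(t-s) sigma(X_s) dW_s,
        K(u) = u**(H - 1/2) / Gamma(H + 1/2).
    """
    dt = horizon / n_steps
    gamma_norm = math.gamma(hurst + 0.5)

    def kernel(u):
        # Only evaluated at u >= dt > 0, so the singularity at 0 is never hit.
        return u ** (hurst - 0.5) / gamma_norm

    drift_incr = []   # b(X_{t_j}) * dt for each past grid point
    noise_incr = []   # sigma(X_{t_j}) * (W_{t_{j+1}} - W_{t_j})
    path = [x0]
    for k in range(1, n_steps + 1):
        x_prev = path[-1]
        drift_incr.append(drift(x_prev) * dt)
        noise_incr.append(diffusion(x_prev) * rng.gauss(0.0, math.sqrt(dt)))
        # The kernel reweights the whole history, so each step sums over all
        # earlier increments (quadratic cost in n_steps).
        total = x0
        for j in range(k):
            total += kernel((k - j) * dt) * (drift_incr[j] + noise_incr[j])
        path.append(total)
    return path


if __name__ == "__main__":
    rng = random.Random(1)
    path = euler_maruyama_sve(
        x0=0.0,
        drift=lambda x: -x,
        diffusion=lambda x: 0.3,
        hurst=0.1,
        horizon=1.0,
        n_steps=50,
        rng=rng,
    )
    print(len(path), path[-1])
```

With `hurst=0.5` the assumed kernel is identically 1 and the scheme reduces to the ordinary Euler–Maruyama recursion for an SDE, which gives a convenient sanity check.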

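The two cost rates quoted in the abstract can be compared directly. The sketch below evaluates the exponents \(\tfrac{4}{2H+1}\) (multilevel) and \(\tfrac{2(2H+3)}{2H+1}\) (single level) for a few values of \(H\); the function names and the chosen values of \(H\) are illustrative assumptions, and only the two formulas come from the abstract.

```python
def multilevel_cost_exponent(hurst):
    """Exponent a in the multilevel MCMC cost O(eps**(-a)), a = 4 / (2H + 1)."""
    return 4.0 / (2.0 * hurst + 1.0)


def single_level_cost_exponent(hurst):
    """Exponent a in the single-level MCMC cost O(eps**(-a)), a = 2(2H + 3) / (2H + 1)."""
    return 2.0 * (2.0 * hurst + 3.0) / (2.0 * hurst + 1.0)


if __name__ == "__main__":
    # Illustrative Hurst parameters (assumed, not from the paper).
    for hurst in (0.1, 0.3, 0.5):
        ml = multilevel_cost_exponent(hurst)
        sl = single_level_cost_exponent(hurst)
        print(f"H={hurst}: multilevel {ml:.3f}, single level {sl:.3f}, gap {sl - ml:.3f}")
```

Algebraically, \(\tfrac{2(2H+3)}{2H+1} - \tfrac{4}{2H+1} = \tfrac{4H+2}{2H+1} = 2\), so under the stated rates the multilevel method saves a factor of order \(\epsilon^{-2}\) for every \(H\). At \(H = 1/2\), for instance, the exponents are 2 and 4.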
Citations: 0