Pub Date: 2024-06-20. DOI: 10.1016/j.csda.2024.108009
Xuan Ma, Jenný Brynjarsdóttir, Thomas LaFramboise
A double Pólya-Gamma data augmentation scheme is developed for posterior sampling from a Bayesian hierarchical model of total and categorical count data. The scheme applies to a Negative Binomial - Binomial (NBB) hierarchical regression model with logit links and normal priors on regression coefficients. The approach is shown to be very efficient and in most cases outperforms the Stan program. The hierarchical modeling framework and the Pólya-Gamma data augmentation scheme are applied to human mitochondrial DNA data.
{"title":"A double Pólya-Gamma data augmentation scheme for a hierarchical Negative Binomial - Binomial data model","authors":"Xuan Ma, Jenný Brynjarsdóttir, Thomas LaFramboise","doi":"10.1016/j.csda.2024.108009","DOIUrl":"https://doi.org/10.1016/j.csda.2024.108009","url":null,"abstract":"<div><p>A double Pólya-Gamma data augmentation scheme is developed for posterior sampling from a Bayesian hierarchical model of total and categorical count data. The scheme applies to a Negative Binomial - Binomial (NBB) hierarchical regression model with logit links and normal priors on regression coefficients. The approach is shown to be very efficient and in most cases out-performs the Stan program. The hierarchical modeling framework and the Pólya-Gamma data augmentation scheme are applied to human mitochondrial DNA data.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"199 ","pages":"Article 108009"},"PeriodicalIF":1.5,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167947324000938/pdfft?md5=5e06b3420d4ee7efb587c1f231e8d551&pid=1-s2.0-S0167947324000938-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141485449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-20. DOI: 10.1016/j.csda.2024.107999
Chunyan Wang , Dennis K.J. Lin
Orthogonal Latin hypercubes are widely used for computer experiments. They achieve both orthogonality and the maximum one-dimensional stratification property. When two-factor (and higher-order) interactions are active, two- and three-dimensional stratifications are also important. Unfortunately, little is known about orthogonal Latin hypercubes with good two- (and higher-) dimensional stratification properties. A method is proposed for constructing a new class of orthogonal Latin hypercubes whose columns can be partitioned into groups such that columns from different groups maintain two- and three-dimensional stratification properties. The proposed designs perform well under almost all popular criteria (e.g., orthogonality, stratification, and the maximin distance criterion), making them highly suitable for computer experiments. The construction method is straightforward to implement, and the relevant theoretical support is well established. The proposed strong orthogonal Latin hypercubes are tabulated for practical use.
{"title":"Strong orthogonal Latin hypercubes for computer experiments","authors":"Chunyan Wang , Dennis K.J. Lin","doi":"10.1016/j.csda.2024.107999","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107999","url":null,"abstract":"<div><p>Orthogonal Latin hypercubes are widely used for computer experiments. They achieve both orthogonality and the maximum one-dimensional stratification property. When two-factor (and higher-order) interactions are active, two- and three-dimensional stratifications are also important. Unfortunately, little is known about orthogonal Latin hypercubes with good two (and higher)–dimensional stratification properties. A method is proposed for constructing a new class of orthogonal Latin hypercubes whose columns can be partitioned into groups, such that the columns from different groups maintain two- and three-dimensional stratification properties. The proposed designs perform well under almost all popular criteria (e.g., the orthogonality, stratification, and maximin distance criterion). They are the most ideal designs for computer experiments. The construction method can be straightforward to implement, and the relevant theoretical supports are well established. The proposed strong orthogonal Latin hypercubes are tabulated for practical needs.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"198 ","pages":"Article 107999"},"PeriodicalIF":1.5,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141481268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-13. DOI: 10.1016/j.csda.2024.108006
Eunju Hwang, ChanHyeok Jeon
Most real data are characterized by positive, asymmetric, and skewed distributions of various shapes. Modelling and forecasting of such data are addressed by proposing nonnegative conditional heteroscedastic time series models with Gamma distributions. Three types of time-varying parameters of the Gamma distribution are adopted to construct the nonnegative GARCH models. A condition for the existence of a stationary Gamma-GARCH model is given. Parameter estimation is discussed via the maximum likelihood estimation (MLE) method. A Monte Carlo study is conducted to illustrate sample paths of the proposed models, to assess the finite-sample validity of the MLEs, and to evaluate model diagnostics using standardized Pearson residuals. Furthermore, an out-of-sample forecasting analysis is performed to compute forecasting accuracy measures. Applications to oil price and Bitcoin data are given.
{"title":"Nonnegative GARCH-type models with conditional Gamma distributions and their applications","authors":"Eunju Hwang, ChanHyeok Jeon","doi":"10.1016/j.csda.2024.108006","DOIUrl":"10.1016/j.csda.2024.108006","url":null,"abstract":"<div><p>Most of real data are characterized by positive, asymmetric and skewed distributions of various shapes. Modelling and forecasting of such data are addressed by proposing nonnegative conditional heteroscedastic time series models with Gamma distributions. Three types of time-varying parameters of Gamma distributions are adopted to construct the nonnegative GARCH models. A condition for the existence of a stationary Gamma-GARCH model is given. Parameter estimates are discussed via maximum likelihood estimation (MLE) method. A Monte-Carlo study is conducted to illustrate sample paths of the proposed models and to see finite-sample validity of the MLEs, as well as to evaluate model diagnostics using standardized Pearson residuals. Furthermore, out-of-sample forecasting analysis is performed to compute forecasting accuracy measures. Applications to oil price and Bitcoin data are given, respectively.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"198 ","pages":"Article 108006"},"PeriodicalIF":1.5,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141395917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-11. DOI: 10.1016/j.csda.2024.107998
Chung Eun Lee , Xin Zhang
The dimension reduction problem for a stationary tensor time series is addressed. The goal is to remove linear combinations of the tensor time series that are mean independent of the past, without imposing any parametric models or distributional assumptions. To achieve this goal, a new metric called cumulative tensor martingale difference divergence is introduced and its theoretical properties are studied. Unlike existing methods, the proposed approach achieves dimension reduction by estimating a distinctive subspace that can fully retain the conditional mean information. By focusing on the conditional mean, the proposed dimension reduction method is potentially more accurate in prediction. The method can be viewed as a factor model-based approach that extends existing techniques for estimating the central subspace or the central mean subspace in vector time series. The effectiveness of the proposed method is illustrated by extensive simulations and two real-world data applications.
{"title":"Conditional mean dimension reduction for tensor time series","authors":"Chung Eun Lee , Xin Zhang","doi":"10.1016/j.csda.2024.107998","DOIUrl":"10.1016/j.csda.2024.107998","url":null,"abstract":"<div><p>The dimension reduction problem for a stationary tensor time series is addressed. The goal is to remove linear combinations of the tensor time series that are mean independent of the past, without imposing any parametric models or distributional assumptions. To achieve this goal, a new metric called cumulative tensor martingale difference divergence is introduced and its theoretical properties are studied. Unlike existing methods, the proposed approach achieves dimension reduction by estimating a distinctive subspace that can fully retain the conditional mean information. By focusing on the conditional mean, the proposed dimension reduction method is potentially more accurate in prediction. The method can be viewed as a factor model-based approach that extends the existing techniques for estimating central subspace or central mean subspace in vector time series. The effectiveness of the proposed method is illustrated by extensive simulations and two real-world data applications.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"199 ","pages":"Article 107998"},"PeriodicalIF":1.5,"publicationDate":"2024-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141389420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-06. DOI: 10.1016/j.csda.2024.107994
Sam Efromovich, Lirit Fuksman
Imputation is a standard procedure for dealing with missing data, and there are many competing imputation methods. It is proposed to analyze imputation procedures via comparison with a benchmark developed from asymptotic theory. The model considered is nonparametric density estimation of a missing, right-censored lifetime of interest. This model is of special interest for understanding imputation because each underlying observation is a pair consisting of a censored lifetime and a censoring indicator. The latter creates a number of interesting scenarios and challenges for imputation in which the best methods may or may not be applicable. Further, the theory sheds light on why the effect of imputation depends on the underlying density. The methodology is tested on real-life datasets and via intensive simulations. Data and R code are provided.
{"title":"Study of imputation procedures for nonparametric density estimation based on missing censored lifetimes","authors":"Sam Efromovich, Lirit Fuksman","doi":"10.1016/j.csda.2024.107994","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107994","url":null,"abstract":"<div><p>Imputation is a standard procedure in dealing with missing data and there are many competing imputation methods. It is proposed to analyze imputation procedures via comparison with a benchmark developed by the asymptotic theory. Considered model is nonparametric density estimation of the missing right censored lifetime of interest. This model is of a special interest for understanding imputation because each underlying observation is the pair of censored lifetime and indicator of censoring. The latter creates a number of interesting scenarios and challenges for imputation when best methods may or may not be applicable. Further, the theory sheds light on why the effect of imputation depends on an underlying density. The methodology is tested on real life datasets and via intensive simulations. Data and R code are provided.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"198 ","pages":"Article 107994"},"PeriodicalIF":1.8,"publicationDate":"2024-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141308344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-06. DOI: 10.1016/j.csda.2024.107997
Xiang Li , Yu-Ning Li , Li-Xin Zhang , Jun Zhao
The methodology for the inference problem in high-dimensional linear expectile regression is developed. By transforming the expectile loss into a weighted-least-squares form and applying a de-biasing strategy, Wald-type tests for multiple constraints within a regularized framework are established. An estimator for the pseudo-inverse of the generalized Hessian matrix in high dimension is constructed using general amenable regularizers, including Lasso and SCAD, with its consistency demonstrated through a novel proof technique. Simulation studies and real data applications demonstrate the efficacy of the proposed test statistic in both homoscedastic and heteroscedastic scenarios.
{"title":"Inference for high-dimensional linear expectile regression with de-biasing method","authors":"Xiang Li , Yu-Ning Li , Li-Xin Zhang , Jun Zhao","doi":"10.1016/j.csda.2024.107997","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107997","url":null,"abstract":"<div><p>The methodology for the inference problem in high-dimensional linear expectile regression is developed. By transforming the expectile loss into a weighted-least-squares form and applying a de-biasing strategy, Wald-type tests for multiple constraints within a regularized framework are established. An estimator for the pseudo-inverse of the generalized Hessian matrix in high dimension is constructed using general amenable regularizers, including Lasso and SCAD, with its consistency demonstrated through a novel proof technique. Simulation studies and real data applications demonstrate the efficacy of the proposed test statistic in both homoscedastic and heteroscedastic scenarios.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"198 ","pages":"Article 107997"},"PeriodicalIF":1.8,"publicationDate":"2024-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141324737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-31. DOI: 10.1016/j.csda.2024.107996
Matteo Framba , Veronica Vinciotti , Ernst C. Wit
Various processes, such as cell differentiation and disease spreading, can be modelled as quasi-reaction systems of particles using stochastic differential equations. The existing Local Linear Approximation (LLA) method infers the parameters driving these systems from measurements of particle abundances over time. While dense observations of the process in time should in theory improve parameter estimation, LLA fails in these situations due to numerical instability. Defining a latent event history model of the underlying quasi-reaction system resolves this problem. A computationally efficient Expectation-Maximization algorithm is proposed for parameter estimation, incorporating an extended Kalman filter for evaluating the latent reactions. A simulation study demonstrates the method's performance and highlights the settings where it is particularly advantageous compared to the existing LLA approaches. An illustration of the method applied to the diffusion of COVID-19 in Italy is presented.
{"title":"Latent event history models for quasi-reaction systems","authors":"Matteo Framba , Veronica Vinciotti , Ernst C. Wit","doi":"10.1016/j.csda.2024.107996","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107996","url":null,"abstract":"<div><p>Various processes, such as cell differentiation and disease spreading, can be modelled as quasi-reaction systems of particles using stochastic differential equations. The existing Local Linear Approximation (LLA) method infers the parameters driving these systems from measurements of particle abundances over time. While dense observations of the process in time should in theory improve parameter estimation, LLA fails in these situations due to numerical instability. Defining a latent event history model of the underlying quasi-reaction system resolves this problem. A computationally efficient Expectation-Maximization algorithm is proposed for parameter estimation, incorporating an extended Kalman filter for evaluating the latent reactions. A simulation study demonstrates the method's performance and highlights the settings where it is particularly advantageous compared to the existing LLA approaches. An illustration of the method applied to the diffusion of COVID-19 in Italy is presented.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"198 ","pages":"Article 107996"},"PeriodicalIF":1.8,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S016794732400080X/pdfft?md5=524e7377774b8a5df2e3a994373e6394&pid=1-s2.0-S016794732400080X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141243341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-31. DOI: 10.1016/j.csda.2024.107995
Matthieu Bulté , Helle Sørensen
An adaptation of the random forest algorithm for Fréchet regression is revisited, addressing the challenge of regression with random objects in metric spaces. To overcome the limitations of previous approaches, a new splitting rule is introduced, substituting the computationally expensive Fréchet means with a medoid-based approach. The asymptotic equivalence of this method to Fréchet mean-based procedures is demonstrated, along with the consistency of the associated regression estimator. This approach provides a sound theoretical framework and a more efficient computational solution to Fréchet regression, broadening its application to non-standard data types and complex use cases.
{"title":"Medoid splits for efficient random forests in metric spaces","authors":"Matthieu Bulté , Helle Sørensen","doi":"10.1016/j.csda.2024.107995","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107995","url":null,"abstract":"<div><p>An adaptation of the random forest algorithm for Fréchet regression is revisited, addressing the challenge of regression with random objects in metric spaces. To overcome the limitations of previous approaches, a new splitting rule is introduced, substituting the computationally expensive Fréchet means with a medoid-based approach. The asymptotic equivalence of this method to Fréchet mean-based procedures is demonstrated, along with the consistency of the associated regression estimator. This approach provides a sound theoretical framework and a more efficient computational solution to Fréchet regression, broadening its application to non-standard data types and complex use cases.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"198 ","pages":"Article 107995"},"PeriodicalIF":1.5,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167947324000793/pdfft?md5=90ce48cb2e6d039f213ac81b5b60098d&pid=1-s2.0-S0167947324000793-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141481267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-27. DOI: 10.1016/j.csda.2024.107993
Jiarong Ouyang, Xuan Cao
Spike and slab priors have emerged as effective and computationally scalable tools for Bayesian variable selection in high-dimensional linear regression. However, the crucial questions of model selection consistency and efficient computational strategies using spike and slab priors in probit regression have rarely been investigated. A hierarchical probit model with continuous spike and slab priors on the regression coefficients is considered, and a highly scalable Gibbs sampler whose computational complexity grows only linearly in the dimension of the predictors is proposed. Specifically, the “Skinny Gibbs” algorithm is adapted to the setting of probit and negative binomial regression, and model selection consistency of the proposed method under the probit model is established when the number of covariates is allowed to grow much larger than the sample size. Through simulation studies, the method is shown to achieve superior empirical performance compared with other state-of-the-art methods. Gene expression data from 51 asthmatic and 44 non-asthmatic samples are analyzed, and the performance of the proposed approach for predicting asthma is compared with existing approaches.
{"title":"Consistent skinny Gibbs in probit regression","authors":"Jiarong Ouyang, Xuan Cao","doi":"10.1016/j.csda.2024.107993","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107993","url":null,"abstract":"<div><p>Spike and slab priors have emerged as effective and computationally scalable tools for Bayesian variable selection in high-dimensional linear regression. However, the crucial model selection consistency and efficient computational strategies using spike and slab priors in probit regression have rarely been investigated. A hierarchical probit model with continuous spike and slab priors over regression coefficients is considered, and a highly scalable Gibbs sampler with a computational complexity that grows only linearly in the dimension of predictors is proposed. Specifically, the “Skinny Gibbs” algorithm is adapted to the setting of probit and negative binomial regression and model selection consistency for the proposed method under probit model is established, when the number of covariates is allowed to grow much larger than the sample size. Through simulation studies, the method is shown to achieve superior empirical performance compared with other state-of-the art methods. Gene expression data from 51 asthmatic and 44 non-asthmatic samples are analyzed and the performance for predicting asthma using the proposed approach is compared with existing approaches.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"198 ","pages":"Article 107993"},"PeriodicalIF":1.8,"publicationDate":"2024-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141243339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-05-23. DOI: 10.1016/j.csda.2024.107992
Guanghui Cheng , Qiang Xiong , Ruitao Lin
In real-world applications, the geometric median is a natural quantity to consider for robust inference of location or central tendency, particularly when dealing with non-standard or irregular data distributions. An innovative online bootstrap inference algorithm, using the averaged nonlinear stochastic gradient algorithm, is proposed to make statistical inference about the geometric median from massive datasets. The method is computationally fast and memory-friendly, and it is easy to update as new data is received sequentially. The validity of the proposed online bootstrap inference is theoretically justified. Simulation studies under a variety of scenarios are conducted to demonstrate its effectiveness and efficiency in terms of computation speed and memory usage. Additionally, the online inference procedure is applied to a large publicly available dataset for skin segmentation.
{"title":"Online bootstrap inference for the geometric median","authors":"Guanghui Cheng , Qiang Xiong , Ruitao Lin","doi":"10.1016/j.csda.2024.107992","DOIUrl":"https://doi.org/10.1016/j.csda.2024.107992","url":null,"abstract":"<div><p>In real-world applications, the geometric median is a natural quantity to consider for robust inference of location or central tendency, particularly when dealing with non-standard or irregular data distributions. An innovative online bootstrap inference algorithm, using the averaged nonlinear stochastic gradient algorithm, is proposed to make statistical inference about the geometric median from massive datasets. The method is computationally fast and memory-friendly, and it is easy to update as new data is received sequentially. The validity of the proposed online bootstrap inference is theoretically justified. Simulation studies under a variety of scenarios are conducted to demonstrate its effectiveness and efficiency in terms of computation speed and memory usage. Additionally, the online inference procedure is applied to a large publicly available dataset for skin segmentation.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"197 ","pages":"Article 107992"},"PeriodicalIF":1.8,"publicationDate":"2024-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141090527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}