Computational Statistics & Data Analysis: Latest Articles

Bayesian modal regression based on mixture distributions
IF 1.5 | Zone 3, Mathematics | Q3 Computer Science, Interdisciplinary Applications | Pub Date: 2024-06-27 | DOI: 10.1016/j.csda.2024.108012
Qingyang Liu, Xianzheng Huang, Ray Bai

Compared to mean regression and quantile regression, the literature on modal regression is very sparse. A unifying framework for Bayesian modal regression is proposed, based on a family of unimodal distributions indexed by the mode, along with other parameters that allow for flexible shapes and tail behaviors. Sufficient conditions for posterior propriety under an improper prior on the mode parameter are derived. Following prior elicitation, regression analyses of simulated data and of datasets from several real-life applications are conducted. Besides drawing inference for covariate effects that are easy to interpret, prediction and model selection under the proposed Bayesian modal regression framework are also considered. Evidence from these analyses suggests that the proposed inference procedures are very robust to outliers, enabling one to discover interesting covariate effects missed by mean or median regression, and to construct much tighter prediction intervals than those from mean or median regression. Computer programs for implementing the proposed Bayesian modal regression are available at https://github.com/rh8liuqy/Bayesian_modal_regression.

Computational Statistics & Data Analysis, Volume 199, Article 108012.
Citations: 0
A nonparametrically corrected likelihood for Bayesian spectral analysis of multivariate time series
IF 1.5 | Zone 3, Mathematics | Q3 Computer Science, Interdisciplinary Applications | Pub Date: 2024-06-25 | DOI: 10.1016/j.csda.2024.108010
Yixuan Liu, Claudia Kirch, Jeong Eun Lee, Renate Meyer

A novel approach to Bayesian nonparametric spectral analysis of stationary multivariate time series is presented. Starting with a parametric vector-autoregressive model, the parametric likelihood is nonparametrically adjusted in the frequency domain to account for potential deviations from parametric assumptions. A proof of mutual contiguity of the nonparametrically corrected likelihood, the multivariate Whittle likelihood approximation and the exact likelihood for Gaussian time series is given. A multivariate extension of the nonparametric Bernstein-Dirichlet process prior for univariate spectral densities to the space of Hermitian positive definite spectral density matrices is specified directly on the correction matrices. An infinite series representation of this prior is then used to develop a Markov chain Monte Carlo algorithm to sample from the posterior distribution. The code is made publicly available for ease of use and reproducibility. With this novel approach, a generalisation of the multivariate Whittle-likelihood-based method of Meier et al. (2020) as well as an extension of the nonparametrically corrected likelihood for univariate stationary time series of Kirch et al. (2019) to the multivariate case is presented. It is demonstrated that the nonparametrically corrected likelihood combines the efficiencies of a parametric with the robustness of a nonparametric model. Its numerical accuracy is illustrated in a comprehensive simulation study. Its practical advantages are illustrated by a spectral analysis of two environmental time series data sets: a bivariate time series of the Southern Oscillation Index and fish recruitment and a multivariate time series of windspeed data at six locations in California.

Computational Statistics & Data Analysis, Volume 199, Article 108010.
Open access PDF: https://www.sciencedirect.com/science/article/pii/S016794732400094X/pdfft?md5=4194de676b76fa0193f3ea88ff4e7bdc&pid=1-s2.0-S016794732400094X-main.pdf
Citations: 0
An embedded diachronic sense change model with a case study from ancient Greek
IF 1.5 | Zone 3, Mathematics | Q3 Computer Science, Interdisciplinary Applications | Pub Date: 2024-06-21 | DOI: 10.1016/j.csda.2024.108011
Schyan Zafar, Geoff K. Nicholls

Word meanings change over time, and word senses evolve, emerge or die out in the process. For ancient languages, where the corpora are often small and sparse, modelling such changes accurately proves challenging, and quantifying uncertainty in sense-change estimates consequently becomes important. GASC (Genre-Aware Semantic Change) and DiSC (Diachronic Sense Change) are existing generative models that have been used to analyse sense change for target words from an ancient Greek text corpus, using unsupervised learning without the help of any pre-training. These models represent the senses of a given target word such as “kosmos” (meaning decoration, order or world) as distributions over context words, and sense prevalence as a distribution over senses. The models are fitted using Markov Chain Monte Carlo (MCMC) methods to measure temporal changes in these representations. This paper introduces EDiSC, an Embedded DiSC model, which combines word embeddings with DiSC to provide superior model performance. It is shown empirically that EDiSC offers improved predictive accuracy, ground-truth recovery and uncertainty quantification, as well as better sampling efficiency and scalability properties with MCMC methods. The challenges of fitting these models are also discussed.

Computational Statistics & Data Analysis, Volume 199, Article 108011.
Open access PDF: https://www.sciencedirect.com/science/article/pii/S0167947324000951/pdfft?md5=12930590074b9c3008e514576f2c4ba0&pid=1-s2.0-S0167947324000951-main.pdf
Citations: 0
A double Pólya-Gamma data augmentation scheme for a hierarchical Negative Binomial - Binomial data model
IF 1.5 | Zone 3, Mathematics | Q3 Computer Science, Interdisciplinary Applications | Pub Date: 2024-06-20 | DOI: 10.1016/j.csda.2024.108009
Xuan Ma, Jenný Brynjarsdóttir, Thomas LaFramboise

A double Pólya-Gamma data augmentation scheme is developed for posterior sampling from a Bayesian hierarchical model of total and categorical count data. The scheme applies to a Negative Binomial - Binomial (NBB) hierarchical regression model with logit links and normal priors on regression coefficients. The approach is shown to be very efficient and in most cases outperforms the Stan program. The hierarchical modeling framework and the Pólya-Gamma data augmentation scheme are applied to human mitochondrial DNA data.

Computational Statistics & Data Analysis, Volume 199, Article 108009.
Open access PDF: https://www.sciencedirect.com/science/article/pii/S0167947324000938/pdfft?md5=5e06b3420d4ee7efb587c1f231e8d551&pid=1-s2.0-S0167947324000938-main.pdf
Citations: 0
Strong orthogonal Latin hypercubes for computer experiments
IF 1.5 | Zone 3, Mathematics | Q3 Computer Science, Interdisciplinary Applications | Pub Date: 2024-06-20 | DOI: 10.1016/j.csda.2024.107999
Chunyan Wang, Dennis K.J. Lin

Orthogonal Latin hypercubes are widely used for computer experiments. They achieve both orthogonality and the maximum one-dimensional stratification property. When two-factor (and higher-order) interactions are active, two- and three-dimensional stratifications are also important. Unfortunately, little is known about orthogonal Latin hypercubes with good two (and higher)–dimensional stratification properties. A method is proposed for constructing a new class of orthogonal Latin hypercubes whose columns can be partitioned into groups, such that the columns from different groups maintain two- and three-dimensional stratification properties. The proposed designs perform well under almost all popular criteria (e.g., the orthogonality, stratification, and maximin distance criterion). They are the most ideal designs for computer experiments. The construction method can be straightforward to implement, and the relevant theoretical supports are well established. The proposed strong orthogonal Latin hypercubes are tabulated for practical needs.
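
The properties named in this abstract can be checked numerically. The sketch below is only an illustration of the definitions, not the construction proposed in the paper: it builds a plain random Latin hypercube from independent permutations, then checks one-dimensional stratification, column orthogonality, and two-dimensional cell counts. All function names are hypothetical, and the small 5-run design is a hand-picked example whose two columns happen to be uncorrelated.

```python
import numpy as np


def random_latin_hypercube(n_runs, n_factors, seed=0):
    """Return an n_runs x n_factors design; each column is a random
    permutation of the levels 0, 1, ..., n_runs - 1."""
    rng = np.random.default_rng(seed)
    return np.column_stack([rng.permutation(n_runs) for _ in range(n_factors)])


def is_latin_hypercube(design):
    """One-dimensional stratification: every column uses each level once."""
    levels = np.arange(design.shape[0])
    return all(np.array_equal(np.sort(col), levels) for col in design.T)


def max_abs_column_correlation(design):
    """Largest absolute pairwise correlation between columns
    (zero for an orthogonal design)."""
    corr = np.corrcoef(design, rowvar=False)
    off_diagonal = corr[~np.eye(corr.shape[0], dtype=bool)]
    return float(np.max(np.abs(off_diagonal)))


def two_dim_cell_counts(design, col_a, col_b, n_bins):
    """Count the runs falling in each cell of an n_bins x n_bins grid
    formed by two columns; equal counts mean two-dimensional stratification."""
    n_runs = design.shape[0]
    a = design[:, col_a] * n_bins // n_runs
    b = design[:, col_b] * n_bins // n_runs
    counts = np.zeros((n_bins, n_bins), dtype=int)
    np.add.at(counts, (a, b), 1)
    return counts


# Hand-picked 5-run Latin hypercube with uncorrelated columns (illustrative only).
small_orthogonal = np.array([[0, 1], [1, 4], [2, 2], [3, 0], [4, 3]])
```

A random Latin hypercube always passes `is_latin_hypercube`, but it generally has nonzero column correlations and uneven two-dimensional cell counts, which is the gap that orthogonal and strong orthogonal constructions are meant to close.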

Computational Statistics & Data Analysis, Volume 198, Article 107999.
Citations: 0
Nonnegative GARCH-type models with conditional Gamma distributions and their applications
IF 1.5 | Zone 3, Mathematics | Q3 Computer Science, Interdisciplinary Applications | Pub Date: 2024-06-13 | DOI: 10.1016/j.csda.2024.108006
Eunju Hwang, ChanHyeok Jeon

Most real data are characterized by positive, asymmetric and skewed distributions of various shapes. Modelling and forecasting of such data are addressed by proposing nonnegative conditional heteroscedastic time series models with Gamma distributions. Three types of time-varying parameters of Gamma distributions are adopted to construct the nonnegative GARCH models. A condition for the existence of a stationary Gamma-GARCH model is given. Parameter estimation is discussed via the maximum likelihood estimation (MLE) method. A Monte Carlo study is conducted to illustrate sample paths of the proposed models, to assess the finite-sample validity of the MLEs, and to evaluate model diagnostics using standardized Pearson residuals. Furthermore, out-of-sample forecasting analysis is performed to compute forecasting accuracy measures. Applications to oil price and Bitcoin data are given.

Computational Statistics & Data Analysis, Volume 198, Article 108006.
Citations: 0
Conditional mean dimension reduction for tensor time series
IF 1.5 | Zone 3, Mathematics | Q3 Computer Science, Interdisciplinary Applications | Pub Date: 2024-06-11 | DOI: 10.1016/j.csda.2024.107998
Chung Eun Lee, Xin Zhang

The dimension reduction problem for a stationary tensor time series is addressed. The goal is to remove linear combinations of the tensor time series that are mean independent of the past, without imposing any parametric models or distributional assumptions. To achieve this goal, a new metric called cumulative tensor martingale difference divergence is introduced and its theoretical properties are studied. Unlike existing methods, the proposed approach achieves dimension reduction by estimating a distinctive subspace that can fully retain the conditional mean information. By focusing on the conditional mean, the proposed dimension reduction method is potentially more accurate in prediction. The method can be viewed as a factor model-based approach that extends the existing techniques for estimating central subspace or central mean subspace in vector time series. The effectiveness of the proposed method is illustrated by extensive simulations and two real-world data applications.

Computational Statistics & Data Analysis, Volume 199, Article 107998.
Citations: 0
Study of imputation procedures for nonparametric density estimation based on missing censored lifetimes
IF 1.8 | Zone 3, Mathematics | Q3 Computer Science, Interdisciplinary Applications | Pub Date: 2024-06-06 | DOI: 10.1016/j.csda.2024.107994
Sam Efromovich, Lirit Fuksman

Imputation is a standard procedure in dealing with missing data and there are many competing imputation methods. It is proposed to analyze imputation procedures via comparison with a benchmark developed by the asymptotic theory. The model considered is nonparametric density estimation of the missing right-censored lifetime of interest. This model is of special interest for understanding imputation because each underlying observation is a pair consisting of a censored lifetime and a censoring indicator. The latter creates a number of interesting scenarios and challenges for imputation, in which the best methods may or may not be applicable. Further, the theory sheds light on why the effect of imputation depends on an underlying density. The methodology is tested on real-life datasets and via intensive simulations. Data and R code are provided.

Computational Statistics & Data Analysis, Volume 198, Article 107994.
Citations: 0
Inference for high-dimensional linear expectile regression with de-biasing method
IF 1.8 | Zone 3, Mathematics | Q3 Computer Science, Interdisciplinary Applications | Pub Date: 2024-06-06 | DOI: 10.1016/j.csda.2024.107997
Xiang Li, Yu-Ning Li, Li-Xin Zhang, Jun Zhao

The methodology for the inference problem in high-dimensional linear expectile regression is developed. By transforming the expectile loss into a weighted-least-squares form and applying a de-biasing strategy, Wald-type tests for multiple constraints within a regularized framework are established. An estimator for the pseudo-inverse of the generalized Hessian matrix in high dimension is constructed using general amenable regularizers, including Lasso and SCAD, with its consistency demonstrated through a novel proof technique. Simulation studies and real data applications demonstrate the efficacy of the proposed test statistic in both homoscedastic and heteroscedastic scenarios.
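
The weighted-least-squares view of the expectile loss mentioned above can be illustrated in a low-dimensional, unpenalized setting. The sketch below is not the paper's regularized or de-biased procedure; it is a minimal example assuming the standard asymmetric squared loss |tau - 1(r < 0)| r^2, fitted by iteratively reweighted least squares. The function names `expectile_loss` and `fit_linear_expectile` are hypothetical.

```python
import numpy as np


def expectile_loss(residuals, tau):
    """Average asymmetric squared loss |tau - 1(r < 0)| * r**2."""
    weights = np.where(residuals < 0, 1.0 - tau, tau)
    return float(np.mean(weights * residuals ** 2))


def fit_linear_expectile(X, y, tau, n_iter=100, tol=1e-10):
    """Fit y ~ intercept + X @ coef at expectile level tau by iteratively
    reweighted least squares (no penalty, so it needs n > p)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]  # ordinary least squares start
    for _ in range(n_iter):
        residuals = y - X1 @ beta
        # Given the current residual signs, the loss is a weighted least squares
        # criterion with weight tau (r >= 0) or 1 - tau (r < 0).
        w = np.where(residuals < 0, 1.0 - tau, tau)
        XtW = X1.T * w
        new_beta = np.linalg.solve(XtW @ X1, XtW @ y)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta
```

At tau = 0.5 all weights are equal, so the fit coincides with ordinary least squares; moving tau toward 1 or 0 shifts the fitted surface toward the upper or lower part of the conditional distribution.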

Computational Statistics & Data Analysis, Volume 198, Article 107997.
Citations: 0
Latent event history models for quasi-reaction systems
IF 1.8 Zone 3 Mathematics Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-05-31 DOI: 10.1016/j.csda.2024.107996
Matteo Framba, Veronica Vinciotti, Ernst C. Wit

Various processes, such as cell differentiation and disease spreading, can be modelled as quasi-reaction systems of particles using stochastic differential equations. The existing Local Linear Approximation (LLA) method infers the parameters driving these systems from measurements of particle abundances over time. While dense observations of the process in time should in theory improve parameter estimation, LLA fails in these situations due to numerical instability. Defining a latent event history model of the underlying quasi-reaction system resolves this problem. A computationally efficient Expectation-Maximization algorithm is proposed for parameter estimation, incorporating an extended Kalman filter for evaluating the latent reactions. A simulation study demonstrates the method's performance and highlights the settings where it is particularly advantageous compared to the existing LLA approaches. An illustration of the method applied to the diffusion of COVID-19 in Italy is presented.
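
To make the notion of a quasi-reaction system and its event history concrete, here is a minimal sketch that simulates an SIR-type epidemic as two reactions (infection and recovery) with a Gillespie-style algorithm. This is an illustrative assumption, not the authors' model or their EM/extended Kalman filter estimator: the mass-action hazards, the rate values, and the names `NET` and `simulate_sir_gillespie` are hypothetical. The returned `events` array is the kind of reaction-level history that remains unobserved when only particle abundances are recorded at discrete times.

```python
import numpy as np

# Net change in (S, I, R) caused by each reaction:
# reaction 0: S + I -> 2I (infection), reaction 1: I -> R (recovery).
NET = np.array([[-1, 1, 0],
                [0, -1, 1]])

def simulate_sir_gillespie(theta, y0, t_max, rng):
    """Simulate the event history of a two-reaction SIR system.

    Assumed hazards (mass action): theta[0] * S * I and theta[1] * I.
    Returns event times, states after each event, and reaction indices.
    """
    t, y = 0.0, np.array(y0, dtype=int)
    times, states, events = [t], [y.copy()], []
    while t < t_max:
        s, i, _ = y
        hazards = np.array([theta[0] * s * i, theta[1] * i], dtype=float)
        total = hazards.sum()
        if total <= 0.0:  # no infectious particles left, nothing can happen
            break
        t += rng.exponential(1.0 / total)  # waiting time to the next event
        if t > t_max:
            break
        j = rng.choice(2, p=hazards / total)  # which reaction fires
        y = y + NET[j]
        times.append(t)
        states.append(y.copy())
        events.append(j)
    return np.array(times), np.array(states), np.array(events, dtype=int)

# Usage: 99 susceptible, 1 infectious, 0 recovered.
rng = np.random.default_rng(0)
times, states, events = simulate_sir_gillespie((0.002, 0.1), (99, 1, 0), 50.0, rng)
```

Subsampling `states` on a coarse or dense time grid gives the abundance measurements that parameter-estimation methods such as those discussed above would start from.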

Computational Statistics & Data Analysis, vol. 198, Article 107996. Published 2024-05-31. https://doi.org/10.1016/j.csda.2024.107996
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S016794732400080X/pdfft?md5=524e7377774b8a5df2e3a994373e6394&pid=1-s2.0-S016794732400080X-main.pdf