Pub Date: 2024-06-27 | DOI: 10.1016/j.csda.2024.108012
Qingyang Liu, Xianzheng Huang, Ray Bai
Compared to mean regression and quantile regression, the literature on modal regression is very sparse. A unifying framework for Bayesian modal regression is proposed, based on a family of unimodal distributions indexed by the mode, along with other parameters that allow for flexible shapes and tail behaviors. Sufficient conditions for posterior propriety under an improper prior on the mode parameter are derived. Following prior elicitation, regression analyses of simulated data and of datasets from several real-life applications are conducted. Besides drawing easily interpretable inferences about covariate effects, prediction and model selection under the proposed Bayesian modal regression framework are also considered. Evidence from these analyses suggests that the proposed inference procedures are very robust to outliers, enabling one to discover interesting covariate effects missed by mean or median regression, and to construct much tighter prediction intervals than those from mean or median regression. Computer programs for implementing the proposed Bayesian modal regression are available at https://github.com/rh8liuqy/Bayesian_modal_regression.
Title: Bayesian modal regression based on mixture distributions (Computational Statistics & Data Analysis, vol. 199, Article 108012)
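The central idea, a likelihood indexed directly by its mode, can be sketched with the Gumbel distribution, whose mode equals its location parameter. The toy below is an assumption-laden sketch: it uses a Gumbel likelihood (not the paper's mixture family) and maximum likelihood rather than MCMC, but it shows how a regression line can target the conditional mode instead of the conditional mean.

```python
# Toy modal regression sketch: Gumbel likelihood, mode = location parameter.
# This is NOT the paper's model; it only illustrates mode-indexed likelihoods.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-2, 2, n)
y = 1.0 + 2.0 * x + rng.gumbel(loc=0.0, scale=1.0, size=n)  # mode(y|x) = 1 + 2x

def neg_loglik(theta):
    b0, b1, log_scale = theta
    scale = np.exp(log_scale)
    z = (y - (b0 + b1 * x)) / scale   # location (= conditional mode) is b0 + b1*x
    return np.sum(np.log(scale) + z + np.exp(-z))

fit = minimize(neg_loglik, x0=np.zeros(3), method="Nelder-Mead")
b_hat = fit.x[:2]   # should recover the true modal coefficients (1, 2)
```

Because the Gumbel noise is right-skewed with mode 0, a least-squares fit of the same data would shift the intercept towards the conditional mean (by Euler's constant times the scale), while the modal fit does not.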
Pub Date: 2024-06-25 | DOI: 10.1016/j.csda.2024.108010
Yixuan Liu , Claudia Kirch , Jeong Eun Lee , Renate Meyer
A novel approach to Bayesian nonparametric spectral analysis of stationary multivariate time series is presented. Starting with a parametric vector-autoregressive model, the parametric likelihood is nonparametrically adjusted in the frequency domain to account for potential deviations from parametric assumptions. A proof of mutual contiguity of the nonparametrically corrected likelihood, the multivariate Whittle likelihood approximation and the exact likelihood for Gaussian time series is given. A multivariate extension of the nonparametric Bernstein-Dirichlet process prior for univariate spectral densities to the space of Hermitian positive definite spectral density matrices is specified directly on the correction matrices. An infinite series representation of this prior is then used to develop a Markov chain Monte Carlo algorithm to sample from the posterior distribution. The code is made publicly available for ease of use and reproducibility. This novel approach generalises the multivariate Whittle-likelihood-based method of Meier et al. (2020) and extends the nonparametrically corrected likelihood for univariate stationary time series of Kirch et al. (2019) to the multivariate case. It is demonstrated that the nonparametrically corrected likelihood combines the efficiency of a parametric model with the robustness of a nonparametric one. Its numerical accuracy is illustrated in a comprehensive simulation study. Its practical advantages are illustrated by a spectral analysis of two environmental time series data sets: a bivariate time series of the Southern Oscillation Index and fish recruitment, and a multivariate time series of wind speed data at six locations in California.
Title: A nonparametrically corrected likelihood for Bayesian spectral analysis of multivariate time series (Computational Statistics & Data Analysis, vol. 199, Article 108010)
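A minimal univariate analogue of the Whittle likelihood approximation mentioned above: the periodogram is asymptotically independent exponential across Fourier frequencies with mean equal to the spectral density, so an AR(1) model can be fitted entirely in the frequency domain. This is a one-dimensional sketch only; the paper's contribution is the multivariate, nonparametrically corrected version.

```python
# Univariate Whittle-likelihood sketch for an AR(1) spectral density.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, phi_true = 4096, 0.6
x = np.zeros(n)
for t in range(1, n):                      # simulate AR(1): x_t = phi x_{t-1} + e_t
    x[t] = phi_true * x[t - 1] + rng.standard_normal()

freqs = 2 * np.pi * np.arange(1, n // 2) / n                 # positive Fourier frequencies
I = np.abs(np.fft.fft(x)[1:n // 2]) ** 2 / (2 * np.pi * n)   # periodogram

def neg_whittle(theta):
    phi, log_s2 = theta
    if abs(phi) >= 0.999:
        return np.inf
    # AR(1) spectral density: sigma^2 / (2 pi |1 - phi e^{-i lambda}|^2)
    f = np.exp(log_s2) / (2 * np.pi * (1 + phi ** 2 - 2 * phi * np.cos(freqs)))
    return np.sum(np.log(f) + I / f)       # Whittle negative log-likelihood

phi_hat = minimize(neg_whittle, x0=[0.0, 0.0], method="Nelder-Mead").x[0]
```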
Pub Date: 2024-06-21 | DOI: 10.1016/j.csda.2024.108011
Schyan Zafar, Geoff K. Nicholls
Word meanings change over time, and word senses evolve, emerge or die out in the process. For ancient languages, where the corpora are often small and sparse, modelling such changes accurately proves challenging, and quantifying uncertainty in sense-change estimates consequently becomes important. GASC (Genre-Aware Semantic Change) and DiSC (Diachronic Sense Change) are existing generative models that have been used to analyse sense change for target words from an ancient Greek text corpus, using unsupervised learning without the help of any pre-training. These models represent the senses of a given target word such as “kosmos” (meaning decoration, order or world) as distributions over context words, and sense prevalence as a distribution over senses. The models are fitted using Markov Chain Monte Carlo (MCMC) methods to measure temporal changes in these representations. This paper introduces EDiSC, an Embedded DiSC model, which combines word embeddings with DiSC to provide superior model performance. It is shown empirically that EDiSC offers improved predictive accuracy, ground-truth recovery and uncertainty quantification, as well as better sampling efficiency and scalability properties with MCMC methods. The challenges of fitting these models are also discussed.
Title: An embedded diachronic sense change model with a case study from ancient Greek (Computational Statistics & Data Analysis, vol. 199, Article 108011)
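The representational backbone described above, senses as distributions over context words and prevalence as a distribution over senses, can be made concrete with a tiny toy. Everything here (the vocabulary size, the psi and phi tables, the two time periods) is invented for illustration and is far simpler than GASC/DiSC/EDiSC, which additionally model temporal smoothness and, in EDiSC's case, word embeddings.

```python
# Toy sense-change representation: senses are categorical distributions over
# context words (psi), prevalence is a distribution over senses per period (phi).
import numpy as np

rng = np.random.default_rng(2)
V, K = 8, 2                                  # invented vocabulary size and sense count
psi = rng.dirichlet(np.ones(V), size=K)      # psi[k] sums to 1 over context words
phi = np.array([[0.9, 0.1],                  # period 0: sense 0 dominant
                [0.3, 0.7]])                 # period 1: sense 1 has taken over

def snippet_loglik(word_ids, t):
    """Mixture log-likelihood of one context snippet at time period t."""
    per_sense = phi[t] * np.prod(psi[:, word_ids], axis=1)
    return np.log(per_sense.sum())
```

MCMC inference for the real models targets the posterior over psi and phi given many such snippets; the quantity of interest, sense change, is the movement of phi across time periods.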
Pub Date: 2024-06-20 | DOI: 10.1016/j.csda.2024.108009
Xuan Ma, Jenný Brynjarsdóttir, Thomas LaFramboise
A double Pólya-Gamma data augmentation scheme is developed for posterior sampling from a Bayesian hierarchical model of total and categorical count data. The scheme applies to a Negative Binomial - Binomial (NBB) hierarchical regression model with logit links and normal priors on regression coefficients. The approach is shown to be very efficient and in most cases outperforms the Stan program. The hierarchical modeling framework and the Pólya-Gamma data augmentation scheme are applied to human mitochondrial DNA data.
Title: A double Pólya-Gamma data augmentation scheme for a hierarchical Negative Binomial - Binomial data model (Computational Statistics & Data Analysis, vol. 199, Article 108009)
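The key identity behind Pólya-Gamma augmentation is that a logit-link likelihood becomes conditionally Gaussian given a latent ω ~ PG(b, c). Production samplers use specialized algorithms (e.g., the pypolyagamma package); the sketch below instead uses the truncated infinite-sum-of-Gammas representation of Polson et al. (2013), which is simple but approximate.

```python
# Approximate Polya-Gamma draws via the truncated sum-of-Gammas representation:
# PG(b, c) = (1/(2 pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2/(4 pi^2)), g_k ~ Gamma(b, 1).
import numpy as np

rng = np.random.default_rng(3)

def rpg_approx(b, c, size, K=200):
    k = np.arange(1, K + 1)
    denom = (k - 0.5) ** 2 + (c / (2 * np.pi)) ** 2
    g = rng.gamma(shape=b, scale=1.0, size=(size, K))
    return (g / denom).sum(axis=1) / (2 * np.pi ** 2)

draws = rpg_approx(b=1.0, c=0.0, size=20000)
# Exact moments for comparison: E[PG(b, c)] = b/(2c) * tanh(c/2), equal to b/4 at c = 0.
```

In a Gibbs sampler for a logit model, each ω is drawn as PG(b, x'β) and the regression coefficients then follow from a conjugate Gaussian update.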
Pub Date: 2024-06-20 | DOI: 10.1016/j.csda.2024.107999
Chunyan Wang , Dennis K.J. Lin
Orthogonal Latin hypercubes are widely used for computer experiments. They achieve both orthogonality and the maximum one-dimensional stratification property. When two-factor (and higher-order) interactions are active, two- and three-dimensional stratifications are also important. Unfortunately, little is known about orthogonal Latin hypercubes with good two- (and higher-) dimensional stratification properties. A method is proposed for constructing a new class of orthogonal Latin hypercubes whose columns can be partitioned into groups, such that the columns from different groups maintain two- and three-dimensional stratification properties. The proposed designs perform well under almost all popular criteria (e.g., the orthogonality, stratification, and maximin distance criteria), making them highly attractive for computer experiments. The construction method is straightforward to implement, and the relevant theoretical support is well established. The proposed strong orthogonal Latin hypercubes are tabulated for practical needs.
Title: Strong orthogonal Latin hypercubes for computer experiments (Computational Statistics & Data Analysis, vol. 198, Article 107999)
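The criteria named above are easy to compute for any candidate design, which is useful when checking tabulated designs. The sketch below builds a plain random Latin hypercube (not the paper's strong orthogonal construction) and evaluates the orthogonality criterion on it.

```python
# Random Latin hypercube plus an orthogonality check; the paper's designs are
# constructed rather than random, and also control 2-D/3-D stratification.
import numpy as np

rng = np.random.default_rng(4)

def random_lhd(n, m):
    """n-run, m-factor Latin hypercube with centred levels (2i - n - 1)/2."""
    levels = (2 * np.arange(1, n + 1) - n - 1) / 2.0
    return np.column_stack([rng.permutation(levels) for _ in range(m)])

def max_abs_correlation(D):
    """Orthogonality criterion: largest absolute off-diagonal column
    correlation (zero for an exactly orthogonal design)."""
    r = np.corrcoef(D, rowvar=False)
    return np.abs(r - np.eye(D.shape[1])).max()

D = random_lhd(n=16, m=4)
```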
Pub Date: 2024-06-13 | DOI: 10.1016/j.csda.2024.108006
Eunju Hwang, ChanHyeok Jeon
Many real data sets are characterized by positive, asymmetric and skewed distributions of various shapes. Modelling and forecasting of such data are addressed by proposing nonnegative conditional heteroscedastic time series models with Gamma distributions. Three types of time-varying parameters of Gamma distributions are adopted to construct the nonnegative GARCH models. A condition for the existence of a stationary Gamma-GARCH model is given. Parameter estimation is discussed via the maximum likelihood estimation (MLE) method. A Monte-Carlo study is conducted to illustrate sample paths of the proposed models, to assess the finite-sample validity of the MLEs, and to evaluate model diagnostics using standardized Pearson residuals. Furthermore, an out-of-sample forecasting analysis is performed to compute forecasting accuracy measures. Applications to oil price and Bitcoin data are given.
Title: Nonnegative GARCH-type models with conditional Gamma distributions and their applications (Computational Statistics & Data Analysis, vol. 198, Article 108006)
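One plausible instance of such a model (an assumption for illustration; the abstract describes three parameterizations without spelling them out) lets the conditional mean lam_t follow a GARCH-type recursion while y_t is conditionally Gamma with that mean:

```python
# Sketch of a nonnegative GARCH-type process with conditional Gamma distributions:
# y_t | past ~ Gamma(shape, scale = lam_t/shape), lam_t = omega + a*y_{t-1} + b*lam_{t-1}.
import numpy as np

rng = np.random.default_rng(5)

def simulate_gamma_garch(n, omega=0.2, a=0.3, b=0.4, shape=5.0):
    y = np.empty(n)
    lam = omega / (1 - a - b)              # start at the stationary mean (needs a + b < 1)
    for t in range(n):
        y[t] = rng.gamma(shape, lam / shape)   # E[y_t | past] = lam
        lam = omega + a * y[t] + b * lam
    return y

y = simulate_gamma_garch(200000)
```

Taking expectations of the recursion gives the stationary mean omega / (1 - a - b), here 0.2/0.3; a condition of the a + b < 1 type mirrors the kind of stationarity condition the paper derives.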
Pub Date: 2024-06-11 | DOI: 10.1016/j.csda.2024.107998
Chung Eun Lee , Xin Zhang
The dimension reduction problem for a stationary tensor time series is addressed. The goal is to remove linear combinations of the tensor time series that are mean independent of the past, without imposing any parametric models or distributional assumptions. To achieve this goal, a new metric called cumulative tensor martingale difference divergence is introduced and its theoretical properties are studied. Unlike existing methods, the proposed approach achieves dimension reduction by estimating a distinctive subspace that can fully retain the conditional mean information. By focusing on the conditional mean, the proposed dimension reduction method is potentially more accurate in prediction. The method can be viewed as a factor model-based approach that extends the existing techniques for estimating central subspace or central mean subspace in vector time series. The effectiveness of the proposed method is illustrated by extensive simulations and two real-world data applications.
Title: Conditional mean dimension reduction for tensor time series (Computational Statistics & Data Analysis, vol. 199, Article 107998)
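The scalar building block behind the proposed metric is the martingale difference divergence of Shao and Zhang (2014), which is zero exactly when a response is mean-independent of a conditioning variable. A sample version for a scalar response is a short double sum; the paper's cumulative tensor extension aggregates such quantities over lags and tensor structure.

```python
# Sample (squared) martingale difference divergence: zero iff E[Y | X] = E[Y].
import numpy as np

def mdd_sq(Y, X):
    Yc = Y - Y.mean()
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise |X_i - X_j|
    return -(np.outer(Yc, Yc) * d).mean()

rng = np.random.default_rng(6)
n = 500
X = rng.standard_normal((n, 1))
Y_dep = X[:, 0] ** 2 + 0.1 * rng.standard_normal(n)  # conditional mean depends on X
Y_ind = rng.standard_normal(n)                       # mean-independent of X
```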
Pub Date: 2024-06-06 | DOI: 10.1016/j.csda.2024.107994
Sam Efromovich, Lirit Fuksman
Imputation is a standard procedure for dealing with missing data, and there are many competing imputation methods. It is proposed to analyze imputation procedures via comparison with a benchmark developed from asymptotic theory. The considered model is nonparametric density estimation of a missing right-censored lifetime of interest. This model is of special interest for understanding imputation because each underlying observation is a pair: the censored lifetime and the indicator of censoring. The latter creates a number of interesting scenarios and challenges for imputation, where the best methods may or may not be applicable. Further, the theory sheds light on why the effect of imputation depends on the underlying density. The methodology is tested on real-life datasets and via intensive simulations. Data and R code are provided.
Title: Study of imputation procedures for nonparametric density estimation based on missing censored lifetimes (Computational Statistics & Data Analysis, vol. 198, Article 107994)
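A toy illustrating why imputation of right-censored lifetimes is delicate, using the lifetime/censoring-indicator pair structure described above. The exponential assumption and known rate below are oracle conveniences for the sketch (in the benchmark spirit of the paper), not part of its nonparametric methodology: memorylessness makes a censored value plus a fresh exponential draw a distributionally valid imputation.

```python
# Right-censored lifetimes: the naive mean of observed values is biased low;
# an (here oracle, exponential) imputation restores the target distribution.
import numpy as np

rng = np.random.default_rng(7)
n, rate = 5000, 1.0
T = rng.exponential(1 / rate, n)       # latent lifetimes of interest, E[T] = 1
C = rng.exponential(2.0, n)            # independent censoring times
V = np.minimum(T, C)                   # observed (possibly censored) value
delta = T <= C                         # indicator: True = uncensored

T_imp = V.copy()
cens = ~delta
# Memorylessness: given T > C, the residual life T - C is again Exp(rate).
T_imp[cens] = V[cens] + rng.exponential(1 / rate, cens.sum())
```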
Pub Date: 2024-06-06 | DOI: 10.1016/j.csda.2024.107997
Xiang Li , Yu-Ning Li , Li-Xin Zhang , Jun Zhao
The methodology for the inference problem in high-dimensional linear expectile regression is developed. By transforming the expectile loss into a weighted-least-squares form and applying a de-biasing strategy, Wald-type tests for multiple constraints within a regularized framework are established. An estimator for the pseudo-inverse of the generalized Hessian matrix in high dimension is constructed using general amenable regularizers, including Lasso and SCAD, with its consistency demonstrated through a novel proof technique. Simulation studies and real data applications demonstrate the efficacy of the proposed test statistic in both homoscedastic and heteroscedastic scenarios.
Title: Inference for high-dimensional linear expectile regression with de-biasing method (Computational Statistics & Data Analysis, vol. 198, Article 107997)
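The weighted-least-squares reformulation mentioned above can be sketched directly: for fixed residual signs the asymmetric squared loss is a weighted least-squares problem, so plain iteratively re-weighted least squares fits a low-dimensional expectile regression (no de-biasing or regularization here; those are the paper's high-dimensional additions).

```python
# Expectile regression by IRLS: weights are tau for nonnegative residuals and
# 1 - tau for negative ones; tau = 0.5 reduces to ordinary least squares.
import numpy as np

def expectile_regression(X, y, tau, n_iter=50):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        w = np.where(y - X @ beta >= 0, tau, 1 - tau)
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ y)   # weighted normal equations
    return beta

rng = np.random.default_rng(8)
n = 3000
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(n)
beta_half = expectile_regression(X, y, tau=0.5)
```

For symmetric noise the 0.9-expectile sits above the mean, so refitting with tau = 0.9 raises the intercept while leaving the slope essentially unchanged.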
Pub Date: 2024-05-31 | DOI: 10.1016/j.csda.2024.107996
Matteo Framba , Veronica Vinciotti , Ernst C. Wit
Various processes, such as cell differentiation and disease spreading, can be modelled as quasi-reaction systems of particles using stochastic differential equations. The existing Local Linear Approximation (LLA) method infers the parameters driving these systems from measurements of particle abundances over time. While dense observations of the process in time should in theory improve parameter estimation, LLA fails in these situations due to numerical instability. Defining a latent event history model of the underlying quasi-reaction system resolves this problem. A computationally efficient Expectation-Maximization algorithm is proposed for parameter estimation, incorporating an extended Kalman filter for evaluating the latent reactions. A simulation study demonstrates the method's performance and highlights the settings where it is particularly advantageous compared to the existing LLA approaches. An illustration of the method applied to the diffusion of COVID-19 in Italy is presented.
Title: Latent event history models for quasi-reaction systems (Computational Statistics & Data Analysis, vol. 198, Article 107996)
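The extended Kalman filter step used for evaluating the latent reactions can be illustrated on a deliberately simple scalar state-space model (invented here; the paper's states are particle counts in a quasi-reaction system): linearize the nonlinear observation map at the predicted state, then apply the standard Kalman update.

```python
# Minimal extended Kalman filter on a toy scalar model:
# x_t = 0.9 x_{t-1} + w_t,   y_t = x_t + 0.2 x_t^2 + v_t.
import numpy as np

rng = np.random.default_rng(9)
T, Q, R = 300, 0.1, 0.5
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.9 * x[t - 1] + rng.normal(0.0, np.sqrt(Q))
    y[t] = x[t] + 0.2 * x[t] ** 2 + rng.normal(0.0, np.sqrt(R))

def ekf(y, Q, R):
    m, P = 0.0, 1.0
    means = np.zeros(len(y))
    for t in range(len(y)):
        m, P = 0.9 * m, 0.81 * P + Q               # predict (dynamics are linear)
        H = 1.0 + 0.4 * m                          # Jacobian of the observation map at m
        S = H * P * H + R
        K = P * H / S
        m = m + K * (y[t] - (m + 0.2 * m ** 2))    # update with linearized gain
        P = (1.0 - K * H) * P
        means[t] = m
    return means

m_filt = ekf(y, Q, R)
```

The filtered means should track the latent state more closely than the raw, nonlinearly distorted observations do; in the paper this filtering step sits inside an EM loop that updates the reaction-rate parameters.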