Pub Date : 2024-07-18 DOI: 10.1007/s00180-024-01532-y
Zeytu Gashaw Asfaw, Patrick E. Brown, Jamie Stafford
This paper focuses on aggregated data influenced by time, space, and changes in geographic region borders. This situation can occur when the regions used to count the reported incidences of a health outcome change periodically over time. To handle the spatial-temporal scenario, we extend the spatial root-Gaussian Cox Process (RGCP), which uses the square-root link function rather than the more typical log-link function. A simulation study demonstrates the algorithm's ability to estimate a risk surface, and the method is further validated on real datasets.
{"title":"The root-Gaussian Cox Process for spatial-temporal disease mapping with aggregated data","authors":"Zeytu Gashaw Asfaw, Patrick E. Brown, Jamie Stafford","doi":"10.1007/s00180-024-01532-y","DOIUrl":"https://doi.org/10.1007/s00180-024-01532-y","url":null,"abstract":"<p>The study of aggregated data influenced by time, space, and extra changes in geographic region borders was the main emphasis of the current paper. This may occur if the regions used to count the reported incidences of a health outcome over time change periodically. In order to handle the spatial-temporal scenario, we enhance the spatial root-Gaussian Cox Process (RGCP), which makes use of the square-root link function rather than the more typical log-link function. The algorithm’s ability to estimate a risk surface has been proven by a simulation study, and it has also been validated by real datasets.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"24 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141742746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-18 DOI: 10.1007/s00180-024-01527-9
Davood Poursina, B. Wade Brorsen
Bayesian Kriging (BK) provides a way to estimate regression models in which the parameters are smoothed across space. Such estimates could help guide site-specific fertilizer recommendations. One advantage of BK is that it can readily fill in the missing values that are common in yield monitor data. The problem is that previous methods are too computationally intensive to be commercially feasible when estimating a nonlinear production function. This paper seeks to increase computational speed by imposing restrictions on the spatial covariance matrix. Previous research used an exponential function for the spatial covariance matrix; the two alternatives considered here are the conditional autoregressive and simultaneous autoregressive models. In addition, a new analytical solution is provided for finding the optimal value of nitrogen with a stochastic linear plateau model. A comparison among models in terms of accuracy and computational burden shows that the restrictions significantly reduce the computational burden, although they sacrifice some accuracy in the dataset considered.
{"title":"Site-specific nitrogen recommendation: fast, accurate, and feasible Bayesian kriging","authors":"Davood Poursina, B. Wade Brorsen","doi":"10.1007/s00180-024-01527-9","DOIUrl":"https://doi.org/10.1007/s00180-024-01527-9","url":null,"abstract":"<p>Bayesian Kriging (BK) provides a way to estimate regression models where the parameters are smoothed across space. Such estimates could help guide site-specific fertilizer recommendations. One advantage of BK is that it can readily fill in the missing values that are common in yield monitor data. The problem is that previous methods are too computationally intensive to be commercially feasible when estimating a nonlinear production function. This paper sought to increase computational speed by imposing restrictions on the spatial covariance matrix. Previous research used an exponential function for the spatial covariance matrix. The two alternatives considered are the conditional autoregressive and simultaneous autoregressive models. In addition, a new analytical solution is provided for finding the optimal value of nitrogen with a stochastic linear plateau model. A comparison among models in the accuracy and computational burden shows that the restrictions significantly reduced the computational burden, although they did sacrifice some accuracy in the dataset considered.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"13 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141742552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-11 DOI: 10.1007/s00180-024-01504-2
Yonghui Liu, Jiawei Lu, Gilberto A. Paula, Shuangzhe Liu
This paper studies a Bayesian local influence method to detect influential observations in a partially linear model with first-order autoregressive skew-normal errors. This method appears suitable for small or moderate-sized data sets (n = 200–400) and overcomes some theoretical limitations, bridging the diagnostic gap that classical methods leave for small or moderate-sized data. The MCMC algorithm is employed for parameter estimation, and Bayesian local influence analysis is performed using three perturbation schemes (priors, variances, and data) and three measurement scales (Bayes factor, φ-divergence, and posterior mean). Simulation studies are conducted to validate the reliability of the diagnostics. Finally, a practical application uses data on the 1976 Los Angeles ozone concentration to further demonstrate the effectiveness of the diagnostics.
{"title":"Bayesian diagnostics in a partially linear model with first-order autoregressive skew-normal errors","authors":"Yonghui Liu, Jiawei Lu, Gilberto A. Paula, Shuangzhe Liu","doi":"10.1007/s00180-024-01504-2","DOIUrl":"https://doi.org/10.1007/s00180-024-01504-2","url":null,"abstract":"<p>This paper studies a Bayesian local influence method to detect influential observations in a partially linear model with first-order autoregressive skew-normal errors. This method appears suitable for small or moderate-sized data sets (<span>(n=200{sim }400)</span>) and overcomes some theoretical limitations, bridging the diagnostic gap for small or moderate-sized data in classical methods. The MCMC algorithm is employed for parameter estimation, and Bayesian local influence analysis is made using three perturbation schemes (priors, variances, and data) and three measurement scales (Bayes factor, <span>(phi )</span>-divergence, and posterior mean). Simulation studies are conducted to validate the reliability of the diagnostics. Finally, a practical application uses data on the 1976 Los Angeles ozone concentration to further demonstrate the effectiveness of the diagnostics.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"20 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141613205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-10 DOI: 10.1007/s00180-024-01526-w
Suthakaran Ratnasingam, Ramadha D. Piyadi Gamage
Quantile regression extends linear regression by estimating a conditional quantile of interest. In this paper, we propose a smoothed empirical likelihood-based non-parametric procedure to detect structural changes in quantile regression models. We further modify the proposed method using adjusted smoothed empirical likelihood and transformed smoothed empirical likelihood techniques. We show that, under the null hypothesis, the limiting distribution of the smoothed empirical likelihood ratio test statistic is identical to that of the classical parametric likelihood. Simulations are conducted to investigate the finite sample properties of the proposed methods. Finally, to demonstrate its effectiveness, the proposed method is applied to urinary glycosaminoglycans (GAGs) data to detect structural changes.
{"title":"Empirical likelihood change point detection in quantile regression models","authors":"Suthakaran Ratnasingam, Ramadha D. Piyadi Gamage","doi":"10.1007/s00180-024-01526-w","DOIUrl":"https://doi.org/10.1007/s00180-024-01526-w","url":null,"abstract":"<p>Quantile regression is an extension of linear regression which estimates a conditional quantile of interest. In this paper, we propose an empirical likelihood-based non-parametric procedure to detect structural changes in the quantile regression models. Further, we have modified the proposed smoothed empirical likelihood-based method using adjusted smoothed empirical likelihood and transformed smoothed empirical likelihood techniques. We have shown that under the null hypothesis, the limiting distribution of the smoothed empirical likelihood ratio test statistic is identical to that of the classical parametric likelihood. Simulations are conducted to investigate the finite sample properties of the proposed methods. Finally, to demonstrate the effectiveness of the proposed method, it is applied to urinary Glycosaminoglycans (GAGs) data to detect structural changes.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"28 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141570108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-05 DOI: 10.1007/s00180-024-01524-y
Hang Zou, Xiaowen Huang, Yunlu Jiang
Additive coefficient models generalize linear regression models by assuming that the relationship between the response and some covariates is linear, while the regression coefficients are additive functions. Because of their advantages in dealing with the “curse of dimensionality”, additive coefficient models have gained considerable attention. However, the commonly used estimation methods for additive coefficient models are not robust against high leverage points. To circumvent this difficulty, we develop a robust variable selection procedure based on the exponential squared loss function and a group penalty for additive coefficient models, which can tackle outliers in the response and covariates simultaneously. Under some regularity conditions, we show that the oracle estimator is a local solution of the proposed method. Furthermore, we apply the local linear approximation and minorization-maximization algorithms to implement the proposed estimator, and we propose a data-driven procedure to select the tuning parameters. Simulation studies and an application to a plasma beta-carotene level data set illustrate that the proposed method offers more reliable results than existing methods under contamination schemes.
{"title":"Robust variable selection for additive coefficient models","authors":"Hang Zou, Xiaowen Huang, Yunlu Jiang","doi":"10.1007/s00180-024-01524-y","DOIUrl":"https://doi.org/10.1007/s00180-024-01524-y","url":null,"abstract":"<p>Additive coefficient models generalize linear regression models by assuming that the relationship between the response and some covariates is linear, while their regression coefficients are additive functions. Because of its advantages in dealing with the “curse of dimensionality”, additive coefficient models gain a lot of attention. The commonly used estimation methods for additive coefficient models are not robust against high leverage points. To circumvent this difficulty, we develop a robust variable selection procedure based on the exponential squared loss function and group penalty for the additive coefficient models, which can tackle outliers in the response and covariates simultaneously. Under some regularity conditions, we show that the oracle estimator is a local solution of the proposed method. Furthermore, we apply the local linear approximation and minorization-maximization algorithm for the implementation of the proposed estimator. Meanwhile, we propose a data-driven procedure to select the tuning parameters. 
Simulation studies and an application to a plasma beta-carotene level data set illustrate that the proposed method can offer more reliable results than other existing methods in contamination schemes.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"1 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141570110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-26 DOI: 10.1007/s00180-024-01521-1
Ting Wang, Liya Fu, Yanan Song
This paper proposes a polynomial structure identification (PSI) method for variable selection and model structure identification in additive models with longitudinal data. First, the backfitting algorithm and a zero-order local polynomial smoothing method are used to select important variables in the additive model, with the importance of a variable determined through the inverse of the bandwidth parameter in the nonparametric partial kernel function. Second, the backfitting algorithm and a Q-order local polynomial smoothing method are utilized to identify the specific structure of each selected predictor. To incorporate correlations within longitudinal data, a two-stage estimation method is proposed for estimating the regression parameters of the identified important variables: (i) parameter estimators of the important variables are first obtained under an independence working model assumption; (ii) generalized estimating equations with a working correlation matrix based on B-splines are then constructed to obtain the final estimators of the parameters, which improves the efficiency of parameter estimation. Finally, simulation studies are carried out to evaluate the performance of the proposed method, followed by two real-world examples for illustration.
{"title":"Variable selection and structure identification for additive models with longitudinal data","authors":"Ting Wang, Liya Fu, Yanan Song","doi":"10.1007/s00180-024-01521-1","DOIUrl":"https://doi.org/10.1007/s00180-024-01521-1","url":null,"abstract":"<p>This paper proposes a polynomial structure identification (PSI) method for variable selection and model structure identification of additive models with longitudinal data. First, the backfitting algorithm and zero-order local polynomial smoothing method are used to select important variables in the additive model, and the importance of variables is determined through the inverse of the bandwidth parameter in the nonparametric partial kernel function. Second, the backfitting algorithm and <i>Q</i>-order local polynomial smoothing method are utilized to identify the specific structure of each selected predictor. To incorporate correlations within longitudinal data, a two-stage estimation method is proposed for estimating the regression parameters of the identified important variables: (i) Parameter estimators of the important variables are firstly obtained under an independence working model assumption; (ii) Generalized estimating equations with a working correlation matrix based on B-splines are constructed to obtain the final estimators of the parameters, which improve the efficiency of parameter estimation. 
Finally, simulation studies are carried out to evaluate the performance of the proposed method, followed by the presentation of two real-world examples for illustration.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"30 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141510862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
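The backfitting algorithm at the heart of the method above can be sketched for a plain cross-sectional additive model (ignoring the longitudinal correlation structure and the PSI step; the box smoother and the data are made up for illustration). Each component function is updated in turn against the partial residuals of the others:

```python
import numpy as np

def smooth(x, r, window=0.1):
    # Crude box-kernel smoother of residuals r against covariate x.
    out = np.empty_like(r)
    for i, xi in enumerate(x):
        w = np.abs(x - xi) < window
        out[i] = r[w].mean()
    return out - out.mean()   # center each component for identifiability

def backfit(x1, x2, y, n_iter=20):
    """Backfitting for y = mu + f1(x1) + f2(x2) + noise."""
    mu = y.mean()
    f1 = np.zeros_like(y)
    f2 = np.zeros_like(y)
    for _ in range(n_iter):
        f1 = smooth(x1, y - mu - f2)   # update f1 holding f2 fixed
        f2 = smooth(x2, y - mu - f1)   # update f2 holding f1 fixed
    return mu, f1, f2

rng = np.random.default_rng(3)
n = 400
x1, x2 = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
y = 1.0 + np.sin(np.pi * x1) + x2**2 - (x2**2).mean() + rng.normal(0, 0.1, n)
mu, f1, f2 = backfit(x1, x2, y)
print(round(mu, 2))
```

The paper replaces the box smoother with zero-order and Q-order local polynomial smoothers, whose fitted bandwidths then drive variable selection and structure identification.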
Pub Date : 2024-06-26 DOI: 10.1007/s00180-024-01518-w
Peter C. Austin
In time-to-event analyses, a competing risk is an event whose occurrence precludes the occurrence of the event of interest. Settings with competing risks occur frequently in clinical research. Missing data, which is a common problem in research, occurs when the value of a variable is recorded for some, but not all, records in the dataset. Multiple imputation (MI) is a popular method to address missing data. MI uses an imputation model to generate M (M > 1) values for each missing variable, resulting in the creation of M complete datasets. A popular algorithm for imputing missing data is multivariate imputation using chained equations (MICE). We used a complex simulation design, with covariates and missing data patterns reflective of patients hospitalized with acute myocardial infarction (AMI), to compare three strategies for imputing missing predictor variables when the analysis model is a cause-specific hazard model and there are three different event types. We compared two MICE-based strategies, which differed according to which cause-specific cumulative hazard functions were included in the imputation models (all three cause-specific cumulative hazard functions vs. only that for the primary outcome), with the substantive model compatible fully conditional specification (SMCFCS) algorithm. While no strategy had consistently superior performance, SMCFCS may be the preferred strategy. We illustrated the application of the strategies using a case study of patients hospitalized with AMI.
{"title":"Multiple imputation with competing risk outcomes","authors":"Peter C. Austin","doi":"10.1007/s00180-024-01518-w","DOIUrl":"https://doi.org/10.1007/s00180-024-01518-w","url":null,"abstract":"<p>In time-to-event analyses, a competing risk is an event whose occurrence precludes the occurrence of the event of interest. Settings with competing risks occur frequently in clinical research. Missing data, which is a common problem in research, occurs when the value of a variable is recorded for some, but not all, records in the dataset. Multiple Imputation (MI) is a popular method to address the presence of missing data. MI uses an imputation model to generate M (M > 1) values for each variable that is missing, resulting in the creation of M complete datasets. A popular algorithm for imputing missing data is multivariate imputation using chained equations (MICE). We used a complex simulation design with covariates and missing data patterns reflective of patients hospitalized with acute myocardial infarction (AMI) to compare three strategies for imputing missing predictor variables when the analysis model is a cause-specific hazard when there were three different event types. We compared two MICE-based strategies that differed according to which cause-specific cumulative hazard functions were included in the imputation models (the three cause-specific cumulative hazard functions vs. only the cause-specific cumulative hazard function for the primary outcome) with the use of the substantive model compatible fully conditional specification (SMCFCS) algorithm. While no strategy had consistently superior performance compared to the other strategies, SMCFCS may be the preferred strategy. 
We illustrated the application of the strategies using a case study of patients hospitalized with AMI.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"176 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141510861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
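The MICE strategies above differ in which cause-specific cumulative hazard functions enter the imputation models. A minimal sketch of computing those quantities with a cause-specific Nelson-Aalen estimator (synthetic data, continuous times, no tie handling) might look like:

```python
import numpy as np

def nelson_aalen_cause(time, event, cause):
    """Cause-specific Nelson-Aalen cumulative hazard H_k evaluated at
    each subject's own time; other causes act as censoring for cause k."""
    order = np.argsort(time)
    e_sorted = event[order]
    n = len(time)
    at_risk = n - np.arange(n)             # risk-set size at each sorted time
    jumps = (e_sorted == cause) / at_risk  # dH at each event of this cause
    H_sorted = np.cumsum(jumps)
    H = np.empty(n)
    H[order] = H_sorted
    return H

rng = np.random.default_rng(4)
n = 500
time = rng.exponential(1.0, n)
event = rng.integers(0, 4, n)   # 0 = censored, 1..3 = three event types

# One cumulative-hazard column per cause, as could be supplied to an
# imputation model alongside the event indicators.
H = np.column_stack([nelson_aalen_cause(time, event, k) for k in (1, 2, 3)])
print(H.shape)
```

The strategies compared in the paper amount to including all three such columns versus only the column for the primary outcome as predictors in the chained-equations imputation models.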
Pub Date : 2024-06-24 DOI: 10.1007/s00180-024-01522-0
Florian Felice
In this work, we present a methodology to estimate the strength of handball teams. We propose the Conway-Maxwell-Poisson distribution to model the number of goals scored by a team, as a flexible discrete distribution that can handle departures from equidispersion. From its parameters, we derive a mathematical formula to determine the strength of a team, and we propose a ranking based on the estimated strengths to compare teams across different championships. Applied to women's handball club data from European competitions over the 2022/2023 season, we show that the proposed ranking is reflected in real sports events and is consistent with recent results from European competitions.
{"title":"Ranking handball teams from statistical strength estimation","authors":"Florian Felice","doi":"10.1007/s00180-024-01522-0","DOIUrl":"https://doi.org/10.1007/s00180-024-01522-0","url":null,"abstract":"<p>In this work, we present a methodology to estimate the strength of handball teams. We propose the use of the Conway-Maxwell-Poisson distribution to model the number of goals scored by a team as a flexible discrete distribution which can handle situations of non equi-dispersion. From its parameters, we derive a mathematical formula to determine the strength of a team. We propose a ranking based on the estimated strengths to compare teams across different championships. Applied to female handball club data from European competitions over the 2022/2023 season, we show that our new proposed ranking can have an echo in real sports events and is linked to recent results from European competitions.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"24 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141532487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-23 DOI: 10.1007/s00180-024-01520-2
Hyunman Sim, Sungjeong Lee, Bo-Hyung Kim, Eun Shin, Woojoo Lee
Hypothesis testing for the regression coefficient associated with a dichotomized continuous covariate in a Cox proportional hazards model has been widely considered in clinical research. Although most existing testing methods allow no covariates other than the dichotomized continuous covariate itself, they have nevertheless been applied in more general settings. Through an analytic bias analysis and a numerical study, we show that this practice is not free from an inflated type I error and a loss of power. To overcome this limitation, we develop a bootstrap-based test that allows additional covariates and dichotomizes two-dimensional covariates into a binary variable. In addition, we develop an efficient algorithm to speed up the calculation of the proposed test statistic. Our numerical study demonstrates that the proposed bootstrap-based test maintains the type I error at the nominal level and exhibits higher power than other methods, and that the proposed efficient algorithm reduces computational costs.
{"title":"Hypothesis testing in Cox models when continuous covariates are dichotomized: bias analysis and bootstrap-based test","authors":"Hyunman Sim, Sungjeong Lee, Bo-Hyung Kim, Eun Shin, Woojoo Lee","doi":"10.1007/s00180-024-01520-2","DOIUrl":"https://doi.org/10.1007/s00180-024-01520-2","url":null,"abstract":"<p>Hypothesis testing for the regression coefficient associated with a dichotomized continuous covariate in a Cox proportional hazards model has been considered in clinical research. Although most existing testing methods do not allow covariates, except for a dichotomized continuous covariate, they have generally been applied. Through an analytic bias analysis and a numerical study, we show that the current practice is not free from an inflated type I error and a loss of power. To overcome this limitation, we develop a bootstrap-based test that allows additional covariates and dichotomizes two-dimensional covariates into a binary variable. In addition, we develop an efficient algorithm to speed up the calculation of the proposed test statistic. Our numerical study demonstrates that the proposed bootstrap-based test maintains the type I error well at the nominal level and exhibits higher power than other methods, as well as that the proposed efficient algorithm reduces computational costs.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"28 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141510737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-20 DOI: 10.1007/s00180-024-01519-9
Emilie Lebarbier, Nicolas Marie, Amélie Rosier
This article addresses the practical side of a recently proposed theoretical method for trend estimation in high dimensional time series. The method falls within the scope of low-rank matrix factorization methods in which the temporal structure is taken into account. It consists of minimizing a penalized criterion that is theoretically efficient but depends on two constants that must be chosen in practice. We propose a two-step strategy to resolve this question based on two different known heuristics. The performance and a comparison of the strategies are studied through an extensive simulation study covering various scenarios. To make the estimation method with the best strategy available to the community, we implemented it in an R package, TrendTM, which is presented and used here. Finally, we give a geometric interpretation of the results by linking the method to PCA, and we use the results to solve a high-dimensional curve clustering problem. The package is available on CRAN.
{"title":"Trend of high dimensional time series estimation using low-rank matrix factorization: heuristics and numerical experiments via the TrendTM package","authors":"Emilie Lebarbier, Nicolas Marie, Amélie Rosier","doi":"10.1007/s00180-024-01519-9","DOIUrl":"https://doi.org/10.1007/s00180-024-01519-9","url":null,"abstract":"<p>This article focuses on the practical issue of a recent theoretical method proposed for trend estimation in high dimensional time series. This method falls within the scope of the low-rank matrix factorization methods in which the temporal structure is taken into account. It consists of minimizing a penalized criterion, theoretically efficient but which depends on two constants to be chosen in practice. We propose a two-step strategy to solve this question based on two different known heuristics. The performance and a comparison of the strategies are studied through an important simulation study in various scenarios. In order to make the estimation method with the best strategy available to the community, we implemented the method in an R package <span>TrendTM</span> which is presented and used here. Finally, we give a geometric interpretation of the results by linking it to PCA and use the results to solve a high-dimensional curve clustering problem. The package is available on CRAN.</p>","PeriodicalId":55223,"journal":{"name":"Computational Statistics","volume":"3 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141530231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}