Pub Date: 2024-07-29 | DOI: 10.1007/s00180-024-01528-8
Lei Ge, Yang Li, Jianguo Sun
Panel binary data arise in an event history study when study subjects are observed only at discrete time points rather than continuously, and the only available information on the occurrence of the recurrent event of interest is whether the event has occurred between two consecutive observation times, that is, within each observation window. Although some methods have been proposed for regression analysis of such data, all of them assume independent observation times or processes, an assumption that may not hold in practice. To address this, we propose a joint modeling procedure that allows for informative observation processes. For the implementation of the proposed method, a computationally efficient EM algorithm is developed, and the resulting estimators are shown to be consistent and asymptotically normal. A simulation study conducted to assess its performance indicates that the method works well in practical situations, and the proposed approach is applied to the motivating data set from the Health and Retirement Study.
Title: Semiparametric regression analysis of panel binary data with an informative observation process (Computational Statistics, IF 1.3)
Pub Date: 2024-07-26 | DOI: 10.1007/s00180-024-01517-x
Ting-Wu Wang, Eric J. Beh, Rosaria Lombardo, Ian W. Renner
Power transformations of count data, including the cell frequencies of a contingency table, have been well understood for nearly 100 years, with much of the attention focused on the square root transformation. Over the past 15 years, this topic has yielded new insights in correspondence analysis, where two forms of power transformation have been discussed. One type considers the impact of raising the joint proportions of the cell frequencies of a table to a known power, while the other examines the power transformation of the relative distribution of the cell frequencies. While the graphical features of correspondence analysis rest on numerical algorithms such as reciprocal averaging and other analogous techniques, the role of power transformations in reciprocal averaging has not been described. Therefore, this paper examines this link where a power transformation is applied to the cell frequencies of a two-way contingency table. In doing so, we show that reciprocal averaging can be performed under such a transformation to obtain row and column scores that provide the maximum association between the variables and the greatest discrimination between the categories. Finally, we discuss the connection between performing reciprocal averaging and singular value decomposition under this type of power transformation. The R function powerRA.exe, included in the Appendix, performs reciprocal averaging of a power transformation of the cell frequencies of a two-way contingency table.
Title: Profile transformations for reciprocal averaging and singular value decomposition (Computational Statistics, IF 1.3)
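The link sketched in the abstract can be illustrated numerically: after raising the cell frequencies to a power, the usual correspondence-analysis SVD of the standardized residual matrix yields the row and column scores that a reciprocal-averaging iteration converges to. A minimal Python sketch (not the authors' powerRA.exe; the table and the power delta = 0.5 are illustrative):

```python
import numpy as np

def power_ca_scores(N, delta=0.5):
    # Raise cell frequencies to the power delta, then run the standard
    # correspondence-analysis SVD on the standardized residual matrix
    # of the transformed table.
    P = N.astype(float) ** delta
    P = P / P.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_scores = U[:, 0] / np.sqrt(r)     # first principal axis
    col_scores = Vt[0, :] / np.sqrt(c)
    return row_scores, col_scores, sv

N = np.array([[20, 5], [4, 18]])          # illustrative 2x2 table
rows, cols, sv = power_ca_scores(N, delta=0.5)
```

The leading singular value measures the association captured by the first pair of scores; for a valid correspondence matrix it is bounded by 1.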
Pub Date: 2024-07-24 | DOI: 10.1007/s00180-024-01531-z
Taiane Schaedler Prass, Guilherme Pumi, Cleiton Guollo Taufemback, Jonas Hendler Carlos
This paper discusses dynamic ARMA-type regression models for positive time series, which can handle bounded non-Gaussian time series without requiring data transformations. Our proposed model includes a conditional mean modeled by a dynamic structure containing autoregressive and moving average terms, time-varying covariates, unknown parameters, and link functions. Additionally, we present the PTSR package and discuss partial maximum likelihood estimation, asymptotic theory, hypothesis testing inference, diagnostic analysis, and forecasting for a variety of regression-based dynamic models for positive time series. A Monte Carlo simulation and a real data application are provided.
Title: Positive time series regression models: theoretical and computational aspects (Computational Statistics, IF 1.3)
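As a rough illustration of the model class described above, the following sketch simulates a positive time series whose conditional mean has a log link and an autoregressive term, with mean-one gamma multiplicative errors. This is not the PTSR package's API; the model form and all parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_positive_ts(n, omega=0.1, phi=0.5, shape=20.0):
    # Dynamic model for a positive series (illustrative):
    #   log mu_t = omega + phi * log y_{t-1}   (log link, AR-type term)
    #   y_t      = mu_t * eps_t                (eps_t ~ Gamma, E[eps_t] = 1)
    y = np.empty(n)
    y[0] = 1.0
    for t in range(1, n):
        mu = np.exp(omega + phi * np.log(y[t - 1]))
        y[t] = mu * rng.gamma(shape, 1.0 / shape)  # scale 1/shape gives mean 1
    return y

y = simulate_positive_ts(500)
```

Because the link acts on the mean and the errors are multiplicative and positive, the series stays positive without any data transformation, which is the point emphasized in the abstract.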
Pub Date: 2024-07-18 | DOI: 10.1007/s00180-024-01532-y
Zeytu Gashaw Asfaw, Patrick E. Brown, Jamie Stafford
This paper focuses on aggregated data influenced by time, space, and changes in geographic region borders. Such changes may occur when the regions used to count the reported incidences of a health outcome change periodically over time. To handle this spatial-temporal scenario, we extend the spatial root-Gaussian Cox Process (RGCP), which uses the square-root link function rather than the more typical log-link function. A simulation study demonstrates the algorithm's ability to estimate a risk surface, and the method is further validated on real datasets.
Title: The root-Gaussian Cox Process for spatial-temporal disease mapping with aggregated data (Computational Statistics, IF 1.3)
Pub Date: 2024-07-18 | DOI: 10.1007/s00180-024-01527-9
Davood Poursina, B. Wade Brorsen
Bayesian Kriging (BK) provides a way to estimate regression models in which the parameters are smoothed across space. Such estimates could help guide site-specific fertilizer recommendations. One advantage of BK is that it can readily fill in the missing values that are common in yield monitor data. The problem is that previous methods are too computationally intensive to be commercially feasible when estimating a nonlinear production function. This paper sought to increase computational speed by imposing restrictions on the spatial covariance matrix. Previous research used an exponential function for the spatial covariance matrix; the two alternatives considered here are the conditional autoregressive and simultaneous autoregressive models. In addition, a new analytical solution is provided for finding the optimal value of nitrogen with a stochastic linear plateau model. A comparison among models in terms of accuracy and computational burden shows that the restrictions significantly reduced the computational burden, although they did sacrifice some accuracy in the dataset considered.
Title: Site-specific nitrogen recommendation: fast, accurate, and feasible Bayesian kriging (Computational Statistics, IF 1.3)
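The computational contrast the abstract draws can be illustrated directly: a distance-based exponential covariance matrix is dense, while a CAR-style specification is built from a sparse neighbourhood structure whose precision matrix is mostly zeros. A sketch under illustrative parameter values (a small grid; rho and phi are made up):

```python
import numpy as np

def exponential_cov(coords, sigma2=1.0, phi=2.0):
    # Dense distance-based covariance: every pair of sites interacts.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / phi)

def car_precision(nx, ny, rho=0.9, tau2=1.0):
    # CAR-style precision Q = (D - rho * W) / tau2 from a grid
    # neighbourhood: sparse, and positive definite for |rho| < 1.
    n = nx * ny
    W = np.zeros((n, n))
    for i in range(nx):
        for j in range(ny):
            k = i * ny + j
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                a, b = i + di, j + dj
                if 0 <= a < nx and 0 <= b < ny:
                    W[k, a * ny + b] = 1.0
    D = np.diag(W.sum(axis=1))
    return (D - rho * W) / tau2

coords = np.array([(i, j) for i in range(4) for j in range(4)], float)
Sigma = exponential_cov(coords)   # dense: all 256 entries nonzero
Q = car_precision(4, 4)           # sparse: only neighbours and diagonal
```

Working with a sparse precision rather than a dense covariance is what buys the speed-up the paper targets.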
Pub Date: 2024-07-11 | DOI: 10.1007/s00180-024-01504-2
Yonghui Liu, Jiawei Lu, Gilberto A. Paula, Shuangzhe Liu
This paper studies a Bayesian local influence method to detect influential observations in a partially linear model with first-order autoregressive skew-normal errors. This method appears suitable for small or moderate-sized data sets (n = 200 to 400) and overcomes some theoretical limitations, bridging the diagnostic gap for small or moderate-sized data in classical methods. The MCMC algorithm is employed for parameter estimation, and Bayesian local influence analysis is made using three perturbation schemes (priors, variances, and data) and three measurement scales (Bayes factor, φ-divergence, and posterior mean). Simulation studies are conducted to validate the reliability of the diagnostics. Finally, a practical application uses data on the 1976 Los Angeles ozone concentration to further demonstrate the effectiveness of the diagnostics.
Title: Bayesian diagnostics in a partially linear model with first-order autoregressive skew-normal errors (Computational Statistics, IF 1.3)
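The error structure named in the title, first-order autoregressive errors with skew-normal innovations, can be simulated in a few lines via the Azzalini representation of the skew-normal. This is only an illustration of the error process, not the paper's estimation or diagnostic procedure; rho and alpha are made-up values:

```python
import numpy as np

rng = np.random.default_rng(3)

def skew_normal(n, alpha):
    # Azzalini representation: Z = delta*|U0| + sqrt(1-delta^2)*U1,
    # with delta = alpha / sqrt(1 + alpha^2), gives SN(0, 1, alpha) draws.
    delta = alpha / np.sqrt(1 + alpha**2)
    u0, u1 = rng.standard_normal((2, n))
    return delta * np.abs(u0) + np.sqrt(1 - delta**2) * u1

def ar1_skew_errors(n, rho=0.6, alpha=4.0):
    # AR(1) errors driven by skew-normal innovations:
    #   e_t = rho * e_{t-1} + z_t,  z_t ~ SN(alpha).
    z = skew_normal(n, alpha)
    e = np.empty(n)
    e[0] = z[0]
    for t in range(1, n):
        e[t] = rho * e[t - 1] + z[t]
    return e

e = ar1_skew_errors(1000)
```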
Pub Date: 2024-07-10 | DOI: 10.1007/s00180-024-01526-w
Suthakaran Ratnasingam, Ramadha D. Piyadi Gamage
Quantile regression is an extension of linear regression that estimates a conditional quantile of interest. In this paper, we propose an empirical likelihood-based non-parametric procedure to detect structural changes in quantile regression models. Further, we modify the proposed smoothed empirical likelihood-based method using adjusted smoothed empirical likelihood and transformed smoothed empirical likelihood techniques. We show that under the null hypothesis, the limiting distribution of the smoothed empirical likelihood ratio test statistic is identical to that of the classical parametric likelihood. Simulations are conducted to investigate the finite sample properties of the proposed methods. Finally, to demonstrate the effectiveness of the proposed method, it is applied to urinary Glycosaminoglycans (GAGs) data to detect structural changes.
Title: Empirical likelihood change point detection in quantile regression models (Computational Statistics, IF 1.3)
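For intuition about change points in a quantile setting, the sketch below uses the quantile "check" loss and scans for the split whose two segment-wise quantile fits minimise the total loss. This is a naive illustration of the change-point idea, not the authors' empirical-likelihood test; the simulated mean shift is a made-up example:

```python
import numpy as np

def check_loss(u, tau):
    # Quantile-regression check loss: rho_tau(u) = u * (tau - 1{u < 0}).
    return u * (tau - (u < 0))

def change_point_scan(y, tau=0.5):
    # Pick the split minimising total check loss of segment-wise
    # tau-quantile fits (intercept-only quantile regressions).
    n = len(y)
    best_k, best_loss = None, np.inf
    for k in range(5, n - 5):
        loss = sum(check_loss(seg - np.quantile(seg, tau), tau).sum()
                   for seg in (y[:k], y[k:]))
        if loss < best_loss:
            best_k, best_loss = k, loss
    return best_k

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0, 1, 50), rng.normal(3, 1, 50)])  # shift at 50
k_hat = change_point_scan(y)
```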
Pub Date: 2024-07-05 | DOI: 10.1007/s00180-024-01524-y
Hang Zou, Xiaowen Huang, Yunlu Jiang
Additive coefficient models generalize linear regression models by assuming that the relationship between the response and some covariates is linear, while their regression coefficients are additive functions. Because of their advantages in dealing with the “curse of dimensionality”, additive coefficient models have attracted considerable attention. The commonly used estimation methods for additive coefficient models are not robust against high leverage points. To circumvent this difficulty, we develop a robust variable selection procedure based on the exponential squared loss function and a group penalty for additive coefficient models, which can tackle outliers in the response and covariates simultaneously. Under some regularity conditions, we show that the oracle estimator is a local solution of the proposed method. Furthermore, we apply the local linear approximation and minorization-maximization algorithms for the implementation of the proposed estimator. Meanwhile, we propose a data-driven procedure to select the tuning parameters.
Title: Robust variable selection for additive coefficient models (Computational Statistics, IF 1.3)
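The robustness of the exponential squared loss is easy to see numerically: unlike the squared loss, it is bounded, so a single gross outlier contributes at most 1 to the objective rather than the square of its residual. A sketch with an illustrative tuning value gamma = 1:

```python
import numpy as np

def exp_squared_loss(r, gamma=1.0):
    # Exponential squared loss: 1 - exp(-r^2 / gamma).
    # Bounded above by 1, so large residuals (outliers) are capped,
    # whereas the squared loss grows without bound.
    return 1.0 - np.exp(-r**2 / gamma)

r = np.array([0.0, 1.0, 10.0])     # the last residual is a gross outlier
robust = exp_squared_loss(r)       # outlier contributes just under 1
ls = r**2                          # squared loss gives the outlier weight 100
```

Smaller gamma caps influence more aggressively; the paper selects such tuning parameters with a data-driven procedure.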
Pub Date: 2024-06-26 | DOI: 10.1007/s00180-024-01521-1
Ting Wang, Liya Fu, Yanan Song
This paper proposes a polynomial structure identification (PSI) method for variable selection and model structure identification of additive models with longitudinal data. First, the backfitting algorithm and zero-order local polynomial smoothing method are used to select important variables in the additive model, and the importance of variables is determined through the inverse of the bandwidth parameter in the nonparametric partial kernel function. Second, the backfitting algorithm and Q-order local polynomial smoothing method are utilized to identify the specific structure of each selected predictor. To incorporate correlations within longitudinal data, a two-stage estimation method is proposed for estimating the regression parameters of the identified important variables: (i) parameter estimators of the important variables are first obtained under an independence working model assumption; (ii) generalized estimating equations with a working correlation matrix based on B-splines are constructed to obtain the final estimators of the parameters, which improve the efficiency of parameter estimation. Finally, simulation studies are carried out to evaluate the performance of the proposed method, followed by two real-world examples for illustration.
Title: Variable selection and structure identification for additive models with longitudinal data (Computational Statistics, IF 1.3)
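The backfitting algorithm that both steps of the method rely on can be sketched compactly: cycle over the additive components, smoothing each one's partial residual until the fits stabilise. The sketch below uses a Nadaraya-Watson (zero-order local polynomial) smoother; the bandwidth, data, and iteration count are illustrative, and no variable-selection or structure-identification step is included:

```python
import numpy as np

def backfit(X, y, n_iter=20, bw=0.3):
    # Backfitting for y = alpha + f1(x1) + ... + fp(xp) + noise.
    n, p = X.shape
    alpha = y.mean()
    f = np.zeros((n, p))

    def smooth(x, r):
        # Nadaraya-Watson smoother (zero-order local polynomial).
        K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bw) ** 2)
        return (K @ r) / K.sum(axis=1)

    for _ in range(n_iter):
        for j in range(p):
            r = y - alpha - f.sum(axis=1) + f[:, j]  # partial residual
            f[:, j] = smooth(X[:, j], r)
            f[:, j] -= f[:, j].mean()                # identifiability constraint
    return alpha, f

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (200, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.1, 200)
alpha, f = backfit(X, y)
```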
Pub Date: 2024-06-26 | DOI: 10.1007/s00180-024-01518-w
Peter C. Austin
In time-to-event analyses, a competing risk is an event whose occurrence precludes the occurrence of the event of interest. Settings with competing risks occur frequently in clinical research. Missing data, a common problem in research, occurs when the value of a variable is recorded for some, but not all, records in the dataset. Multiple imputation (MI) is a popular method to address missing data: it uses an imputation model to generate M (M > 1) values for each variable that is missing, resulting in M complete datasets. A popular algorithm for imputing missing data is multivariate imputation using chained equations (MICE). We used a complex simulation design, with covariates and missing data patterns reflective of patients hospitalized with acute myocardial infarction (AMI), to compare three strategies for imputing missing predictor variables when the analysis model is a cause-specific hazard model with three different event types. We compared two MICE-based strategies, which differed in which cause-specific cumulative hazard functions were included in the imputation models (all three cause-specific cumulative hazard functions vs. only that for the primary outcome), with the substantive model compatible fully conditional specification (SMCFCS) algorithm. While no strategy had consistently superior performance compared to the others, SMCFCS may be the preferred strategy.
Title: Multiple imputation with competing risk outcomes (Computational Statistics, IF 1.3)
We illustrated the application of the strategies using a case study of patients hospitalized with AMI.