Computational Statistics: latest articles

Semiparametric regression analysis of panel binary data with an informative observation process
IF 1.3 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-07-29 DOI: 10.1007/s00180-024-01528-8
Lei Ge, Yang Li, Jianguo Sun

Panel binary data arise in an event history study when study subjects are observed only at discrete time points instead of continuously and the only available information on the occurrence of the recurrent event of interest is whether the event has occurred over two consecutive observation times or each observation window. Although some methods have been proposed for regression analysis of such data, all of them assume independent observation times or processes, which may not be true sometimes. To address this, we propose a joint modeling procedure that allows for informative observation processes. For the implementation of the proposed method, a computationally efficient EM algorithm is developed and the resulting estimators are consistent and asymptotically normal. The simulation study conducted to assess its performance indicates that it works well in practical situations, and the proposed approach is applied to the motivating data set from the Health and Retirement Study.
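To make the data structure concrete, the following minimal sketch reduces a subject's (unobserved) recurrent event times to the panel binary indicators described above. The function name `panel_binary`, the convention that follow-up starts at time 0, and the example times are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def panel_binary(event_times, obs_times):
    """Indicator, per observation window, of at least one event.

    Window j is (obs_times[j-1], obs_times[j]], with the first window
    assumed to start at time 0. Events after the last observation time
    are never seen and are dropped.
    """
    obs = np.asarray(obs_times, dtype=float)
    events = np.asarray(event_times, dtype=float)
    # Window index of each event; len(obs) means "after the last visit".
    idx = np.searchsorted(obs, events, side="left")
    counts = np.bincount(idx[idx < len(obs)], minlength=len(obs))
    return (counts > 0).astype(int)

# Hypothetical subject: four visits, four events (the last one unobserved).
events = [0.5, 2.2, 2.7, 4.6]
visits = [1.0, 2.0, 3.0, 4.0]
indicators = panel_binary(events, visits)
print(indicators)
```

Only the indicators and the visit times would be available to the analyst. In the informative-observation setting the visit times themselves may depend on the event process, which this sketch does not model.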

Citations: 0
Profile transformations for reciprocal averaging and singular value decomposition
IF 1.3 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-07-26 DOI: 10.1007/s00180-024-01517-x
Ting-Wu Wang, Eric J. Beh, Rosaria Lombardo, Ian W. Renner

Power transformations of count data, including cell frequencies of a contingency table, have been well understood for nearly 100 years, with much of the attention focused on the square root transformation. Over the past 15 years, this topic has been the focus of some new insights into areas of correspondence analysis where two forms of power transformation have been discussed. One type considers the impact of raising the joint proportions of the cell frequencies of a table to a known power while the other examines the power transformation of the relative distribution of the cell frequencies. While the foundations of the graphical features of correspondence analysis rest with numerical algorithms such as reciprocal averaging and other analogous techniques, the role of power transformations in reciprocal averaging has not been described. Therefore, this paper examines this link where a power transformation is applied to the cell frequencies of a two-way contingency table. In doing so, we show that reciprocal averaging can be performed under such a transformation to obtain row and column scores that provide the maximum association between the variables and the greatest discrimination between the categories. Finally, we discuss the connection between performing reciprocal averaging and singular value decomposition under this type of power transformation. The R function powerRA.exe is included in the Appendix and performs reciprocal averaging of a power transformation of the cell frequencies of a two-way contingency table.
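The link between reciprocal averaging and singular value decomposition can be sketched in a few lines. The code below is not the paper's powerRA function: it assumes, for illustration, a simple elementwise power of the cell frequencies followed by textbook reciprocal averaging, and the function names, the example table, and the `power` parameter are hypothetical.

```python
import numpy as np

def standardized_residuals(table, power=1.0):
    """Standardized residuals of the power-transformed table (input to the SVD)."""
    N = np.asarray(table, dtype=float) ** power
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    return (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

def reciprocal_averaging(table, power=1.0, n_iter=1000, tol=1e-13):
    """Row scores, column scores and first canonical correlation."""
    N = np.asarray(table, dtype=float) ** power
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    y = np.arange(P.shape[1], dtype=float)   # arbitrary starting column scores
    inertia = 0.0
    for _ in range(n_iter):
        y = y - c @ y                        # centre with respect to column masses
        y = y / np.sqrt(c @ y**2)            # unit weighted variance
        x = (P @ y) / r                      # row scores: averages of column scores
        y_next = (P.T @ x) / c               # column scores: averages of row scores
        new_inertia = np.sqrt(c @ y_next**2) # shrinkage factor of one full cycle
        converged = abs(new_inertia - inertia) < tol
        inertia, y = new_inertia, y_next
        if converged:
            break
    y = y - c @ y
    y = y / np.sqrt(c @ y**2)
    x = (P @ y) / r
    return x, y, np.sqrt(inertia)

# Hypothetical 3 x 4 contingency table, square-root transformed.
table = [[30, 10, 5, 2], [8, 20, 12, 6], [3, 6, 15, 25]]
x, y, rho = reciprocal_averaging(table, power=0.5)
sigma1 = np.linalg.svd(standardized_residuals(table, power=0.5), compute_uv=False)[0]
print(rho, sigma1)
```

Under these assumptions the correlation found by the iteration agrees with the largest singular value of the standardized residual matrix, which is the connection the abstract refers to.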

Citations: 0
Positive time series regression models: theoretical and computational aspects
IF 1.3 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-07-24 DOI: 10.1007/s00180-024-01531-z
Taiane Schaedler Prass, Guilherme Pumi, Cleiton Guollo Taufemback, Jonas Hendler Carlos

This paper discusses dynamic ARMA-type regression models for positive time series, which can handle bounded non-Gaussian time series without requiring data transformations. Our proposed model includes a conditional mean modeled by a dynamic structure containing autoregressive and moving average terms, time-varying covariates, unknown parameters, and link functions. Additionally, we present the PTSR package and discuss partial maximum likelihood estimation, asymptotic theory, hypothesis testing inference, diagnostic analysis, and forecasting for a variety of regression-based dynamic models for positive time series. A Monte Carlo simulation and a real data application are provided.
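As a rough illustration of a conditional mean driven by a dynamic structure and a link function, the sketch below simulates a positive series whose log conditional mean follows a first-order autoregression, with gamma-distributed observations. This is an assumed toy specification, not the model class or the API of the PTSR package; the function name and all parameter values are hypothetical.

```python
import numpy as np

def simulate_gamma_ar1(n, alpha=0.5, phi=0.6, shape=5.0, seed=0):
    """Simulate y_t > 0 with log(mu_t) = alpha + phi * log(y_{t-1})."""
    rng = np.random.default_rng(seed)
    y = np.empty(n)
    log_prev = alpha / (1.0 - phi)            # start at the fixed point of the recursion
    for t in range(n):
        mu = np.exp(alpha + phi * log_prev)   # log link, so mu_t is always positive
        y[t] = rng.gamma(shape, mu / shape)   # gamma draw with conditional mean mu_t
        log_prev = np.log(y[t])
    return y

series = simulate_gamma_ar1(200)
print(series[:5])
```

Because the link keeps the conditional mean positive, no transformation of the data is needed before modelling, which is the point made in the abstract.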

Citations: 0
The root-Gaussian Cox Process for spatial-temporal disease mapping with aggregated data
IF 1.3 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-07-18 DOI: 10.1007/s00180-024-01532-y
Zeytu Gashaw Asfaw, Patrick E. Brown, Jamie Stafford

The study of aggregated data influenced by time, space, and extra changes in geographic region borders was the main emphasis of the current paper. This may occur if the regions used to count the reported incidences of a health outcome over time change periodically. In order to handle the spatial-temporal scenario, we enhance the spatial root-Gaussian Cox Process (RGCP), which makes use of the square-root link function rather than the more typical log-link function. The algorithm’s ability to estimate a risk surface has been proven by a simulation study, and it has also been validated by real datasets.

Citations: 0
Site-specific nitrogen recommendation: fast, accurate, and feasible Bayesian kriging
IF 1.3 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-07-18 DOI: 10.1007/s00180-024-01527-9
Davood Poursina, B. Wade Brorsen

Bayesian Kriging (BK) provides a way to estimate regression models where the parameters are smoothed across space. Such estimates could help guide site-specific fertilizer recommendations. One advantage of BK is that it can readily fill in the missing values that are common in yield monitor data. The problem is that previous methods are too computationally intensive to be commercially feasible when estimating a nonlinear production function. This paper sought to increase computational speed by imposing restrictions on the spatial covariance matrix. Previous research used an exponential function for the spatial covariance matrix. The two alternatives considered are the conditional autoregressive and simultaneous autoregressive models. In addition, a new analytical solution is provided for finding the optimal value of nitrogen with a stochastic linear plateau model. A comparison among models in the accuracy and computational burden shows that the restrictions significantly reduced the computational burden, although they did sacrifice some accuracy in the dataset considered.
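One way to see why the restricted models are cheaper is to compare the matrices involved. The sketch below builds a dense exponential covariance matrix and the sparse precision matrices of textbook proper CAR and SAR models on a small regular grid. The grid, the rook-neighbour structure, the parameter values, and all function names are assumptions for illustration; the paper's exact parameterization may differ.

```python
import numpy as np

def grid_adjacency(nrow, ncol):
    """0/1 rook-neighbour matrix for an nrow x ncol grid of plots."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for i in range(nrow):
        for j in range(ncol):
            k = i * ncol + j
            if j + 1 < ncol:
                W[k, k + 1] = W[k + 1, k] = 1.0
            if i + 1 < nrow:
                W[k, k + ncol] = W[k + ncol, k] = 1.0
    return W

def exponential_covariance(coords, sigma2=1.0, range_par=2.0):
    """Dense covariance: every pair of sites is correlated."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / range_par)

def car_precision(W, rho=0.9, tau=1.0):
    """Proper CAR precision matrix tau * (D - rho * W)."""
    return tau * (np.diag(W.sum(axis=1)) - rho * W)

def sar_precision(W, rho=0.9, sigma2=1.0):
    """SAR precision matrix (I - rho * W_row)' (I - rho * W_row) / sigma2."""
    W_row = W / W.sum(axis=1, keepdims=True)   # row-standardized weights
    B = np.eye(len(W)) - rho * W_row
    return B.T @ B / sigma2

coords = np.array([(i, j) for i in range(4) for j in range(5)], dtype=float)
W = grid_adjacency(4, 5)
Sigma = exponential_covariance(coords)
Q_car, Q_sar = car_precision(W), sar_precision(W)
print(np.count_nonzero(Sigma), np.count_nonzero(Q_car), np.count_nonzero(Q_sar))
```

The CAR and SAR models specify the precision matrix directly, and it is zero for most pairs of sites, so sparse linear algebra can replace the repeated dense factorizations that an exponential covariance requires. Dense arrays are used here only to keep the sketch short.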

Citations: 0
Bayesian diagnostics in a partially linear model with first-order autoregressive skew-normal errors
IF 1.3 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-07-11 DOI: 10.1007/s00180-024-01504-2
Yonghui Liu, Jiawei Lu, Gilberto A. Paula, Shuangzhe Liu

This paper studies a Bayesian local influence method to detect influential observations in a partially linear model with first-order autoregressive skew-normal errors. This method appears suitable for small or moderate-sized data sets ($n = 200 \sim 400$) and overcomes some theoretical limitations, bridging the diagnostic gap for small or moderate-sized data in classical methods. The MCMC algorithm is employed for parameter estimation, and Bayesian local influence analysis is made using three perturbation schemes (priors, variances, and data) and three measurement scales (Bayes factor, $\phi$-divergence, and posterior mean). Simulation studies are conducted to validate the reliability of the diagnostics. Finally, a practical application uses data on the 1976 Los Angeles ozone concentration to further demonstrate the effectiveness of the diagnostics.

Citations: 0
Empirical likelihood change point detection in quantile regression models
IF 1.3 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-07-10 DOI: 10.1007/s00180-024-01526-w
Suthakaran Ratnasingam, Ramadha D. Piyadi Gamage

Quantile regression is an extension of linear regression which estimates a conditional quantile of interest. In this paper, we propose an empirical likelihood-based non-parametric procedure to detect structural changes in the quantile regression models. Further, we have modified the proposed smoothed empirical likelihood-based method using adjusted smoothed empirical likelihood and transformed smoothed empirical likelihood techniques. We have shown that under the null hypothesis, the limiting distribution of the smoothed empirical likelihood ratio test statistic is identical to that of the classical parametric likelihood. Simulations are conducted to investigate the finite sample properties of the proposed methods. Finally, to demonstrate the effectiveness of the proposed method, it is applied to urinary Glycosaminoglycans (GAGs) data to detect structural changes.
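For readers unfamiliar with how a conditional quantile is estimated, the sketch below shows the standard check (pinball) loss that quantile regression minimizes, applied to the simplest case of an intercept-only model. It does not implement the empirical likelihood change point test; the function names and the data are illustrative assumptions.

```python
import numpy as np

def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def constant_quantile_fit(y, tau):
    """Intercept-only quantile regression: search the observed values for the minimizer."""
    y = np.asarray(y, dtype=float)
    totals = [check_loss(y - q, tau).sum() for q in y]
    return y[int(np.argmin(totals))]

y = [1.0, 2.0, 3.0, 4.0, 100.0]
print(constant_quantile_fit(y, 0.5), constant_quantile_fit(y, 0.75))
```

The minimizer is a sample quantile and is unaffected by how extreme the largest observation is. With covariates, the same loss is minimized over regression coefficients instead of a single constant.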

Citations: 0
Robust variable selection for additive coefficient models
IF 1.3 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-07-05 DOI: 10.1007/s00180-024-01524-y
Hang Zou, Xiaowen Huang, Yunlu Jiang

Additive coefficient models generalize linear regression models by assuming that the relationship between the response and some covariates is linear, while their regression coefficients are additive functions. Because of its advantages in dealing with the “curse of dimensionality”, additive coefficient models gain a lot of attention. The commonly used estimation methods for additive coefficient models are not robust against high leverage points. To circumvent this difficulty, we develop a robust variable selection procedure based on the exponential squared loss function and group penalty for the additive coefficient models, which can tackle outliers in the response and covariates simultaneously. Under some regularity conditions, we show that the oracle estimator is a local solution of the proposed method. Furthermore, we apply the local linear approximation and minorization-maximization algorithm for the implementation of the proposed estimator. Meanwhile, we propose a data-driven procedure to select the tuning parameters. Simulation studies and an application to a plasma beta-carotene level data set illustrate that the proposed method can offer more reliable results than other existing methods in contamination schemes.
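The robustness of the exponential squared loss comes from its boundedness, which a short numerical comparison makes visible. The sketch assumes the commonly used form 1 - exp(-r^2 / gamma) with a tuning parameter gamma; the function name, the value of gamma, and the residuals are hypothetical, and the group penalty and the fitting algorithm are not shown.

```python
import numpy as np

def exp_squared_loss(residual, gamma=1.0):
    """Exponential squared loss 1 - exp(-r^2 / gamma), bounded above by 1."""
    r = np.asarray(residual, dtype=float)
    return 1.0 - np.exp(-r**2 / gamma)

residuals = np.array([0.1, -0.5, 1.0, 50.0])   # the last value mimics an outlier
squared = residuals**2
robust = exp_squared_loss(residuals, gamma=2.0)
print(squared)
print(robust)
```

For small residuals the loss behaves like r^2 / gamma, so ordinary observations are treated much as in least squares, while the contribution of an outlier is capped at 1 instead of growing quadratically.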

Citations: 0
Variable selection and structure identification for additive models with longitudinal data
IF 1.3 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-06-26 DOI: 10.1007/s00180-024-01521-1
Ting Wang, Liya Fu, Yanan Song

This paper proposes a polynomial structure identification (PSI) method for variable selection and model structure identification of additive models with longitudinal data. First, the backfitting algorithm and zero-order local polynomial smoothing method are used to select important variables in the additive model, and the importance of variables is determined through the inverse of the bandwidth parameter in the nonparametric partial kernel function. Second, the backfitting algorithm and Q-order local polynomial smoothing method are utilized to identify the specific structure of each selected predictor. To incorporate correlations within longitudinal data, a two-stage estimation method is proposed for estimating the regression parameters of the identified important variables: (i) Parameter estimators of the important variables are firstly obtained under an independence working model assumption; (ii) Generalized estimating equations with a working correlation matrix based on B-splines are constructed to obtain the final estimators of the parameters, which improve the efficiency of parameter estimation. Finally, simulation studies are carried out to evaluate the performance of the proposed method, followed by the presentation of two real-world examples for illustration.
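The role of the bandwidth in the first step can be illustrated with a plain zero-order local polynomial (Nadaraya-Watson) smoother for a single predictor. This is only a sketch of the underlying idea with a Gaussian kernel, not the PSI procedure or its backfitting loop; the function name, the kernel choice, and the data are assumptions.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_new, bandwidth):
    """Zero-order local polynomial fit: a kernel-weighted local average."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    diffs = (x_new[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs**2)          # Gaussian kernel weights
    return (weights @ y_train) / weights.sum(axis=1)

x = np.linspace(0.0, 1.0, 50)
y = np.sin(2.0 * np.pi * x)
flat_fit = nadaraya_watson(x, y, x, bandwidth=1e6)    # huge bandwidth
local_fit = nadaraya_watson(x, y, x, bandwidth=0.02)  # small bandwidth
print(flat_fit[:3], local_fit[:3])
```

With a very large bandwidth the fit collapses to the overall mean, so the predictor has no effect on the fitted values. A small selected bandwidth therefore signals a predictor that matters, which is consistent with measuring importance by the inverse of the bandwidth.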

Citations: 0
Multiple imputation with competing risk outcomes
IF 1.3 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2024-06-26 DOI: 10.1007/s00180-024-01518-w
Peter C. Austin

In time-to-event analyses, a competing risk is an event whose occurrence precludes the occurrence of the event of interest. Settings with competing risks occur frequently in clinical research. Missing data, which is a common problem in research, occurs when the value of a variable is recorded for some, but not all, records in the dataset. Multiple Imputation (MI) is a popular method to address the presence of missing data. MI uses an imputation model to generate M (M > 1) values for each variable that is missing, resulting in the creation of M complete datasets. A popular algorithm for imputing missing data is multivariate imputation using chained equations (MICE). We used a complex simulation design with covariates and missing data patterns reflective of patients hospitalized with acute myocardial infarction (AMI) to compare three strategies for imputing missing predictor variables when the analysis model is a cause-specific hazard model and there are three different event types. We compared two MICE-based strategies that differed according to which cause-specific cumulative hazard functions were included in the imputation models (the three cause-specific cumulative hazard functions vs. only the cause-specific cumulative hazard function for the primary outcome) with the use of the substantive model compatible fully conditional specification (SMCFCS) algorithm. While no strategy had consistently superior performance compared to the other strategies, SMCFCS may be the preferred strategy. We illustrated the application of the strategies using a case study of patients hospitalized with AMI.
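Once the M completed datasets have been analysed, the M sets of results have to be combined. The abstract does not describe this step; the sketch below assumes the standard approach, Rubin's rules, applied to a single coefficient such as a log cause-specific hazard ratio. The function name and the numbers are made up for illustration.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool M point estimates and their variances with Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()                       # pooled point estimate
    within = u.mean()                      # average within-imputation variance
    between = q.var(ddof=1)                # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total

# Hypothetical results from M = 3 imputed datasets.
pooled_estimate, pooled_variance = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(pooled_estimate, pooled_variance)
```

The between-imputation term is what carries the extra uncertainty due to the missing values, so the pooled variance is larger than the average variance from any single completed dataset.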

Citations: 0