Pub Date : 2024-08-22DOI: 10.1016/j.csda.2024.108042
Tak-Shing T. Chan, Alex Gibberd
Real-world inputs to principal component analysis are often corrupted by temporally or spatially correlated errors. There are several methods to mitigate this, e.g., generalized least-square matrix decomposition and maximum likelihood approaches; however, they all require the number of components or the error covariances to be known in advance, rendering these methods infeasible. To address this issue, a novel method is developed which estimates the number of components and the error covariances at the same time. The method is based on working covariance models, an idea adapted from generalized estimating equations, where the user only specifies the structural form of the error covariances. If the structural form is also unknown, working covariance selection can be used to search for the best structure from a user-defined library. Experiments on synthetic and real data confirm the efficacy of the proposed approach.
{"title":"Feasible model-based principal component analysis: Joint estimation of rank and error covariance matrix","authors":"Tak-Shing T. Chan, Alex Gibberd","doi":"10.1016/j.csda.2024.108042","DOIUrl":"10.1016/j.csda.2024.108042","url":null,"abstract":"<div><p>Real-world inputs to principal component analysis are often corrupted by temporally or spatially correlated errors. There are several methods to mitigate this, e.g., generalized least-square matrix decomposition and maximum likelihood approaches; however, they all require that the number of components or the error covariances to be known in advance, rendering the methods infeasible. To address this issue, a novel method is developed which estimates the number of components and the error covariances at the same time. The method is based on working covariance models, an idea adapted from generalized estimating equations, where the user only specifies the structural form of the error covariances. If the structural form is also unknown, working covariance selection can be used to search for the best structure from a user-defined library. Experiments on synthetic and real data confirm the efficacy of the proposed approach.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"201 ","pages":"Article 108042"},"PeriodicalIF":1.5,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167947324001269/pdfft?md5=ac444320856de4406b797dc038c23d54&pid=1-s2.0-S0167947324001269-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142121718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-08-08DOI: 10.1016/j.csda.2024.108036
Peter Lenk , Jangwon Lee , Dongu Han , Jichan Park , Taeryon Choi
We propose a hierarchical Bayesian (HB) model for multi-group analysis with group-specific, flexible regression functions. The lower-level (within-group) and upper-level (between-group) regression functions have hierarchical Gaussian process priors. HB smoothing priors are developed for the spectral coefficients. The HB priors smooth the estimated functions within and between groups. The HB model is particularly useful when data within groups are sparse because it shares information across groups, and it provides more accurate estimates than fitting separate nonparametric models to each group. The proposed model also allows shape constraints, such as monotone, U-shaped, S-shaped, and multi-modal constraints. When appropriate, shape constraints improve estimation by treating violations of the constraints as noise. The model is illustrated by two examples: monotone growth curves for children, and happiness as a convex, U-shaped function of age in multiple countries. Various basis functions can be used; versions with B-splines and orthogonal polynomials are also implemented.
{"title":"Hierarchical Bayesian spectral regression with shape constraints for multi-group data","authors":"Peter Lenk , Jangwon Lee , Dongu Han , Jichan Park , Taeryon Choi","doi":"10.1016/j.csda.2024.108036","DOIUrl":"10.1016/j.csda.2024.108036","url":null,"abstract":"<div><p>We propose a hierarchical Bayesian (HB) model for multi-group analysis with group–specific, flexible regression functions. The lower–level (within group) and upper–level (between groups) regression functions have hierarchical Gaussian process priors. HB smoothing priors are developed for the spectral coefficients. The HB priors smooth the estimated functions within and between groups. The HB model is particularly useful when data within groups are sparse because it shares information across groups, and provides more accurate estimates than fitting separate nonparametric models to each group. The proposed model also allows shape constraints, such as monotone, U and S–shaped, and multi-modal constraints. When appropriate, shape constraints improve estimation by recognizing violations of the shape constraints as noise. The model is illustrated by two examples: monotone growth curves for children, and happiness as a convex, U-shaped function of age in multiple countries. Various basis functions could also be used, and the paper also implements versions with B-splines and orthogonal polynomials.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"200 ","pages":"Article 108036"},"PeriodicalIF":1.5,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141979432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-31DOI: 10.1016/j.csda.2024.108028
Mathias Born , Peter Goos
Completely randomized designs are often infeasible due to the hard-to-change nature of one or more experimental factors. In those cases, restrictions are imposed on the order of the experimental tests. The resulting experimental designs are often split-plot or split-split-plot designs in which the levels of certain hard-to-change factors are varied only a limited number of times. In agricultural machinery optimization, the number of hard-to-change factors is so large and the available time for experimentation is so short that split-plot or split-split-plot designs are infeasible as well. The only feasible kinds of designs are generalizations of split-split-plot designs, referred to as split^k designs, where k is larger than 2. The coordinate-exchange algorithm is extended to construct optimal split^k-plot designs, and the added value of the algorithm is demonstrated by applying it to an experiment involving a self-propelled forage harvester. The optimal design generated using the extended algorithm is substantially more efficient than the design that was actually used. Update formulas for the determinant and the inverse of the information matrix speed up the coordinate-exchange algorithm, making it feasible for large designs.
{"title":"Optimal splitk-plot designs","authors":"Mathias Born , Peter Goos","doi":"10.1016/j.csda.2024.108028","DOIUrl":"10.1016/j.csda.2024.108028","url":null,"abstract":"<div><p>Completely randomized designs are often infeasible due to the hard-to-change nature of one or more experimental factors. In those cases, restrictions are imposed on the order of the experimental tests. The resulting experimental designs are often split-plot or split-split-plot designs in which the levels of certain hard-to-change factors are varied only a limited number of times. In agricultural machinery optimization, the number of hard-to-change factors is so large and the available time for experimentation is so short that split-plot or split-split-plot designs are infeasible as well. The only feasible kinds of designs are generalizations of split-split-plot designs, which are referred to as split<sup><em>k</em></sup>-designs, where <em>k</em> is larger than 2. The coordinate-exchange algorithm is extended to construct optimal split<sup><em>k</em></sup>-plot designs and the added value of the algorithm is demonstrated by applying it to an experiment involving a self propelled forage harvester. The optimal design generated using the extended algorithm is substantially more efficient than the design that was actually used. Update formulas for the determinant and the inverse of the information matrix speed up the coordinate-exchange algorithm, making it feasible for large designs.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"201 ","pages":"Article 108028"},"PeriodicalIF":1.5,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167947324001129/pdfft?md5=a6856543c46f3f3fa3089527fd43efb7&pid=1-s2.0-S0167947324001129-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142075844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-29DOI: 10.1016/j.csda.2024.108027
Ida Bauer , Harry Haupt , Stefan Linner
An algorithm for boosting regression quantiles using asymmetric least absolute deviations, better known as pinball loss, is proposed. Existing approaches for boosting regression quantiles are essentially equal to least squares boosting of regression means with the single difference that their working residuals are based on pinball loss. All steps of our boosting algorithm are embedded in the well-established framework of quantile regression, and its main components – sequential base learning, fitting, and updating – are based on consistent scoring rules for regression quantiles. The Monte Carlo simulations performed indicate that the pinball boosting algorithm is competitive with existing approaches for boosting regression quantiles in terms of estimation accuracy and variable selection, and that its application to the study of regression quantiles of hedonic price functions allows the estimation of previously infeasible high-dimensional specifications.
{"title":"Pinball boosting of regression quantiles","authors":"Ida Bauer , Harry Haupt , Stefan Linner","doi":"10.1016/j.csda.2024.108027","DOIUrl":"10.1016/j.csda.2024.108027","url":null,"abstract":"<div><p>An algorithm for boosting regression quantiles using asymmetric least absolute deviations, better known as pinball loss, is proposed. Existing approaches for boosting regression quantiles are essentially equal to least squares boosting of regression means with the single difference that their working residuals are based on pinball loss. All steps of our boosting algorithm are embedded in the well-established framework of quantile regression, and its main components – sequential base learning, fitting, and updating – are based on consistent scoring rules for regression quantiles. The Monte Carlo simulations performed indicate that the pinball boosting algorithm is competitive with existing approaches for boosting regression quantiles in terms of estimation accuracy and variable selection, and that its application to the study of regression quantiles of hedonic price functions allows the estimation of previously infeasible high-dimensional specifications.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"200 ","pages":"Article 108027"},"PeriodicalIF":1.5,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167947324001117/pdfft?md5=a5bb1b64a0df9825011d53531f3280e4&pid=1-s2.0-S0167947324001117-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141947257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-25DOI: 10.1016/j.csda.2024.108016
Mehrdad Naderi , Mostafa Tamandi , Elham Mirfarah , Wan-Lun Wang , Tsung-I Lin
With the steady growth of computer technologies, the application of statistical techniques to analyze extensive datasets has garnered substantial attention. The analysis of three-way (matrix-variate) data has emerged as a burgeoning field that has inspired statisticians in recent years to develop novel analytical methods. This paper introduces a unified finite mixture model that relies on the mean-mixture of matrix-variate normal distributions. The strength of our proposed model lies in its capability to capture and cluster a wide range of three-way data that exhibit heterogeneous, asymmetric and leptokurtic features. A computationally feasible ECME algorithm is developed to compute the maximum likelihood (ML) estimates. Numerous simulation studies are conducted to investigate the asymptotic properties of the ML estimators, validate the effectiveness of the Bayesian information criterion in selecting the appropriate model, and assess the classification ability in the presence of contaminated noise. The utility of the proposed methodology is demonstrated by analyzing a real-life data example.
{"title":"Three-way data clustering based on the mean-mixture of matrix-variate normal distributions","authors":"Mehrdad Naderi , Mostafa Tamandi , Elham Mirfarah , Wan-Lun Wang , Tsung-I Lin","doi":"10.1016/j.csda.2024.108016","DOIUrl":"10.1016/j.csda.2024.108016","url":null,"abstract":"<div><p>With the steady growth of computer technologies, the application of statistical techniques to analyze extensive datasets has garnered substantial attention. The analysis of three-way (matrix-variate) data has emerged as a burgeoning field that has inspired statisticians in recent years to develop novel analytical methods. This paper introduces a unified finite mixture model that relies on the mean-mixture of matrix-variate normal distributions. The strength of our proposed model lies in its capability to capture and cluster a wide range of three-way data that exhibit heterogeneous, asymmetric and leptokurtic features. A computationally feasible ECME algorithm is developed to compute the maximum likelihood (ML) estimates. Numerous simulation studies are conducted to investigate the asymptotic properties of the ML estimators, validate the effectiveness of the Bayesian information criterion in selecting the appropriate model, and assess the classification ability in presence of contaminated noise. The utility of the proposed methodology is demonstrated by analyzing a real-life data example.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"199 ","pages":"Article 108016"},"PeriodicalIF":1.5,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141947240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-17DOI: 10.1016/j.csda.2024.108026
Weichao Yang , Xu Guo , Lixing Zhu
This study investigates the testing of regression coefficients within high-dimensional generalized linear models featuring general covariance structures. The derived asymptotic properties reveal that distinct covariance structures can lead to varying limiting null distributions, including the normal distribution, for a widely employed quadratic-norm based test statistic. This circumstance renders it infeasible to determine critical values through a limiting null distribution. In response to this challenge, we propose a multiplier bootstrap test procedure for practical implementation. Additionally, we introduce a modified version of this procedure, incorporating projection when dealing with nuisance parameters. We then proceed to examine the asymptotic level and power of the proposed tests and assess their finite-sample performance through simulations. Finally, we present a real data analysis to illustrate the practical application of the proposed tests.
{"title":"Tests for high-dimensional generalized linear models under general covariance structure","authors":"Weichao Yang , Xu Guo , Lixing Zhu","doi":"10.1016/j.csda.2024.108026","DOIUrl":"10.1016/j.csda.2024.108026","url":null,"abstract":"<div><p>This study investigates the testing of regression coefficients within high-dimensional generalized linear models featuring general covariance structures. The derived asymptotic properties reveal that distinct covariance structures can lead to varying limiting null distributions, including the normal distribution, for a widely employed quadratic-norm based test statistic. This circumstance renders it infeasible to determine critical values through a limiting null distribution. In response to this challenge, we propose a multiplier bootstrap test procedure for practical implementation. Additionally, we introduce a modified version of this procedure, incorporating projection when dealing with nuisance parameters. We then proceed to examine the asymptotic level and power of the proposed tests and assess their finite-sample performance through simulations. Finally, we present a real data analysis to illustrate the practical application of the proposed tests.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"199 ","pages":"Article 108026"},"PeriodicalIF":1.5,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141728824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-14DOI: 10.1016/j.csda.2024.108025
C.J.R. Murphy-Barltrop , J.L. Wadsworth
In many practical applications, evaluating the joint impact of combinations of environmental variables is important for risk management and structural design analysis. When such variables are considered simultaneously, non-stationarity can exist within both the marginal distributions and dependence structure, resulting in complex data structures. In the context of extremes, few methods have been proposed for modelling trends in extremal dependence, even though capturing this feature is important for quantifying joint impact. Moreover, most proposed techniques are only applicable to data structures exhibiting asymptotic dependence. Motivated by observed dependence trends of data from the UK Climate Projections, a novel semi-parametric modelling framework for bivariate extremal dependence structures is proposed. This framework can capture a wide variety of dependence trends for data exhibiting asymptotic independence. When applied to the climate projection dataset, the model detects significant dependence trends in observations and, in combination with models for marginal non-stationarity, can be used to produce estimates of bivariate risk measures at future time points.
{"title":"Modelling non-stationarity in asymptotically independent extremes","authors":"C.J.R. Murphy-Barltrop , J.L. Wadsworth","doi":"10.1016/j.csda.2024.108025","DOIUrl":"10.1016/j.csda.2024.108025","url":null,"abstract":"<div><p>In many practical applications, evaluating the joint impact of combinations of environmental variables is important for risk management and structural design analysis. When such variables are considered simultaneously, non-stationarity can exist within both the marginal distributions and dependence structure, resulting in complex data structures. In the context of extremes, few methods have been proposed for modelling trends in extremal dependence, even though capturing this feature is important for quantifying joint impact. Moreover, most proposed techniques are only applicable to data structures exhibiting asymptotic dependence. Motivated by observed dependence trends of data from the UK Climate Projections, a novel semi-parametric modelling framework for bivariate extremal dependence structures is proposed. This framework can capture a wide variety of dependence trends for data exhibiting asymptotic independence. When applied to the climate projection dataset, the model detects significant dependence trends in observations and, in combination with models for marginal non-stationarity, can be used to produce estimates of bivariate risk measures at future time points.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"199 ","pages":"Article 108025"},"PeriodicalIF":1.5,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167947324001099/pdfft?md5=30bf72d73c4164fa1e95447a8e89f109&pid=1-s2.0-S0167947324001099-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141636850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-02DOI: 10.1016/j.csda.2024.108013
Laura Vana-Gür
A multivariate ordinal regression model which allows the joint modeling of three-dimensional panel data containing both repeated and multiple measurements for a collection of subjects is proposed. This is achieved by a multivariate autoregressive structure on the errors of the latent variables underlying the ordinal responses, which accounts for the correlations at a single point in time and the persistence over time. The error distribution is assumed to be normal or Student-t distributed. The estimation is performed using composite likelihood methods. Through several simulation exercises, the quality of the estimates in different settings as well as in comparison with a Bayesian approach is investigated. The simulation study confirms that the estimation procedure is able to recover the model parameters well and is competitive in terms of computation time. Finally, the framework is illustrated using a data set containing bankruptcy and credit rating information for US exchange-listed companies.
{"title":"Multivariate ordinal regression for multiple repeated measurements","authors":"Laura Vana-Gür","doi":"10.1016/j.csda.2024.108013","DOIUrl":"https://doi.org/10.1016/j.csda.2024.108013","url":null,"abstract":"<div><p>A multivariate ordinal regression model which allows the joint modeling of three-dimensional panel data containing both repeated and multiple measurements for a collection of subjects is proposed. This is achieved by a multivariate autoregressive structure on the errors of the latent variables underlying the ordinal responses, which accounts for the correlations at a single point in time and the persistence over time. The error distribution is assumed to be normal or Student-<em>t</em> distributed. The estimation is performed using composite likelihood methods. Through several simulation exercises, the quality of the estimates in different settings as well as in comparison with a Bayesian approach is investigated. The simulation study confirms that the estimation procedure is able to recover the model parameters well and is competitive in terms of computation time. Finally, the framework is illustrated using a data set containing bankruptcy and credit rating information for US exchange-listed companies.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"199 ","pages":"Article 108013"},"PeriodicalIF":1.5,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167947324000975/pdfft?md5=ab85b2830c29a159e869e1da23f9a25e&pid=1-s2.0-S0167947324000975-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141541625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-07-02DOI: 10.1016/j.csda.2024.108015
Joakim Nyberg , Andrew C. Hooker , Georg Zimmermann , Johan Verbeeck , Martin Geroldinger , Konstantin Emil Thiel , Geert Molenberghs , Martin Laimer , Verena Wally
Epidermolysis bullosa simplex (EBS) is a rare skin disease, which renders the use of optimal design techniques especially important to maximize the potential information in a future study, that is, to make efficient use of the limited number of available subjects and observations. A generalized linear mixed effects model (GLMM), built on an EBS trial, was used to optimize the design. The model assumed a full treatment effect in the follow-up period. In addition, two alternative models were considered, assuming either no treatment effect or a linearly declining treatment effect in the follow-up. The information gain or loss from changing the number of EBS blister counts, the duration of the treatment, and the length of the study period was assessed. In addition, the EBS blister assessment times were optimized. The optimization used the derived Fisher information matrix for the GLMM with EBS blister counts, and information gain and loss were quantified by D-optimal efficiency. The optimization results indicated that using optimal assessment times increases the information to about 110-120%, varying slightly between the assumed treatment models. The results also showed that the information was sensitive to shifting the assessment times by ± one week, whereas shifts within ± two days did not decrease the information, as long as three of the four assessments in the trial period fell within the treatment period rather than the follow-up period. Increasing the number of assessments per trial period to six or five increased the information to 130% and 115%, respectively, while decreasing the number of assessments to two or three decreased the information to 50% and 80%, respectively. Increasing the length of the trial period had a minor impact on the information, while extending the treatment period by two and four weeks had a larger impact, yielding 120% and 130%, respectively. To conclude, general applications of optimal design methodology, the derivation of the Fisher information matrix for a GLMM with count data, and examples of how optimal design could be used when designing trials for the treatment of EBS are presented. The methodology is also of interest for study designs where maximizing the information is essential. Therefore, general applied research guidance for using optimal design is also provided.
{"title":"Optimizing designs in clinical trials with an application in treatment of Epidermolysis bullosa simplex, a rare genetic skin disease","authors":"Joakim Nyberg , Andrew C. Hooker , Georg Zimmermann , Johan Verbeeck , Martin Geroldinger , Konstantin Emil Thiel , Geert Molenberghs , Martin Laimer , Verena Wally","doi":"10.1016/j.csda.2024.108015","DOIUrl":"https://doi.org/10.1016/j.csda.2024.108015","url":null,"abstract":"<div><p>Epidermolysis bullosa simplex (EBS) skin disease is a rare disease, which renders the use of optimal design techniques especially important to maximize the potential information in a future study, that is, to make efficient use of the limited number of available subjects and observations. A generalized linear mixed effects model (GLMM), built on an EBS trial was used to optimize the design. The model assumed a full treatment effect in the follow-up period. In addition to this model, two models with either no assumed treatment effect or a linearly declining treatment effect in the follow-up were assumed. The information gain and loss when changing the number of EBS blisters counts, altering the duration of the treatment as well as changing the study period was assessed. In addition, optimization of the EBS blister assessment times was performed. The optimization was utilizing the derived Fisher information matrix for the GLMM with EBS blister counts and the information gain and loss was quantified by D-optimal efficiency. The optimization results indicated that using optimal assessment times increases the information of about 110-120%, varying slightly between the assumed treatment models. In addition, the result showed that the assessment times were also sensitive to be moved ± one week, but assessment times within ± two days were not decreasing the information as long as three assessments (out of four assessments in the trial period) were within the treatment period and not in the follow-up period. Increasing the number of assessments to six or five per trial period increased the information to 130% and 115%, respectively, while decreasing the number of assessments to two or three, decreased the information to 50% and 80%, respectively. Increasing the length of the trial period had a minor impact on the information, while increasing the treatment period by two and four weeks had a larger impact, 120% and 130%, respectively. To conclude, general applications of optimal design methodology, derivation of the Fisher information matrix for GLMM with count data and examples on how optimal design could be used when designing trials for treatment of the EBS disease is presented. The methodology is also of interest for study designs where maximizing the information is essential. 
Therefore, a general applied research guidance for using optimal design is also provided.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"199 ","pages":"Article 108015"},"PeriodicalIF":1.5,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167947324000999/pdfft?md5=f5085e42686fa3be3531f90fc0181a2c&pid=1-s2.0-S0167947324000999-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141607093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
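D-optimal efficiency compares designs through the determinant of the Fisher information matrix. As a stand-in for the paper's GLMM with blister counts, the sketch below uses a plain Poisson log-linear model in assessment time and compares two hypothetical sets of assessment weeks; the parameter values and designs are invented for illustration only.

```python
import numpy as np

def poisson_fisher_info(times, beta):
    """Fisher information for a Poisson log-linear model in (intercept, time)."""
    X = np.column_stack([np.ones(len(times)), times])
    mu = np.exp(X @ beta)                      # Poisson mean at each assessment time
    return X.T @ (X * mu[:, None])

def d_efficiency(M1, M2):
    """Relative D-efficiency of design 1 versus design 2: (det M1 / det M2)^(1/p)."""
    p = M1.shape[0]
    return (np.linalg.det(M1) / np.linalg.det(M2)) ** (1.0 / p)

beta = np.array([1.5, -0.1])                   # assumed decline in counts over weeks
design_a = np.array([1.0, 2.0, 3.0, 4.0])      # four early assessment weeks
design_b = np.array([1.0, 4.0, 8.0, 12.0])     # four spread-out assessment weeks

M_a = poisson_fisher_info(design_a, beta)
M_b = poisson_fisher_info(design_b, beta)
print("D-efficiency of design A relative to design B:",
      round(d_efficiency(M_a, M_b), 3))
```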
Pub Date : 2024-07-01DOI: 10.1016/j.csda.2024.108014
Katarzyna Reluga , María-José Lombardía , Stefan Sperlich
Linear mixed effects are considered excellent predictors of cluster-level parameters in various domains. However, previous research has demonstrated that their performance is affected by departures from model assumptions. Given the common occurrence of these departures in empirical studies, there is a need for inferential methods that are robust to misspecifications while remaining accessible and appealing to practitioners. Statistical tools have been developed for cluster-wise and simultaneous inference for mixed effects under distributional misspecifications, employing a user-friendly semiparametric random effect bootstrap. The merits and limitations of this approach are discussed in the general context of model misspecification. Theoretical analysis demonstrates the asymptotic consistency of the methods under general regularity conditions. Simulations show that the proposed intervals are robust to departures from modelling assumptions, including asymmetry and long tails in the distributions of errors and random effects, outperforming competitors in terms of empirical coverage probability. Finally, the methodology is applied to construct confidence intervals for household income across counties in the Spanish region of Galicia.
{"title":"Bootstrap-based statistical inference for linear mixed effects under misspecifications","authors":"Katarzyna Reluga , María-José Lombardía , Stefan Sperlich","doi":"10.1016/j.csda.2024.108014","DOIUrl":"https://doi.org/10.1016/j.csda.2024.108014","url":null,"abstract":"<div><p>Linear mixed effects are considered excellent predictors of cluster-level parameters in various domains. However, previous research has demonstrated that their performance is affected by departures from model assumptions. Given the common occurrence of these departures in empirical studies, there is a need for inferential methods that are robust to misspecifications while remaining accessible and appealing to practitioners. Statistical tools have been developed for cluster-wise and simultaneous inference for mixed effects under distributional misspecifications, employing a user-friendly semiparametric random effect bootstrap. The merits and limitations of this approach are discussed in the general context of model misspecification. Theoretical analysis demonstrates the asymptotic consistency of the methods under general regularity conditions. Simulations show that the proposed intervals are robust to departures from modelling assumptions, including asymmetry and long tails in the distributions of errors and random effects, outperforming competitors in terms of empirical coverage probability. Finally, the methodology is applied to construct confidence intervals for household income across counties in the Spanish region of Galicia.</p></div>","PeriodicalId":55225,"journal":{"name":"Computational Statistics & Data Analysis","volume":"199 ","pages":"Article 108014"},"PeriodicalIF":1.5,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0167947324000987/pdfft?md5=733458402da2cf31e9cef3842c8c4865&pid=1-s2.0-S0167947324000987-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141541624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}