{"title":"Acknowledgement of referees' services remerciements aux membres des jurys","authors":"","doi":"10.1002/cjs.11806","DOIUrl":"https://doi.org/10.1002/cjs.11806","url":null,"abstract":"","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 1","pages":"327-331"},"PeriodicalIF":0.6,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140123773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a new functional additive hazards model to investigate the potential effects of functional and scalar predictors on mortality risks, and develop a penalized least squares estimation method for model parameters based on a pseudoscore estimating equation. A reproducing kernel Hilbert space approach is used to establish the consistency, convergence rate, and joint asymptotic distribution of the resulting estimators for finite-dimensional and infinite-dimensional parameters. Our simulation studies demonstrate that the proposed estimation procedure performs well. For illustration, we apply the proposed method to the Medical Information Mart for Intensive Care III dataset.
{"title":"Semiparametric estimation for the functional additive hazards model","authors":"Meiling Hao, Kin-yat Liu, Wen Su, Xingqiu Zhao","doi":"10.1002/cjs.11805","DOIUrl":"10.1002/cjs.11805","url":null,"abstract":"<p>We propose a new functional additive hazards model to investigate the potential effects of functional and scalar predictors on mortality risks, and develop a penalized least squares estimation method for model parameters based on a pseudoscore estimating equation. A reproducing kernel Hilbert space approach is used to establish the consistency, convergence rate, and joint asymptotic distribution of the resulting estimators for finite-dimensional and infinite-dimensional parameters. Our simulation studies demonstrate that the proposed estimation procedure performs well. For illustration, we apply the proposed method to the Medical Information Mart for Intensive Care III dataset.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 3","pages":"755-782"},"PeriodicalIF":0.8,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139421542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a Bayesian nonparametric clustering approach to study the spatial heterogeneity effect for functional data observed at spatially correlated locations. We consider a geographically weighted Chinese restaurant process equipped with a conditional autoregressive prior to capture fully the spatial correlation of function curves. To sample efficiently from our model, we customize a prior called Quadratic Gamma, which ensures conjugacy. We design a Markov chain Monte Carlo algorithm to infer simultaneously the posterior distributions of the number of groups and the grouping configurations. The superior numerical performance of the proposed method over competing methods is demonstrated using simulated examples and a U.S. annual precipitation study.
{"title":"Clustering spatial functional data using a geographically weighted Dirichlet process","authors":"Tianyu Pan, Weining Shen, Guanyu Hu","doi":"10.1002/cjs.11803","DOIUrl":"10.1002/cjs.11803","url":null,"abstract":"<p>We propose a Bayesian nonparametric clustering approach to study the spatial heterogeneity effect for functional data observed at spatially correlated locations. We consider a geographically weighted Chinese restaurant process equipped with a conditional autoregressive prior to capture fully the spatial correlation of function curves. To sample efficiently from our model, we customize a prior called Quadratic Gamma, which ensures conjugacy. We design a Markov chain Monte Carlo algorithm to infer simultaneously the posterior distributions of the number of groups and the grouping configurations. The superior numerical performance of the proposed method over competing methods is demonstrated using simulated examples and a U.S. annual precipitation study.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 3","pages":"696-712"},"PeriodicalIF":0.8,"publicationDate":"2024-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140055778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
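The clustering abstract above builds on the Chinese restaurant process. As a rough illustration only, the following sketch samples a partition from the plain (unweighted) Chinese restaurant process; it does not include the geographic weighting, the conditional autoregressive prior, or the Quadratic Gamma prior described in the abstract. The function name `sample_crp` and the concentration value `alpha=1.0` are assumptions made for this example.

```python
import random

def sample_crp(n_customers, alpha, rng):
    """Draw one partition from a plain Chinese restaurant process (illustrative only)."""
    table_sizes = []   # table_sizes[k] = number of customers seated at table k
    assignments = []   # assignments[i] = table index of customer i
    for _ in range(n_customers):
        # An existing table k has weight table_sizes[k]; a new table has weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)      # open a new table (a new cluster)
        else:
            table_sizes[k] += 1        # join an existing cluster
        assignments.append(k)
    return assignments

rng = random.Random(0)
labels = sample_crp(n_customers=50, alpha=1.0, rng=rng)
n_groups = len(set(labels))
print(n_groups, labels[:10])
```

The number of occupied tables is random, which is why a sampler built on this process can infer the number of groups together with the grouping configuration.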
We consider data integration problems where correlated data are collected from multiple platforms. Within each platform, there are linear relationships between the responses and a collection of predictors. We extend the linear models to include random errors coming from a much wider family of sub-Gaussian and subexponential distributions. The goal is to select important predictors across multiple platforms, where the number of predictors and the number of observations both increase to infinity. We combine the marginal densities of the responses obtained from different platforms to form a composite likelihood and propose a model selection criterion based on Bayesian composite posterior probabilities. Under some regularity conditions, we prove that the model selection criterion consistently recovers the union support of the predictors when the true model size diverges.
{"title":"Bayesian Model Selection via Composite Likelihood for High-dimensional Data Integration","authors":"Guanlin Zhang, Yuehua Wu, Xin Gao","doi":"10.1002/cjs.11800","DOIUrl":"10.1002/cjs.11800","url":null,"abstract":"<p>We consider data integration problems where correlated data are collected from multiple platforms. Within each platform, there are linear relationships between the responses and a collection of predictors. We extend the linear models to include random errors coming from a much wider family of sub-Gaussian and subexponential distributions. The goal is to select important predictors across multiple platforms, where the number of predictors and the number of observations both increase to infinity. We combine the marginal densities of the responses obtained from different platforms to form a composite likelihood and propose a model selection criterion based on Bayesian composite posterior probabilities. Under some regularity conditions, we prove that the model selection criterion is consistent to recover the union support of the predictors with divergent true model size.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 3","pages":"924-938"},"PeriodicalIF":0.8,"publicationDate":"2024-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140056288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semicontinuous data frequently occur in longitudinal studies. The popular two-part modelling approach deals with longitudinal semicontinuous data by analyzing the occurrence of positive values and the intensity of positive values separately; however, this separation may break down the natural sequence of semicontinuous data within a subject and destroy its serial dependence structure. In this article, we introduce a Tweedie compound Poisson mixed model to study the occurrence of positive values and the quantity of the semicontinuous response simultaneously. In our approach, covariate effects on the semicontinuous response are assessed directly. The correlation within a subject and the unobserved heterogeneity are incorporated with serially correlated nonparametric random effects. Our model unifies subject-specific and population-averaged interpretations. We illustrate the approach with applications to a Brief Symptom Inventory study and an infants' fluoride intake study.
{"title":"Modelling occurrence and quantity of longitudinal semicontinuous data simultaneously with nonparametric unobserved heterogeneity","authors":"Guohua Yan, Renjun Ma","doi":"10.1002/cjs.11801","DOIUrl":"10.1002/cjs.11801","url":null,"abstract":"<p>Semicontinuous data frequently occur in longitudinal studies. The popular two-part modelling approach deals with longitudinal semicontinuous data by analyzing the occurrence of positive values and the intensity of positive values separately; however, this separation may break down the natural sequence of semicontinuous data within a subject and destroy its serial dependence structure. In this article, we introduce a Tweedie compound Poisson mixed model to study the occurrence of positive values and the quantity of the semicontinuous response simultaneously. In our approach, covariate effects on the semicontinuous response are assessed directly. The correlation within a subject and the unobserved heterogeneity are incorporated with serially correlated nonparametric random effects. Our model unifies subject-specific and population-averaged interpretations. We illustrate the approach with applications to a Brief Symptom Inventory study and an infants' fluoride intake study.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 3","pages":"855-872"},"PeriodicalIF":0.8,"publicationDate":"2023-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138561336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
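To make the term "semicontinuous" in the abstract above concrete, here is a minimal sketch that simulates independent draws from a Tweedie compound Poisson distribution with power parameter 1 < p < 2, using its standard representation as a Poisson number of gamma-distributed jumps. It leaves out the mixed-model structure and the serially correlated random effects from the abstract; the function name `sample_tweedie_cp` and the parameter values are assumptions chosen for illustration.

```python
import numpy as np

def sample_tweedie_cp(mu, phi, p, size, rng):
    """Sample a Tweedie compound Poisson variable with mean mu, dispersion phi, 1 < p < 2."""
    lam = mu ** (2 - p) / (phi * (2 - p))     # Poisson rate for the number of jumps
    shape = (2 - p) / (p - 1)                 # gamma shape of each jump
    scale = phi * (p - 1) * mu ** (p - 1)     # gamma scale of each jump
    counts = rng.poisson(lam, size=size)
    # A sum of n iid Gamma(shape, scale) draws is Gamma(n * shape, scale);
    # a count of zero yields an exact zero response.
    y = np.zeros(size)
    positive = counts > 0
    y[positive] = rng.gamma(shape * counts[positive], scale)
    return y, lam

rng = np.random.default_rng(0)
y, lam = sample_tweedie_cp(mu=2.0, phi=1.5, p=1.5, size=200_000, rng=rng)
zero_fraction = np.mean(y == 0)
print(y.mean(), zero_fraction, np.exp(-lam))
```

The sample has a point mass at zero with probability exp(-lam) and a continuous positive part, so a single distribution describes both the occurrence and the quantity of the response.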
Item nonresponse is a common issue in surveys. Because unadjusted estimators may be biased in the presence of nonresponse, it is common practice to impute the missing values with the objective of reducing the nonresponse bias as much as possible. However, commonly used imputation procedures may lead to unstable estimators of population totals/means when influential units are present in the set of respondents. In this article, we consider the class of multiply robust imputation procedures that provide some protection against the failure of underlying model assumptions. We develop an efficient version of multiply robust estimators based on the concept of conditional bias, a measure of influence. We present the results of a simulation study to show the benefits of our proposed method in terms of bias and efficiency.
{"title":"Efficient multiply robust imputation in the presence of influential units in surveys","authors":"Sixia Chen, David Haziza, Victoire Michal","doi":"10.1002/cjs.11802","DOIUrl":"10.1002/cjs.11802","url":null,"abstract":"<p>Item nonresponse is a common issue in surveys. Because unadjusted estimators may be biased in the presence of nonresponse, it is common practice to impute the missing values with the objective of reducing the nonresponse bias as much as possible. However, commonly used imputation procedures may lead to unstable estimators of population totals/means when influential units are present in the set of respondents. In this article, we consider the class of multiply robust imputation procedures that provide some protection against the failure of underlying model assumptions. We develop an efficient version of multiply robust estimators based on the concept of conditional bias, a measure of influence. We present the results of a simulation study to show the benefits of our proposed method in terms of bias and efficiency.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 3","pages":"829-854"},"PeriodicalIF":0.8,"publicationDate":"2023-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cjs.11802","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138544108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This article considers the challenge of designing football group draw mechanisms that have a uniform distribution over all valid draw assignments but are also entertaining, practical and transparent. Although this problem is trivial in completely symmetric settings, it becomes challenging when there are draw constraints that are not exchangeable across the competing teams, so that symmetry breaks down. We explain how to simulate the FIFA sequential draw method and compute the nonuniformity of its draws by comparison with a uniform rejection sampler. We then propose several practical methods of achieving the uniform distribution while still using balls and bowls in a way that is suitable for a televised draw. The solutions can also be carried out interactively. The general methodology we provide can readily be transported to different competition draws and is not restricted to football events.
{"title":"Football group draw probabilities and corrections","authors":"Gareth O. Roberts, Jeffrey S. Rosenthal","doi":"10.1002/cjs.11798","DOIUrl":"10.1002/cjs.11798","url":null,"abstract":"<p>This article considers the challenge of designing football group draw mechanisms, which have a uniform distribution over all valid draw assignments, but are also entertaining, practical and transparent. Although this problem is trivial in completely symmetric problems, it becomes challenging when there are draw constraints that are not exchangeable across each of the competing teams, so that symmetry breaks down. We explain how to simulate the FIFA sequential draw method and compute the nonuniformity of its draws by comparison with a uniform rejection sampler. We then propose several practical methods of achieving the uniform distribution while still using balls and bowls in a way which is suitable for a televised draw. The solutions can also be carried out interactively. The general methodology we provide can readily be transported to different competition draws and is not restricted to football events.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 3","pages":"659-677"},"PeriodicalIF":0.8,"publicationDate":"2023-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135141646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
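The football draw abstract above uses a uniform rejection sampler as its reference point. A minimal sketch of that idea on a made-up toy draw follows: propose a uniformly random assignment, and keep it only if it satisfies the constraints. The pots, the regions, and the single "no two teams from the same region in one group" rule are hypothetical and are not the actual FIFA constraints; the sequential draw method is not simulated here.

```python
import random
from collections import Counter
from itertools import permutations

# Hypothetical toy draw: group g already holds the pot-1 team POT1[g];
# each group must receive exactly one pot-2 team from a different region.
POT1 = ["EU", "SA", "EU", "AF"]
POT2 = ["EU", "SA", "AS", "AF"]

def is_valid(perm):
    """perm[g] is the index of the pot-2 team placed in group g."""
    return all(POT1[g] != POT2[t] for g, t in enumerate(perm))

def rejection_draw(rng):
    """Propose uniform random assignments until one satisfies the constraints."""
    perm = list(range(len(POT2)))
    while True:
        rng.shuffle(perm)
        if is_valid(perm):
            return tuple(perm)

# Enumerate all valid draws so the empirical frequencies can be checked.
valid_draws = [p for p in permutations(range(len(POT2))) if is_valid(p)]
rng = random.Random(1)
n_sims = 20_000
counts = Counter(rejection_draw(rng) for _ in range(n_sims))
max_dev = max(abs(counts[d] / n_sims - 1 / len(valid_draws)) for d in valid_draws)
print(len(valid_draws), max_dev)
```

Because every proposal is uniform over all assignments and acceptance depends only on validity, each valid draw is returned with equal probability. The same frequency table could be compared against a simulated sequential procedure to measure its nonuniformity.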
This article discusses nonparametric estimation of a survival function in the presence of measurement errors on the observation of the failure time of interest. One situation where such issues arise would be clinical studies of chronic diseases where the observation on the time to the failure event of interest such as the onset of the disease relies on patient recall or chart review of electronic medical records. It is easy to see that both situations can be subject to measurement errors. To resolve this problem, we propose a simulation extrapolation approach to correct the bias induced by the measurement error. To overcome potential computational difficulties, we use spline regression to approximate the unspecified extrapolated coefficient function of time, and establish the asymptotic properties of our proposed estimator. The proposed method is applied to nonparametric estimation based on interval-censored data. Extensive numerical experiments involving both simulated and actual study datasets demonstrate the feasibility of this proposed estimation procedure.
{"title":"Nonparametric estimation of a survival function in the presence of measurement errors on the failure time of interest","authors":"Shaojia Jin, Yanyan Liu, Guangcai Mao, Jianguo Sun, Yuanshan Wu","doi":"10.1002/cjs.11799","DOIUrl":"10.1002/cjs.11799","url":null,"abstract":"<p>This article discusses nonparametric estimation of a survival function in the presence of measurement errors on the observation of the failure time of interest. One situation where such issues arise would be clinical studies of chronic diseases where the observation on the time to the failure event of interest such as the onset of the disease relies on patient recall or chart review of electronic medical records. It is easy to see that both situations can be subject to measurement errors. To resolve this problem, we propose a simulation extrapolation approach to correct the bias induced by the measurement error. To overcome potential computational difficulties, we use spline regression to approximate the unspecified extrapolated coefficient function of time, and establish the asymptotic properties of our proposed estimator. The proposed method is applied to nonparametric estimation based on interval-censored data. Extensive numerical experiments involving both simulated and actual study datasets demonstrate the feasibility of this proposed estimation procedure.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 3","pages":"783-803"},"PeriodicalIF":0.8,"publicationDate":"2023-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135141149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
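The survival abstract above relies on simulation extrapolation (SIMEX). To show the two steps of that technique in the simplest possible setting, the sketch below applies SIMEX to a linear regression slope with a noisy covariate rather than to a survival function with a mismeasured failure time, and it uses a quadratic extrapolant rather than the spline approximation in the abstract. The function name `simex_slope`, the known error standard deviation `sigma_u`, and the simulated data are assumptions for this example.

```python
import numpy as np

def simex_slope(w, y, sigma_u, lambdas, n_rep, rng):
    """SIMEX estimate of a regression slope when the covariate w = x + u is noisy."""
    def slope(a, b):
        return np.cov(a, b)[0, 1] / np.var(a, ddof=1)

    grid = [0.0] + list(lambdas)
    means = [slope(w, y)]                       # lambda = 0 gives the naive estimate
    for lam in lambdas:
        # Simulation step: inflate the error variance to (1 + lam) * sigma_u**2.
        sims = [slope(w + np.sqrt(lam) * sigma_u * rng.standard_normal(w.size), y)
                for _ in range(n_rep)]
        means.append(np.mean(sims))
    # Extrapolation step: fit a quadratic in lambda and evaluate it at lambda = -1,
    # which corresponds to zero measurement error.
    coefs = np.polyfit(grid, means, deg=2)
    return means[0], np.polyval(coefs, -1.0)

rng = np.random.default_rng(0)
n, true_slope, sigma_u = 5000, 2.0, 0.5
x = rng.standard_normal(n)
w = x + sigma_u * rng.standard_normal(n)        # observed, error-prone covariate
y = 1.0 + true_slope * x + 0.3 * rng.standard_normal(n)
naive, simex = simex_slope(w, y, sigma_u, lambdas=[0.5, 1.0, 1.5, 2.0], n_rep=50, rng=rng)
print(naive, simex)
```

The naive slope is attenuated toward zero, and the extrapolated value moves back toward the true slope. A quadratic extrapolant typically removes most, but not all, of the bias.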
Motivated by image-on-scalar regression with data aggregated across multiple sites, we consider a setting in which multiple independent studies each collect multiple dependent vector outcomes, with potential mean model parameter homogeneity between studies and outcome vectors. To determine the validity of a joint analysis of these data sources, we must learn which of them share mean model parameters. We propose a new model fusion approach that delivers improved flexibility and statistical performance over existing methods. Our proposed approach specifies a quadratic inference function within each data source and fuses mean model parameter vectors in their entirety based on a new formulation of a pairwise fusion penalty. We establish theoretical properties of our estimator and propose an asymptotically equivalent weighted oracle meta-estimator that is more computationally efficient. Simulations and an application to the ABIDE neuroimaging consortium highlight the flexibility of the proposed approach. An R package is provided for ease of implementation.
{"title":"Fused mean structure learning in data integration with dependence","authors":"Emily C. Hector","doi":"10.1002/cjs.11797","DOIUrl":"10.1002/cjs.11797","url":null,"abstract":"<p>Motivated by image-on-scalar regression with data aggregated across multiple sites, we consider a setting in which multiple independent studies each collect multiple dependent vector outcomes, with potential mean model parameter homogeneity between studies and outcome vectors. To determine the validity of a joint analysis of these data sources, we must learn which of them share mean model parameters. We propose a new model fusion approach that delivers improved flexibility and statistical performance over existing methods. Our proposed approach specifies a quadratic inference function within each data source and fuses mean model parameter vectors in their entirety based on a new formulation of a pairwise fusion penalty. We establish theoretical properties of our estimator and propose an asymptotically equivalent weighted oracle meta-estimator that is more computationally efficient. Simulations and an application to the ABIDE neuroimaging consortium highlight the flexibility of the proposed approach. An <span>R</span> package is provided for ease of implementation.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 3","pages":"939-961"},"PeriodicalIF":0.8,"publicationDate":"2023-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cjs.11797","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136317032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
When analyzing data combined from multiple sources (e.g., hospitals, studies), the heterogeneity across different sources must be accounted for. In this article, we consider high-dimensional linear regression models for integrative data analysis. We propose a new adaptive clustering penalty (ACP) method to simultaneously select variables and cluster source-specific regression coefficients with subhomogeneity. We show that the estimator based on the ACP method enjoys a strong oracle property under certain regularity conditions. We also develop an efficient algorithm based on the alternating direction method of multipliers (ADMM) for parameter estimation. We conduct simulation studies to compare the performance of the proposed method to three existing methods (a fused LASSO with adjacent fusion, a pairwise fused LASSO and a multidirectional shrinkage penalty method). Finally, we apply the proposed method to the multicentre Childhood Adenotonsillectomy Trial to identify subhomogeneity in the treatment effects across different study sites.
{"title":"High-dimensional variable selection accounting for heterogeneity in regression coefficients across multiple data sources","authors":"Tingting Yu, Shangyuan Ye, Rui Wang","doi":"10.1002/cjs.11793","DOIUrl":"10.1002/cjs.11793","url":null,"abstract":"<p>When analyzing data combined from multiple sources (e.g., hospitals, studies), the heterogeneity across different sources must be accounted for. In this article, we consider high-dimensional linear regression models for integrative data analysis. We propose a new adaptive clustering penalty (ACP) method to simultaneously select variables and cluster source-specific regression coefficients with subhomogeneity. We show that the estimator based on the ACP method enjoys a strong oracle property under certain regularity conditions. We also develop an efficient algorithm based on the alternating direction method of multipliers (ADMM) for parameter estimation. We conduct simulation studies to compare the performance of the proposed method to three existing methods (a fused LASSO with adjacent fusion, a pairwise fused LASSO and a multidirectional shrinkage penalty method). Finally, we apply the proposed method to the multicentre Childhood Adenotonsillectomy Trial to identify subhomogeneity in the treatment effects across different study sites.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 3","pages":"900-923"},"PeriodicalIF":0.8,"publicationDate":"2023-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42707966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}