This article discusses nonparametric estimation of a survival function when observation of the failure time of interest is subject to measurement error. Such issues arise, for example, in clinical studies of chronic diseases where the time to the failure event of interest, such as the onset of the disease, is determined from patient recall or from chart review of electronic medical records; both sources are prone to measurement error. To address this problem, we propose a simulation extrapolation (SIMEX) approach to correct the bias induced by the measurement error. To overcome potential computational difficulties, we use spline regression to approximate the unspecified extrapolated coefficient function of time, and establish the asymptotic properties of our proposed estimator. The proposed method is applied to nonparametric estimation based on interval-censored data. Extensive numerical experiments involving both simulated and real study datasets demonstrate the feasibility of the proposed estimation procedure.
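The simulation-extrapolation idea behind the correction can be illustrated on a deliberately simple problem that is not the authors' estimator: recovering the variance of a true variable X from error-contaminated observations W = X + U. All variable names and settings below are illustrative assumptions; a minimal sketch:

```python
import numpy as np

def simex_variance(w, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), n_sim=200, seed=0):
    """Toy SIMEX for Var(X) when only W = X + U is observed,
    with U ~ N(0, sigma_u^2) independent of X."""
    rng = np.random.default_rng(seed)
    lam_grid = [0.0] + list(lambdas)
    naive = []
    for lam in lam_grid:
        # add extra noise with variance lam * sigma_u^2, average the naive estimate
        ests = [np.var(w + rng.normal(0.0, np.sqrt(lam) * sigma_u, size=w.shape), ddof=1)
                for _ in range(n_sim)]
        naive.append(np.mean(ests))
    # E[naive(lam)] = Var(X) + (1 + lam) * Var(U) is linear in lam,
    # so a degree-1 fit extrapolated back to lam = -1 removes the error variance
    slope, intercept = np.polyfit(lam_grid, naive, deg=1)
    return intercept - slope  # fitted value at lam = -1

rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, size=5000)        # true values, Var(X) = 4
w = x + rng.normal(0.0, 1.0, size=5000)    # mismeasured, E[Var(W)] = 5
naive_var = np.var(w, ddof=1)
corrected = simex_variance(w, sigma_u=1.0)
```

The naive variance overshoots by the measurement-error variance, while the extrapolated estimate lands near the truth; the paper's contribution lies in carrying out this extrapolation for a survival function, with a spline approximation to the extrapolated coefficient function.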
{"title":"Nonparametric estimation of a survival function in the presence of measurement errors on the failure time of interest","authors":"Shaojia Jin, Yanyan Liu, Guangcai Mao, Jianguo Sun, Yuanshan Wu","doi":"10.1002/cjs.11799","DOIUrl":"10.1002/cjs.11799","url":null,"abstract":"<p>This article discusses nonparametric estimation of a survival function in the presence of measurement errors on the observation of the failure time of interest. One situation where such issues arise would be clinical studies of chronic diseases where the observation on the time to the failure event of interest such as the onset of the disease relies on patient recall or chart review of electronic medical records. It is easy to see that both situations can be subject to measurement errors. To resolve this problem, we propose a simulation extrapolation approach to correct the bias induced by the measurement error. To overcome potential computational difficulties, we use spline regression to approximate the unspecified extrapolated coefficient function of time, and establish the asymptotic properties of our proposed estimator. The proposed method is applied to nonparametric estimation based on interval-censored data. Extensive numerical experiments involving both simulated and actual study datasets demonstrate the feasibility of this proposed estimation procedure.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 3","pages":"783-803"},"PeriodicalIF":0.8,"publicationDate":"2023-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135141149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Motivated by image-on-scalar regression with data aggregated across multiple sites, we consider a setting in which multiple independent studies each collect multiple dependent vector outcomes, with potential mean model parameter homogeneity between studies and outcome vectors. To determine the validity of a joint analysis of these data sources, we must learn which of them share mean model parameters. We propose a new model fusion approach that delivers improved flexibility and statistical performance over existing methods. Our proposed approach specifies a quadratic inference function within each data source and fuses mean model parameter vectors in their entirety based on a new formulation of a pairwise fusion penalty. We establish theoretical properties of our estimator and propose an asymptotically equivalent weighted oracle meta-estimator that is more computationally efficient. Simulations and an application to the ABIDE neuroimaging consortium highlight the flexibility of the proposed approach. An R package is provided for ease of implementation.
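As a toy illustration of what a pairwise fusion penalty does (the paper's penalty acts on entire mean model parameter vectors across dependent outcomes; the scalar case below is only an analogue with assumed notation), consider fusing the means of two data sources, which has a closed-form solution:

```python
import numpy as np

def fuse_two_means(y1, y2, lam):
    """Closed-form minimiser of
        0.5*||y1 - m1||^2 + 0.5*||y2 - m2||^2 + lam*|m1 - m2|,
    a scalar analogue of a pairwise fusion penalty on mean parameters."""
    n1, n2 = len(y1), len(y2)
    b1, b2 = float(np.mean(y1)), float(np.mean(y2))
    d = b1 - b2
    if abs(d) <= lam * (1.0 / n1 + 1.0 / n2):
        # sources declared homogeneous: fully fused to the pooled mean
        m = (n1 * b1 + n2 * b2) / (n1 + n2)
        return m, m
    # otherwise each mean moves lam/n toward the other, shrinking the gap
    s = np.sign(d) * lam
    return b1 - s / n1, b2 + s / n2
```

For small `lam` the source means are barely shrunk; past the threshold they collapse to a single shared parameter, which is the mechanism by which fusion penalties "learn" which sources share mean model parameters.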
{"title":"Fused mean structure learning in data integration with dependence","authors":"Emily C. Hector","doi":"10.1002/cjs.11797","DOIUrl":"10.1002/cjs.11797","url":null,"abstract":"<p>Motivated by image-on-scalar regression with data aggregated across multiple sites, we consider a setting in which multiple independent studies each collect multiple dependent vector outcomes, with potential mean model parameter homogeneity between studies and outcome vectors. To determine the validity of a joint analysis of these data sources, we must learn which of them share mean model parameters. We propose a new model fusion approach that delivers improved flexibility and statistical performance over existing methods. Our proposed approach specifies a quadratic inference function within each data source and fuses mean model parameter vectors in their entirety based on a new formulation of a pairwise fusion penalty. We establish theoretical properties of our estimator and propose an asymptotically equivalent weighted oracle meta-estimator that is more computationally efficient. Simulations and an application to the ABIDE neuroimaging consortium highlight the flexibility of the proposed approach. An <span>R</span> package is provided for ease of implementation.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 3","pages":"939-961"},"PeriodicalIF":0.8,"publicationDate":"2023-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cjs.11797","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136317032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
When analyzing data combined from multiple sources (e.g., hospitals, studies), the heterogeneity across different sources must be accounted for. In this article, we consider high-dimensional linear regression models for integrative data analysis. We propose a new adaptive clustering penalty (ACP) method to simultaneously select variables and cluster source-specific regression coefficients with subhomogeneity. We show that the estimator based on the ACP method enjoys a strong oracle property under certain regularity conditions. We also develop an efficient algorithm based on the alternating direction method of multipliers (ADMM) for parameter estimation. We conduct simulation studies to compare the performance of the proposed method to three existing methods (a fused LASSO with adjacent fusion, a pairwise fused LASSO and a multidirectional shrinkage penalty method). Finally, we apply the proposed method to the multicentre Childhood Adenotonsillectomy Trial to identify subhomogeneity in the treatment effects across different study sites.
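The ACP penalty and its exact ADMM updates are specific to the paper, but the ADMM machinery can be sketched on the closely related lasso problem; the soft-thresholding proximal step below is the same building block that appears in fusion-type updates. A minimal sketch with made-up data (not the authors' algorithm):

```python
import numpy as np

def soft_threshold(z, t):
    """Prox of t*||.||_1: shrink toward zero, exact zeros past the threshold."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def admm_lasso(X, y, lam, rho=1.0, n_iter=300):
    """ADMM for min_beta 0.5*||X beta - y||^2 + lam*||beta||_1,
    using the splitting beta = z with scaled dual variable u."""
    n, p = X.shape
    A = np.linalg.inv(X.T @ X + rho * np.eye(p))
    Xty = X.T @ y
    beta, z, u = np.zeros(p), np.zeros(p), np.zeros(p)
    for _ in range(n_iter):
        beta = A @ (Xty + rho * (z - u))          # ridge-type solve
        z = soft_threshold(beta + u, lam / rho)   # prox of the l1 penalty
        u = u + beta - z                          # dual ascent on beta = z
    return z

rng = np.random.default_rng(0)
n, p = 200, 8
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0.0, 0.0, -1.5, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)
beta_hat = admm_lasso(X, y, lam=20.0)
```

The z-update yields exact zeros, which is what makes ADMM attractive for simultaneous variable selection and coefficient clustering.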
{"title":"High-dimensional variable selection accounting for heterogeneity in regression coefficients across multiple data sources","authors":"Tingting Yu, Shangyuan Ye, Rui Wang","doi":"10.1002/cjs.11793","DOIUrl":"10.1002/cjs.11793","url":null,"abstract":"<p>When analyzing data combined from multiple sources (e.g., hospitals, studies), the heterogeneity across different sources must be accounted for. In this article, we consider high-dimensional linear regression models for integrative data analysis. We propose a new adaptive clustering penalty (ACP) method to simultaneously select variables and cluster source-specific regression coefficients with subhomogeneity. We show that the estimator based on the ACP method enjoys a strong oracle property under certain regularity conditions. We also develop an efficient algorithm based on the alternating direction method of multipliers (ADMM) for parameter estimation. We conduct simulation studies to compare the performance of the proposed method to three existing methods (a fused LASSO with adjacent fusion, a pairwise fused LASSO and a multidirectional shrinkage penalty method). Finally, we apply the proposed method to the multicentre Childhood Adenotonsillectomy Trial to identify subhomogeneity in the treatment effects across different study sites.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 3","pages":"900-923"},"PeriodicalIF":0.8,"publicationDate":"2023-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42707966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Functional analysis of variance (ANOVA) models are often used to compare groups of functional data. Similar to the traditional ANOVA model, a common follow-up procedure to the rejection of the functional ANOVA null hypothesis is to perform functional linear contrast tests to identify which groups have different mean functions. Most existing functional contrast tests assume independent functional observations within each group. In this article, we introduce a new functional linear contrast test procedure that accounts for possible time dependency among functional group members. The test statistic and its normalized version, based on the Karhunen–Loève decomposition of the covariance function and a weak convergence result of the error processes, follow respectively a mixture chi-squared and a chi-squared distribution. An extensive simulation study is conducted to compare the empirical performance of the existing and new contrast tests. We also present two applications of these contrast tests to a weather study and a battery-life study. We provide software implementation and example data in the Supplementary Material.
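The Karhunen–Loève decomposition underlying the test statistic can be approximated empirically when curves are observed on a common grid: eigendecompose the sample covariance matrix and project centred curves onto the leading eigenvectors. The sketch below is not the authors' test, and the simulated components are illustrative:

```python
import numpy as np

def kl_decompose(curves, n_comp=2):
    """Empirical Karhunen-Loeve decomposition of curves on a common grid:
    returns the mean curve, leading eigenvalues, discretised eigenfunctions,
    and per-curve component scores."""
    mean = curves.mean(axis=0)
    centred = curves - mean
    cov = centred.T @ centred / (curves.shape[0] - 1)
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:n_comp]
    phi = vecs[:, order]          # orthonormal columns
    scores = centred @ phi
    return mean, vals[order], phi, scores

# simulate curves built from two known orthonormal components
rng = np.random.default_rng(0)
grid_len, n_curves = 50, 2000
phi_true, _ = np.linalg.qr(rng.normal(size=(grid_len, 2)))
xi = rng.normal(size=(n_curves, 2)) * np.array([2.0, 1.0])  # sds 2 and 1
curves = xi @ phi_true.T
mean, vals, phi, scores = kl_decompose(curves)
```

The recovered eigenvalues approximate the score variances (4 and 1 here), and the top eigenvector aligns with the true component up to sign; a contrast test then works with the distribution of such scores.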
{"title":"Contrast tests for groups of functional data","authors":"Quyen Do, Pang Du","doi":"10.1002/cjs.11794","DOIUrl":"10.1002/cjs.11794","url":null,"abstract":"<p>Functional analysis of variance (ANOVA) models are often used to compare groups of functional data. Similar to the traditional ANOVA model, a common follow-up procedure to the rejection of the functional ANOVA null hypothesis is to perform functional linear contrast tests to identify which groups have different mean functions. Most existing functional contrast tests assume independent functional observations within each group. In this article, we introduce a new functional linear contrast test procedure that accounts for possible time dependency among functional group members. The test statistic and its normalized version, based on the Karhunen–Loève decomposition of the covariance function and a weak convergence result of the error processes, follow respectively a mixture chi-squared and a chi-squared distribution. An extensive simulation study is conducted to compare the empirical performance of the existing and new contrast tests. We also present two applications of these contrast tests to a weather study and a battery-life study. We provide software implementation and example data in the Supplementary Material.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 3","pages":"713-733"},"PeriodicalIF":0.8,"publicationDate":"2023-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cjs.11794","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48159209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A reduced-rank mixed-effects model is developed for robust modelling of sparsely observed paired functional data. In this model, the curves for each functional variable are summarized using a few functional principal components, and the association of the two functional variables is modelled through the association of the principal component scores. A multivariate-scale mixture of normal distributions is used to model the principal component scores and the measurement errors in order to handle outlying observations and achieve robust inference. The mean functions and principal component functions are modelled using splines, and roughness penalties are applied to avoid overfitting. An EM algorithm is developed for computation of model fitting and prediction. A simulation study shows that the proposed method outperforms an existing method, which is not designed for robust estimation. The effectiveness of the proposed method is illustrated through an application of fitting multiband light curves of Type Ia supernovae.
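The role of a scale mixture of normals in robustifying inference can be seen in a one-dimensional caricature: EM for the location of a t distribution, whose E-step weights automatically downweight outlying observations. This is only a sketch of the mechanism with made-up data, not the paper's reduced-rank model:

```python
import numpy as np

def t_location(x, df=4.0, scale=1.0, n_iter=50):
    """EM for the location of a t model (a normal scale mixture):
    each E-step weight (df + 1)/(df + r^2) shrinks the influence
    of points with large standardised residuals r."""
    mu = float(np.median(x))
    for _ in range(n_iter):
        r2 = ((x - mu) / scale) ** 2
        w = (df + 1.0) / (df + r2)        # E-step: latent scale weights
        mu = float(np.sum(w * x) / np.sum(w))  # M-step: weighted mean
    return mu

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=500)
x = np.concatenate([clean, np.full(20, 50.0)])   # 20 gross outliers
```

The sample mean is dragged toward the outliers while the EM estimate stays near zero; the paper applies the same weighting idea multivariately to principal component scores and measurement errors.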
{"title":"Robust joint modelling of sparsely observed paired functional data","authors":"Huiya Zhou, Xiaomeng Yan, Lan Zhou","doi":"10.1002/cjs.11796","DOIUrl":"10.1002/cjs.11796","url":null,"abstract":"<p>A reduced-rank mixed-effects model is developed for robust modelling of sparsely observed paired functional data. In this model, the curves for each functional variable are summarized using a few functional principal components, and the association of the two functional variables is modelled through the association of the principal component scores. A multivariate-scale mixture of normal distributions is used to model the principal component scores and the measurement errors in order to handle outlying observations and achieve robust inference. The mean functions and principal component functions are modelled using splines, and roughness penalties are applied to avoid overfitting. An EM algorithm is developed for computation of model fitting and prediction. A simulation study shows that the proposed method outperforms an existing method, which is not designed for robust estimation. The effectiveness of the proposed method is illustrated through an application of fitting multiband light curves of Type Ia supernovae.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 3","pages":"734-754"},"PeriodicalIF":0.8,"publicationDate":"2023-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cjs.11796","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45602827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We are delighted to present a special issue of The Canadian Journal of Statistics (CJS) in honour of Professor Nancy Reid. The articles in this collection have been contributed by a group of participants who attended a workshop entitled “Statistics at its Best” in Toronto on 5 May 2022. The workshop was organized by the Department of Statistical Sciences at the University of Toronto to celebrate Professor Reid’s 70th birthday. It highlighted her remarkable contributions to Statistical Science and her dedication to the profession, exemplified in research, leadership, service and education of the next generation of statisticians. Professor Reid’s impactful career has played a crucial role in fostering the growth of the Canadian statistical community. This workshop was part of a series of celebratory activities coordinated by the Statistical Society of Canada, marking the 50th anniversary of the statistical community in this country. This collection of articles encompasses a wide range of topics. First, the engaging dialogue A conversation with Nancy Reid by Craiu and Yi sheds light on Professor Reid’s intellectual journey and perspectives on statistical science and data science. In The inducement of population sparsity, Battey presents the pioneering work on parameter orthogonalization by Cox and Reid as an inducement of abstract population-level sparsity. The article focuses on three important examples related to sparsity-inducing parameterizations or data transformations: covariance models, nuisance parameter elimination and high-dimensional regression. Strategies for inducing sparsity vary depending on the context and may involve solving partial differential equations or specifying parameterized paths. Battey concludes by presenting some open problems. McCullagh then highlights, in A tale of two variances, the ambiguity and potential misinterpretation of the standard repeated-sampling concept of the variance in a finite-dimensional parametric model. 
He presents three operational interpretations, all numerically distinct and compatible with repeated sampling from a fixed parameter population. These interpretations help resolve contradictions between Fisherian variance and inverse-information variance. We next turn to hypothesis testing for parameters on the boundary of their domain. In Improved inference for a boundary parameter, Elkantassi, Bellio, Brazzale and Davison review theoretical work on the problem, including hard and soft boundaries, and iceberg estimators. They highlight the significant underestimation of the probability due to the limiting results, propose remedies based on the normal approximation for the profile score function, and outline the success of higher order approximations. Using these approaches, the authors develop an accurate test to assess the need for a spline component in a linear mixed model. In Sparse estimation within Pearson’s system, with an application to financial market risk, Carey, Genest and Ramsay tackle t
{"title":"Special issue in honour of Nancy Reid: Guest Editors' introduction","authors":"","doi":"10.1002/cjs.11792","DOIUrl":"https://doi.org/10.1002/cjs.11792","url":null,"abstract":"We are delighted to present a special issue of The Canadian Journal of Statistics (CJS) in honour of Professor Nancy Reid. The articles in this collection have been contributed by a group of participants who attended a workshop entitled “Statistics at its Best” in Toronto on 5 May 2022. The workshop was organized by the Department of Statistical Sciences at the University of Toronto to celebrate Professor Reid’s 70th birthday. It highlighted her remarkable contributions to Statistical Science and her dedication to the profession, exemplified in research, leadership, service and education of the next generation of statisticians. Professor Reid’s impactful career has played a crucial role in fostering the growth of the Canadian statistical community. This workshop was part of a series of celebratory activities coordinated by the Statistical Society of Canada, marking the 50th anniversary of the statistical community in this country. This collection of articles encompasses a wide range of topics. First, the engaging dialogue A conversation with Nancy Reid by Craiu and Yi sheds light on Professor Reid’s intellectual journey and perspectives on statistical science and data science. In The inducement of population sparsity, Battey presents the pioneering work on parameter orthogonalization by Cox and Reid as an inducement of abstract population-level sparsity. The article focuses on three important examples related to sparsity-inducing parameterizations or data transformations: covariance models, nuisance parameter elimination and high-dimensional regression. Strategies for inducing sparsity vary depending on the context and may involve solving partial differential equations or specifying parameterized paths. Battey concludes by presenting some open problems. 
McCullagh then highlights, in A tale of two variances, the ambiguity and potential misinterpretation of the standard repeated-sampling concept of the variance in a finite-dimensional parametric model. He presents three operational interpretations, all numerically distinct and compatible with repeated sampling from a fixed parameter population. These interpretations help resolve contradictions between Fisherian variance and inverse-information variance. We next turn to hypothesis testing for parameters on the boundary of their domain. In Improved inference for a boundary parameter, Elkantassi, Bellio, Brazzale and Davison review theoretical work on the problem, including hard and soft boundaries, and iceberg estimators. They highlight the significant underestimation of the probability due to the limiting results, propose remedies based on the normal approximation for the profile score function, and outline the success of higher order approximations. Using these approaches, the authors develop an accurate test to assess the need for a spline component in a linear mixed model. In Sparse estimation within Pearson’s system, with an application to financial market risk, Carey, Genest and Ramsay tackle t","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"51 1","pages":""},"PeriodicalIF":0.6,"publicationDate":"2023-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"51300145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Finite mixture models have been used for unsupervised learning for some time, and their use within the semisupervised paradigm is becoming more commonplace. Clickstream data are one of the various emerging data types that demand particular attention because there is a notable paucity of statistical learning approaches currently available. A mixture of first-order continuous-time Markov models is introduced for unsupervised and semisupervised learning of clickstream data. This approach assumes continuous time, which distinguishes it from existing mixture model-based approaches; practically, this allows account to be taken of the amount of time each user spends on each webpage. The approach is evaluated and compared with the discrete-time approach, using simulated and real data.
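Under a first-order continuous-time Markov model, a clickstream contributes a likelihood built from exponential sojourn times and jump rates; a mixture model would evaluate such a log-likelihood under each component's generator when computing EM responsibilities. A minimal sketch (the generator and stream below are made up):

```python
import numpy as np

def ctmc_loglik(states, dwell, Q):
    """Log-likelihood of one clickstream under a continuous-time Markov
    model with generator Q (rows sum to zero, off-diagonals >= 0).
    A sojourn of length t in page a followed by a jump to page b has
    density q_ab * exp(-q_a * t), i.e. contributes Q[a,a]*t + log(Q[a,b])."""
    ll = 0.0
    for i in range(len(states) - 1):
        a, b = states[i], states[i + 1]
        ll += Q[a, a] * dwell[i] + np.log(Q[a, b])
    return ll

Q = np.array([[-1.0, 1.0],
              [2.0, -2.0]])
# visit page 0 for 0.5 time units, page 1 for 0.25, then return to page 0
ll = ctmc_loglik([0, 1, 0], [0.5, 0.25], Q)
```

The dwell times enter the likelihood directly, which is precisely the information a discrete-time mixture of Markov chains discards.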
{"title":"Clustering and semi-supervised classification for clickstream data via mixture models","authors":"Michael P. B. Gallaugher, Paul D. McNicholas","doi":"10.1002/cjs.11795","DOIUrl":"10.1002/cjs.11795","url":null,"abstract":"<p>Finite mixture models have been used for unsupervised learning for some time, and their use within the semisupervised paradigm is becoming more commonplace. Clickstream data are one of the various emerging data types that demand particular attention because there is a notable paucity of statistical learning approaches currently available. A mixture of first-order continuous-time Markov models is introduced for unsupervised and semisupervised learning of clickstream data. This approach assumes continuous time, which distinguishes it from existing mixture model-based approaches; practically, this allows account to be taken of the amount of time each user spends on each webpage. The approach is evaluated and compared with the discrete-time approach, using simulated and real data.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 3","pages":"678-695"},"PeriodicalIF":0.8,"publicationDate":"2023-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49122235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Identifiability constraints are necessary for parameter estimation when fitting models with nonlinear covariate associations. The choice of constraint affects standard errors of the estimated curve. Centring constraints are often applied by default because they are thought to yield lowest standard errors out of any constraint, but this claim has not been investigated. We show that whether centring constraints are optimal depends on the response distribution and parameterization, and that for natural exponential family responses under the canonical parametrization, centring constraints are optimal only for Gaussian response.
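One standard way to impose the centring (sum-to-zero) constraint discussed here is to absorb it into the basis by a QR reparameterization. The sketch below assumes a generic basis matrix `B` and is not tied to any particular GAM software:

```python
import numpy as np

def centre_basis(B):
    """Impose the centring constraint sum_i f(x_i) = 0, i.e. 1'B beta = 0,
    by reparameterising: take a null-space basis Z of the constraint row
    (via a complete QR decomposition) and use the reduced basis B @ Z."""
    C = B.sum(axis=0, keepdims=True)              # 1 x k constraint row 1'B
    Qmat, _ = np.linalg.qr(C.T, mode="complete")  # k x k orthogonal matrix
    Z = Qmat[:, 1:]                               # k x (k-1), orthogonal to C
    return B @ Z                                  # every column sums to zero

rng = np.random.default_rng(0)
B = rng.normal(size=(40, 6))   # stand-in for a spline basis on 40 points
Bc = centre_basis(B)
```

Any coefficient vector in the reduced basis automatically satisfies the constraint, so the intercept and the smooth become separately identifiable; the article's point is that this default choice minimizes pointwise standard errors only in special cases.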
{"title":"Identifiability constraints in generalized additive models","authors":"Alex Stringer","doi":"10.1002/cjs.11786","DOIUrl":"10.1002/cjs.11786","url":null,"abstract":"<p>Identifiability constraints are necessary for parameter estimation when fitting models with nonlinear covariate associations. The choice of constraint affects the standard errors of the estimated curve. Centring constraints are often applied by default because they are thought to yield the lowest standard errors of any constraint, but this claim has not been investigated. We show that whether centring constraints are optimal depends on the response distribution and parameterization, and that for natural exponential family responses under the canonical parameterization, centring constraints are optimal only for Gaussian responses.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 2","pages":"461-476"},"PeriodicalIF":0.6,"publicationDate":"2023-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cjs.11786","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45183591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-dimensional model averaging for quantile regression","authors":"Jinhan Xie, Xianwen Ding, Bei Jiang, Xiaodong Yan, Linglong Kong","doi":"10.1002/cjs.11789","DOIUrl":"10.1002/cjs.11789","url":null,"abstract":"<p>This article considers robust prediction issues in ultrahigh-dimensional (UHD) datasets and proposes combining quantile regression with sequential model averaging to arrive at a quantile sequential model averaging (QSMA) procedure. The QSMA method is made computationally feasible by employing a sequential screening process and a Bayesian information criterion (BIC) model averaging method for UHD quantile regression, and it provides a more accurate and stable prediction of the conditional quantile of a response variable. Meanwhile, the proposed method is effective for prediction in UHD datasets and, thanks to the sequential technique, saves a great deal of computational cost. Under some suitable conditions, we show that the proposed QSMA method can mitigate overfitting and yield reliable predictions. Numerical studies, including extensive simulations and a real data example, are presented to confirm that the proposed method performs well.</p>","PeriodicalId":55281,"journal":{"name":"Canadian Journal of Statistics-Revue Canadienne De Statistique","volume":"52 2","pages":"618-635"},"PeriodicalIF":0.6,"publicationDate":"2023-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cjs.11789","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48251981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}