Canadian Journal of Statistics-Revue Canadienne De Statistique: Latest Articles

High-dimensional variable selection accounting for heterogeneity in regression coefficients across multiple data sources
IF 0.8 | Zone 4 (Mathematics) | Q3 Statistics & Probability | Pub Date: 2023-08-19 | DOI: 10.1002/cjs.11793
Tingting Yu, Shangyuan Ye, Rui Wang

When analyzing data combined from multiple sources (e.g., hospitals, studies), the heterogeneity across different sources must be accounted for. In this article, we consider high-dimensional linear regression models for integrative data analysis. We propose a new adaptive clustering penalty (ACP) method to simultaneously select variables and cluster source-specific regression coefficients with subhomogeneity. We show that the estimator based on the ACP method enjoys a strong oracle property under certain regularity conditions. We also develop an efficient algorithm based on the alternating direction method of multipliers (ADMM) for parameter estimation. We conduct simulation studies to compare the performance of the proposed method to three existing methods (a fused LASSO with adjacent fusion, a pairwise fused LASSO and a multidirectional shrinkage penalty method). Finally, we apply the proposed method to the multicentre Childhood Adenotonsillectomy Trial to identify subhomogeneity in the treatment effects across different study sites.
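
The abstract does not spell out the ACP penalty or its ADMM updates, so the following is only a minimal sketch of two generic building blocks that fused-LASSO-type methods commonly use: the soft-thresholding operator (the proximal map of an L1 term, which typically appears inside ADMM iterations) and a pairwise fusion penalty of the kind used by the pairwise fused LASSO comparator. The function names and the tuning parameters `lam_sparse` and `lam_fuse` are hypothetical and are not taken from the article.

```python
def soft_threshold(z, lam):
    """Proximal map of lam * |x|: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def pairwise_fusion_penalty(betas, lam_sparse, lam_fuse):
    """Generic pairwise fused-LASSO penalty for one covariate.

    betas holds the coefficient of the same covariate in each data source.
    This is an illustrative penalty, not the ACP penalty from the article.
    """
    # L1 term: encourages the covariate to drop out of every source.
    sparsity = sum(abs(b) for b in betas)
    # Fusion term: encourages sources to share a common coefficient value.
    fusion = sum(
        abs(betas[k] - betas[m])
        for k in range(len(betas))
        for m in range(k + 1, len(betas))
    )
    return lam_sparse * sparsity + lam_fuse * fusion


# Three sources: the first two share a coefficient, the third has none.
print(pairwise_fusion_penalty([1.0, 1.0, 0.0], lam_sparse=1.0, lam_fuse=1.0))
```

The fusion term vanishes whenever all sources share one coefficient, which is what lets such penalties group sources into clusters with a common effect.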

Citations: 0
Contrast tests for groups of functional data
IF 0.8 | Zone 4 (Mathematics) | Q3 Statistics & Probability | Pub Date: 2023-08-19 | DOI: 10.1002/cjs.11794
Quyen Do, Pang Du

Functional analysis of variance (ANOVA) models are often used to compare groups of functional data. Similar to the traditional ANOVA model, a common follow-up procedure to the rejection of the functional ANOVA null hypothesis is to perform functional linear contrast tests to identify which groups have different mean functions. Most existing functional contrast tests assume independent functional observations within each group. In this article, we introduce a new functional linear contrast test procedure that accounts for possible time dependency among functional group members. The test statistic and its normalized version, based on the Karhunen–Loève decomposition of the covariance function and a weak convergence result of the error processes, follow respectively a mixture chi-squared and a chi-squared distribution. An extensive simulation study is conducted to compare the empirical performance of the existing and new contrast tests. We also present two applications of these contrast tests to a weather study and a battery-life study. We provide software implementation and example data in the Supplementary Material.
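
A mixture chi-squared null distribution, as mentioned in the abstract above, is a weighted sum of independent chi-squared variables, and its tail probability can be approximated by simulation. The sketch below assumes the weights are already available (for instance as estimated eigenvalues from a Karhunen–Loève decomposition); the function name, the weights and the default settings are hypothetical, and the article's actual test statistic is not reproduced here.

```python
import random


def mixture_chisq_pvalue(observed, weights, n_draws=20000, seed=0):
    """Monte Carlo estimate of P(sum_k w_k * Z_k**2 >= observed), Z_k iid N(0, 1)."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    exceed = 0
    for _ in range(n_draws):
        draw = sum(w * rng.gauss(0.0, 1.0) ** 2 for w in weights)
        if draw >= observed:
            exceed += 1
    return exceed / n_draws


# Hypothetical weights; with a single weight of 1 this reduces to chi-squared(1).
print(mixture_chisq_pvalue(4.2, [1.0, 0.5, 0.25]))
```

With a single unit weight the estimate should be close to the usual chi-squared tail probability with one degree of freedom, which gives a quick sanity check.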

Citations: 0
Robust joint modelling of sparsely observed paired functional data
IF 0.8 | Zone 4 (Mathematics) | Q3 Statistics & Probability | Pub Date: 2023-08-19 | DOI: 10.1002/cjs.11796
Huiya Zhou, Xiaomeng Yan, Lan Zhou

A reduced-rank mixed-effects model is developed for robust modelling of sparsely observed paired functional data. In this model, the curves for each functional variable are summarized using a few functional principal components, and the association of the two functional variables is modelled through the association of the principal component scores. A multivariate-scale mixture of normal distributions is used to model the principal component scores and the measurement errors in order to handle outlying observations and achieve robust inference. The mean functions and principal component functions are modelled using splines, and roughness penalties are applied to avoid overfitting. An EM algorithm is developed for computation of model fitting and prediction. A simulation study shows that the proposed method outperforms an existing method, which is not designed for robust estimation. The effectiveness of the proposed method is illustrated through an application of fitting multiband light curves of Type Ia supernovae.

Citations: 0
Special issue in honour of Nancy Reid: Guest Editors' introduction
IF 0.6 | Zone 4 (Mathematics) | Q4 Mathematics | Pub Date: 2023-08-17 | DOI: 10.1002/cjs.11792
We are delighted to present a special issue of The Canadian Journal of Statistics (CJS) in honour of Professor Nancy Reid. The articles in this collection have been contributed by a group of participants who attended a workshop entitled “Statistics at its Best” in Toronto on 5 May 2022. The workshop was organized by the Department of Statistical Sciences at the University of Toronto to celebrate Professor Reid’s 70th birthday. It highlighted her remarkable contributions to Statistical Science and her dedication to the profession, exemplified in research, leadership, service and education of the next generation of statisticians. Professor Reid’s impactful career has played a crucial role in fostering the growth of the Canadian statistical community. This workshop was part of a series of celebratory activities coordinated by the Statistical Society of Canada, marking the 50th anniversary of the statistical community in this country. This collection of articles encompasses a wide range of topics. First, the engaging dialogue “A conversation with Nancy Reid” by Craiu and Yi sheds light on Professor Reid’s intellectual journey and perspectives on statistical science and data science. In “The inducement of population sparsity”, Battey presents the pioneering work on parameter orthogonalization by Cox and Reid as an inducement of abstract population-level sparsity. The article focuses on three important examples related to sparsity-inducing parameterizations or data transformations: covariance models, nuisance parameter elimination and high-dimensional regression. Strategies for inducing sparsity vary depending on the context and may involve solving partial differential equations or specifying parameterized paths. Battey concludes by presenting some open problems. McCullagh then highlights, in “A tale of two variances”, the ambiguity and potential misinterpretation of the standard repeated-sampling concept of the variance in a finite-dimensional parametric model. He presents three operational interpretations, all numerically distinct and compatible with repeated sampling from a fixed parameter population. These interpretations help resolve contradictions between Fisherian variance and inverse-information variance. We next turn to hypothesis testing for parameters on the boundary of their domain. In “Improved inference for a boundary parameter”, Elkantassi, Bellio, Brazzale and Davison review theoretical work on the problem, including hard and soft boundaries, and iceberg estimators. They highlight the significant underestimation of the probability due to the limiting results, propose remedies based on the normal approximation for the profile score function, and outline the success of higher order approximations. Using these approaches, the authors develop an accurate test to assess the need for a spline component in a linear mixed model. In “Sparse estimation within Pearson’s system, with an application to financial market risk”, Carey, Genest and Ramsay tackle the challenging task of estimating a density within Pearson’s system, a class of models that includes many classical univariate distributions. The authors propose an efficient method that combines penalized regression with profiled estimation techniques. Through simulations and an application to S&P 500 data, they show that the method substantially improves market risk assessment, outperforming the value-at-risk and expected-shortfall estimates currently used by financial institutions and regulators. Urban, Bong, Orellana and Kass explore “Oscillating neural circuits: phase, amplitude and the complex normal distribution”. They consider multiple oscillating time series in the frequency domain and discuss complex-valued correlation, its similarities with the real-valued Pearson’s …
Citations: 0
Clustering and semi-supervised classification for clickstream data via mixture models
IF 0.8 | Zone 4 (Mathematics) | Q3 Statistics & Probability | Pub Date: 2023-08-17 | DOI: 10.1002/cjs.11795
Michael P. B. Gallaugher, Paul D. McNicholas

Finite mixture models have been used for unsupervised learning for some time, and their use within the semisupervised paradigm is becoming more commonplace. Clickstream data are one of the various emerging data types that demand particular attention because there is a notable paucity of statistical learning approaches currently available. A mixture of first-order continuous-time Markov models is introduced for unsupervised and semisupervised learning of clickstream data. This approach assumes continuous time, which distinguishes it from existing mixture model-based approaches; practically, this allows account to be taken of the amount of time each user spends on each webpage. The approach is evaluated and compared with the discrete-time approach, using simulated and real data.
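
To make concrete how a continuous-time Markov model uses the time spent on each page, here is a minimal sketch of the standard log-likelihood of one fully observed path: each visit contributes the log of the jump rate to the next page minus the total exit rate of the current page multiplied by the holding time. A finite mixture then combines component likelihoods with mixing weights. The page names, rate values and function names are hypothetical, and the article's exact parameterization and estimation procedure are not given in the abstract.

```python
import math


def ctmc_log_likelihood(states, holding_times, rates):
    """Log-likelihood of one clickstream under a continuous-time Markov chain.

    states: visited pages s_0, ..., s_n.
    holding_times: time spent in s_0, ..., s_{n-1} before each jump.
    rates[i][j]: jump rate from page i to page j (i != j).
    """
    if len(states) != len(holding_times) + 1:
        raise ValueError("need exactly one holding time per observed jump")
    loglik = 0.0
    for k, t in enumerate(holding_times):
        i, j = states[k], states[k + 1]
        exit_rate = sum(r for dest, r in rates[i].items() if dest != i)
        loglik += math.log(rates[i][j]) - exit_rate * t
    return loglik


def mixture_log_likelihood(states, holding_times, components):
    """components: list of (mixing_weight, rates); uses log-sum-exp for stability."""
    terms = [
        math.log(w) + ctmc_log_likelihood(states, holding_times, r)
        for w, r in components
    ]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


# Hypothetical three-page site and one user session.
rates = {
    "home": {"product": 2.0, "exit": 1.0},
    "product": {"home": 0.5, "exit": 0.5},
}
print(ctmc_log_likelihood(["home", "product", "exit"], [0.5, 2.0], rates))
```

A discrete-time model would see only the sequence home, product, exit; the holding times 0.5 and 2.0 enter the likelihood only in the continuous-time version.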

Citations: 0
Identifiability constraints in generalized additive models
IF 0.6 | Zone 4 (Mathematics) | Q4 Mathematics | Pub Date: 2023-08-08 | DOI: 10.1002/cjs.11786
Alex Stringer

Identifiability constraints are necessary for parameter estimation when fitting models with nonlinear covariate associations. The choice of constraint affects standard errors of the estimated curve. Centring constraints are often applied by default because they are thought to yield lowest standard errors out of any constraint, but this claim has not been investigated. We show that whether centring constraints are optimal depends on the response distribution and parameterization, and that for natural exponential family responses under the canonical parametrization, centring constraints are optimal only for Gaussian response.
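
For readers unfamiliar with centring constraints, the sketch below shows one common way to impose them: subtract each basis column's mean, so that every fitted smooth sums to zero over the observed covariate values and is no longer confounded with the intercept. The polynomial basis and the function name `centre_basis` are illustrative assumptions; the article's own construction and the alternative constraints it studies are not described in the abstract.

```python
def centre_basis(basis):
    """Subtract each column mean from a basis matrix given as a list of rows.

    After centring, any linear combination of the columns sums to zero
    over the observed data points.
    """
    n = len(basis)
    n_cols = len(basis[0])
    means = [sum(row[j] for row in basis) / n for j in range(n_cols)]
    return [[row[j] - means[j] for j in range(n_cols)] for row in basis]


# Hypothetical basis with columns x and x**2 evaluated at four covariate values.
xs = [0.0, 1.0, 2.0, 3.0]
basis = [[x, x ** 2] for x in xs]
centred = centre_basis(basis)

# An arbitrary coefficient vector gives a curve that sums to zero over the data.
coef = [0.7, -0.2]
fitted = [sum(c * b for c, b in zip(coef, row)) for row in centred]
print(sum(fitted))
```

If the original basis can represent a constant function, as B-spline bases can, the centred columns become linearly dependent, so implementations typically drop or reparameterize one column as well.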

Citations: 0
High-dimensional model averaging for quantile regression
IF 0.6 | Zone 4 (Mathematics) | Q4 Mathematics | Pub Date: 2023-08-08 | DOI: 10.1002/cjs.11789
Jinhan Xie, Xianwen Ding, Bei Jiang, Xiaodong Yan, Linglong Kong

This article considers robust prediction issues in ultrahigh-dimensional (UHD) datasets and proposes combining quantile regression with sequential model averaging to arrive at a quantile sequential model averaging (QSMA) procedure. The QSMA method is made computationally feasible by employing a sequential screening process and a Bayesian information criterion (BIC) model averaging method for UHD quantile regression and provides a more accurate and stable prediction of the conditional quantile of a response variable. Meanwhile, the proposed method shows effective behaviour in dealing with prediction in UHD datasets and saves a great deal of computational cost with the help of the sequential technique. Under some suitable conditions, we show that the proposed QSMA method can mitigate overfitting and yields reliable predictions. Numerical studies, including extensive simulations and a real data example, are presented to confirm that the proposed method performs well.
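
Quantile regression, on which the QSMA procedure above builds, replaces squared error with the check (pinball) loss, whose minimizer is a quantile rather than a mean. The sketch below illustrates only that loss on a toy sample with one outlier; it does not implement the sequential screening or BIC-based model averaging steps, and the function names are hypothetical.

```python
def pinball_loss(residual, tau):
    """Check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def quantile_by_check_loss(values, tau):
    """Return the sample value that minimizes the total check loss."""
    return min(
        values,
        key=lambda q: sum(pinball_loss(v - q, tau) for v in values),
    )


sample = [1.0, 2.0, 3.0, 4.0, 100.0]  # toy data with one outlier
print(quantile_by_check_loss(sample, 0.5))  # the median, unaffected by the outlier
print(quantile_by_check_loss(sample, 0.9))  # an upper quantile
```

Because the loss grows linearly rather than quadratically in the residual, a single extreme observation does not pull the median estimate, which is the sense in which quantile-based prediction is robust.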

Citations: 0
Improved inference for a boundary parameter
IF 0.6 | Zone 4 (Mathematics) | Q4 Mathematics | Pub Date: 2023-08-04 | DOI: 10.1002/cjs.11791
Soumaya Elkantassi, Ruggero Bellio, Alessandra R. Brazzale, Anthony C. Davison

The limiting distributions of statistics used to test hypotheses about parameters on the boundary of their domains may provide very poor approximations to the finite-sample behaviour of these statistics, even for very large samples. We review theoretical work on this problem, describe hard and soft boundaries and iceberg estimators, and give examples highlighting how the limiting results greatly underestimate the probability that the parameter lies on its boundary even in very large samples. We propose and evaluate some simple remedies for this difficulty based on normal approximation for the profile score function, and then outline how higher order approximations yield excellent results in a range of hard and soft boundary examples. We use the approach to develop an accurate test for the need for a spline component in a linear mixed model.
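For context, the sketch below computes the textbook first-order p-value for a likelihood ratio statistic when a single parameter is tested on a hard boundary. It assumes the classical limiting law, an equal mixture of a point mass at zero and a chi-squared distribution with one degree of freedom. This is the kind of limiting approximation the abstract describes as potentially poor, not the authors' improved method, and the function names are hypothetical.

```python
import math


def chi2_1_sf(w):
    """Upper tail probability P(chi2_1 > w), using chi2_1 = Z^2 for standard normal Z."""
    return math.erfc(math.sqrt(w / 2.0))


def boundary_lrt_pvalue(w):
    """First-order p-value under the assumed 50:50 mixture of chi2_0 and chi2_1.

    A statistic of zero (estimate on the boundary) gives a p-value of 1.
    """
    if w <= 0.0:
        return 1.0
    return 0.5 * chi2_1_sf(w)
```

Under this limit the estimate falls on the boundary with probability one half. The abstract reports that the true finite-sample probability can be much larger, which is why p-values from this mixture can mislead.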

Citations: 1
Joint modelling of quantile regression for longitudinal data with information observation times and a terminal event
IF 0.6 | Zone 4 (Mathematics) | Q4 Mathematics | Pub Date: 2023-07-31 | DOI: 10.1002/cjs.11782
Weicai Pang, Yutao Liu, Xingqiu Zhao, Yong Zhou

Longitudinal data arise frequently in biomedical follow-up observation studies. Conditional mean regression and conditional quantile regression are two popular approaches to modelling longitudinal data. Many existing results assume that the response variables are independent of the observation times. In this article, we propose a quantile regression model for the analysis of longitudinal data, where the longitudinal responses are allowed not only to depend on the past observation history but also to be associated with a terminal event (e.g., death). Non-smoothing estimating equation approaches are developed to estimate parameters, and the consistency and asymptotic normality of the proposed estimators are established. The asymptotic variance is estimated by a resampling method. A majorize-minimize algorithm is proposed to compute the proposed estimators. Simulation studies show that the proposed estimators perform well, and an HIV-RNA dataset is used to illustrate the proposed method.
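The abstract does not spell out the majorize-minimize algorithm, so the following is only a sketch of the general idea in the simplest possible setting: estimating a single quantile with no covariates. It assumes the well-known quadratic surrogate for the check loss, with a small `eps` added to avoid division by zero. The function name `mm_quantile` and its parameters are hypothetical and are not taken from the paper.

```python
def mm_quantile(y, tau, eps=1e-6, iters=200):
    """Intercept-only quantile estimate via majorize-minimize updates.

    At each step the check loss is replaced by a quadratic surrogate built at
    the current estimate; minimizing that surrogate gives a weighted-mean
    update with weights 1 / (eps + |residual|).
    """
    n = len(y)
    theta = sum(y) / n  # start from the sample mean
    for _ in range(iters):
        weights = [1.0 / (eps + abs(yi - theta)) for yi in y]
        weighted_sum = sum(w * yi for w, yi in zip(weights, y))
        # The n * (2 * tau - 1) term shifts the update toward the tau-th quantile.
        theta = (weighted_sum + n * (2.0 * tau - 1.0)) / sum(weights)
    return theta
```

For example, `mm_quantile([1.0, 2.0, 3.0, 4.0, 100.0], 0.5)` moves from the outlier-sensitive mean toward the median. Each update needs only weighted averages, which is the practical appeal of this kind of scheme for non-smooth quantile objectives.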

Citations: 0