
Canadian Journal of Statistics-Revue Canadienne De Statistique: Latest Articles

A parameter transformation of the anisotropic Matérn covariance function
IF 0.8 · Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2025-02-10 · DOI: 10.1002/cjs.11839
Kamal Rai, Patrick E. Brown

We describe a polar coordinate transformation of the anisotropy parameters of the Matérn covariance function, which provides two benefits over the standard parameterization. First, it identifies a single point (the origin) with the special case of isotropy. Second, the posterior distribution of the transformed anisotropic angle and ratio is approximately bell-shaped and unimodal even in the case of isotropy. This has advantages for parameter inference and density estimation. We also apply a transformation to the standard deviation and range such that they are approximately orthogonal. We demonstrate this parameter transformation through two simulated and two real data sets, and conclude by considering possible extensions, such as implementing this transformation for approximate Bayesian inference methods.

Volume 53, Issue 2
Citations: 0
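The abstract above does not state the transformation itself. As an illustration only, the sketch below assumes a Matérn covariance with smoothness 3/2 under geometric anisotropy and a hypothetical polar map (radius log(ratio), polar angle twice the anisotropy angle) that shares the property the abstract describes: every isotropic configuration maps to the single point at the origin. It is not the authors' parameterization, and `anisotropy_to_xy`, `xy_to_anisotropy` and `matern32_anisotropic` are names introduced here.

```python
import math


def anisotropy_to_xy(ratio: float, angle: float) -> tuple[float, float]:
    """Hypothetical polar map of (ratio >= 1, angle) to a point in the plane.

    Radius = log(ratio), polar angle = 2 * angle. Doubling the angle makes
    angle and angle + pi (the same ellipse) land on the same point, and
    every isotropic case (ratio == 1) lands on the origin.
    """
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    radius = math.log(ratio)
    return radius * math.cos(2 * angle), radius * math.sin(2 * angle)


def xy_to_anisotropy(x: float, y: float) -> tuple[float, float]:
    """Inverse map; at the origin the angle is undefined and returned as 0."""
    ratio = math.exp(math.hypot(x, y))
    angle = (math.atan2(y, x) / 2) % math.pi
    return ratio, angle


def matern32_anisotropic(h, sigma, range_, ratio, angle):
    """Matérn covariance with smoothness 3/2 under geometric anisotropy.

    The lag vector h = (hx, hy) is rotated so that the first coordinate lies
    along the major axis; the minor-axis component is stretched by `ratio`.
    """
    hx, hy = h
    c, s = math.cos(angle), math.sin(angle)
    u = c * hx + s * hy    # component along the major axis
    v = -s * hx + c * hy   # component along the minor axis
    d = math.hypot(u, ratio * v) / range_
    a = math.sqrt(3) * d
    return sigma ** 2 * (1 + a) * math.exp(-a)
```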
The quantile-based classifier with variable-wise parameters
IF 0.8 · Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2025-02-01 · DOI: 10.1002/cjs.11837
Marco Berrettini, Christian Martin Hennig, Cinzia Viroli

Quantile-based classifiers can classify high-dimensional observations by minimizing a discrepancy of an observation to a class based on suitable quantiles of the within-class distributions, corresponding to a unique percentage for all variables. The present work extends these classifiers by introducing a way to determine potentially different optimal percentages for different variables. Furthermore, a variable-wise scale parameter is introduced. A simple greedy algorithm to estimate the parameters is proposed. Their consistency in a nonparametric setting is proved. Experiments using artificially generated and real data confirm the potential of the quantile-based classifier with variable-wise parameters.

Volume 53, Issue 2
Citations: 0
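A minimal sketch of the decision rule described in the abstract above, assuming the check (asymmetric absolute) loss as the discrepancy, linear-interpolation sample quantiles, and variable-wise levels `thetas` and scales `scales` that are supplied rather than estimated. The article's greedy estimation algorithm and its exact scale definition are not reproduced, and all function names are hypothetical.

```python
import math


def empirical_quantile(values, theta):
    """Linear-interpolation sample quantile at level theta in [0, 1]."""
    xs = sorted(values)
    pos = theta * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def check_loss(u, theta):
    """Check loss: theta * u if u >= 0, (theta - 1) * u otherwise."""
    return u * (theta - (1.0 if u < 0 else 0.0))


def fit_class_quantiles(train, thetas):
    """train maps each class label to a list of rows (one value per variable)."""
    return {
        label: [empirical_quantile([row[j] for row in rows], thetas[j])
                for j in range(len(thetas))]
        for label, rows in train.items()
    }


def classify(x, class_quantiles, thetas, scales):
    """Assign x to the class with the smallest scaled sum of check losses."""
    def discrepancy(q):
        return sum(check_loss(x[j] - q[j], thetas[j]) / scales[j]
                   for j in range(len(x)))
    return min(class_quantiles, key=lambda label: discrepancy(class_quantiles[label]))
```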
Balanced longitudinal data clustering with a copula kernel mixture model
IF 0.8 · Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2025-01-31 · DOI: 10.1002/cjs.11838
Xi Zhang, Orla A. Murphy, Paul D. McNicholas

Many common clustering methods cannot be used for clustering balanced multivariate longitudinal data in cases where the covariance of variables is a function of the time points. In this article, a copula kernel mixture model (CKMM) is proposed for clustering data of this type. The CKMM is a finite mixture model that decomposes each mixture component's joint density function into a copula and marginal distribution functions. In this decomposition, the Gaussian copula is used due to its mathematical tractability and Gaussian kernel functions are used to estimate the marginal distributions. A generalized expectation-maximization algorithm is used to estimate the model parameters. The performance of the proposed model is assessed in a simulation study and on two real datasets. The proposed model is shown to have effective performance in comparison with standard methods, such as K-means with dynamic time warping clustering, latent growth models and functional high-dimensional data clustering.

Volume 53, Issue 1
Citations: 0
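The copula decomposition described in the abstract above can be sketched for a single bivariate component, assuming a fixed kernel bandwidth `h` and a known copula correlation `rho`. The article's model handles multivariate longitudinal data and estimates its parameters with a generalized EM algorithm, neither of which is reproduced here; the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
EPS = 1e-12  # keeps CDF values strictly inside (0, 1) for inv_cdf


def kde_pdf(x, sample, h):
    """Gaussian-kernel density estimate at x with bandwidth h."""
    return sum(STD_NORMAL.pdf((x - xi) / h) for xi in sample) / (len(sample) * h)


def kde_cdf(x, sample, h):
    """CDF of the same Gaussian-kernel density estimate."""
    return sum(STD_NORMAL.cdf((x - xi) / h) for xi in sample) / len(sample)


def gaussian_copula_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho)."""
    a = STD_NORMAL.inv_cdf(min(max(u, EPS), 1 - EPS))
    b = STD_NORMAL.inv_cdf(min(max(v, EPS), 1 - EPS))
    one_minus = 1 - rho ** 2
    expo = -(rho ** 2 * (a ** 2 + b ** 2) - 2 * rho * a * b) / (2 * one_minus)
    return math.exp(expo) / math.sqrt(one_minus)


def component_density(x, samples, h, rho):
    """Joint density of one component: copula density times KDE marginals."""
    (x1, x2), (s1, s2) = x, samples
    u, v = kde_cdf(x1, s1, h), kde_cdf(x2, s2, h)
    return gaussian_copula_density(u, v, rho) * kde_pdf(x1, s1, h) * kde_pdf(x2, s2, h)


def mixture_density(x, weights, components, h):
    """Finite mixture; components is a list of (samples, rho) pairs."""
    return sum(w * component_density(x, s, h, rho)
               for w, (s, rho) in zip(weights, components))
```

With `rho = 0` the copula density is identically 1, so the component reduces to the product of its kernel-estimated marginals.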
Susceptible-infected-recovered model with stochastic transmission
IF 0.8 · Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2025-01-31 · DOI: 10.1002/cjs.11835
Christian Gouriéroux, Yang Lu

The susceptible-infected-recovered (SIR) model is the cornerstone of epidemiological models. However, this specification depends on two parameters only, which results in its lack of flexibility and explains why it has difficulty replicating the volatile reproduction numbers observed in practice. We extend the standard SIR model to a semiparametric SIR model, by first introducing a functional parameter of transmission, and then making this function stochastic. This leads to a SIR model with stochastic transmission. Our model is particularly tractable. We derive its closed-form solution and use it to compute key indicators, such as the condition (and the threshold) of herd immunity and the timing of the peak. When the population size is finite and the observations are in discrete time, there is also observational uncertainty. We propose a nonlinear state-space framework under which we analyze the relative magnitudes of the observational and intrinsic uncertainties during the evolution of the epidemic. We emphasize the lack of robustness of the notion of herd immunity when the SIR model is time-discretized.

Volume 53, Issue 2
Citations: 0
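The abstract above does not specify the stochastic transmission process or the closed-form solution. Purely for illustration, the sketch below assumes a discrete-time SIR recursion whose transmission rate is i.i.d. log-normal with mean `beta0`; it is not the authors' semiparametric model, and `simulate_sir`, `effective_reproduction_numbers` and their parameters are hypothetical.

```python
import math
import random


def simulate_sir(n, i0, beta0, gamma, sigma, steps, seed=0):
    """Discrete-time SIR recursion with a random transmission rate.

    Hypothetical specification: beta_t = beta0 * exp(sigma * z_t - sigma**2 / 2)
    with z_t i.i.d. standard normal, so E[beta_t] = beta0. Requires
    0 < gamma <= 1 so that I stays nonnegative. Returns the S, I, R paths
    (length steps + 1) and the transmission rates used (length steps).
    """
    rng = random.Random(seed)
    s, i, r = float(n - i0), float(i0), 0.0
    S, I, R, betas = [s], [i], [r], []
    for _ in range(steps):
        beta_t = beta0 * math.exp(sigma * rng.gauss(0.0, 1.0) - sigma ** 2 / 2)
        new_inf = min(beta_t * s * i / n, s)  # cannot infect more than S
        new_rec = gamma * i
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        S.append(s)
        I.append(i)
        R.append(r)
        betas.append(beta_t)
    return S, I, R, betas


def effective_reproduction_numbers(S, betas, gamma, n):
    """R_t = beta_t * S_t / (gamma * n). In the deterministic SIR model the
    epidemic declines once S_t / n < gamma / beta (the herd-immunity condition)."""
    return [b * s / (gamma * n) for b, s in zip(betas, S)]
```

Setting `sigma = 0` recovers the standard two-parameter SIR recursion with constant transmission rate `beta0`.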
Probabilistic weighted Dirichlet process mixture with an application to stochastic volatility models
IF 0.8 · Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-12-31 · DOI: 10.1002/cjs.11834
Peng Sun, Inyoung Kim, Ki-Ahm Lee

In this article, we propose a flexible Bayesian modelling framework and investigate the probabilistic weighted Dirichlet process mixture (pWDPM). The construction and properties of a probabilistic weight function are illustrated. The advantage of the pWDPM under the log-squared transformed stochastic volatility (SV) model is demonstrated. We achieve greater modelling flexibility by relaxing the distributional assumption of the error term. Bayesian inference and sampling procedures for the pWDPM under the SV model are provided. The performance of the pWDPM is evaluated using simulation studies and empirical results. Both computational efficiency and model accuracy are achieved through the pWDPM.

Volume 53, Issue 2
Citations: 0
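The probabilistic weight function of the pWDPM is not given in the abstract above, so the sketch below shows only two standard pieces of the setting: truncated stick-breaking weights for an ordinary (unweighted) Dirichlet process mixture, and the log-squared transformation of returns used with SV models. The normal base measure, kernel standard deviation, truncation level and all function names are assumptions.

```python
import math
import random


def stick_breaking_weights(alpha, k, rng):
    """Truncated stick-breaking weights of an ordinary Dirichlet process.

    V_j ~ Beta(1, alpha) and w_j = V_j * prod_{l<j} (1 - V_l); the last weight
    takes the leftover stick so that the k weights sum to one.
    """
    weights, remaining = [], 1.0
    for _ in range(k - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def log_squared_returns(returns, offset=0.0):
    """log(r_t^2 + offset); with offset 0, log(r_t^2) = h_t + log(eps_t^2) in an SV model."""
    return [math.log(r * r + offset) for r in returns]


def sample_dp_normal_mixture(n, alpha, k, rng, base_mean=0.0, base_sd=2.0, sd=0.5):
    """Draw n values from a truncated DP mixture of normals (normal base measure)."""
    weights = stick_breaking_weights(alpha, k, rng)
    atoms = [rng.gauss(base_mean, base_sd) for _ in range(k)]
    labels = rng.choices(range(k), weights=weights, k=n)
    return [rng.gauss(atoms[j], sd) for j in labels]
```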
A new class of asymptotic maximin distance Latin hypercube designs
IF 0.8 · Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-12-30 · DOI: 10.1002/cjs.11836
Xinxin Xia, Wenlong Li, Pengnan Li

Maximin distance Latin hypercube designs have been widely used in computer experiments because they can achieve one-dimensional stratification and full-dimensional space-filling properties. In this article, we propose a new method for constructing a class of Latin hypercube designs that can accommodate many columns. We show that the resulting designs are asymptotically optimal under the maximin distance criterion, and enjoy a large proportion of low-dimensional stratification properties that strong orthogonal arrays should have. In addition, the proposed method can be used to construct a class of asymptotically optimal sliced maximin distance Latin hypercube designs. These designs are well suited to computer experiments due to their good space-filling properties.

Volume 53, Issue 2
Citations: 0
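For comparison with the construction described in the abstract above, a naive baseline: generate random Latin hypercube designs and keep the one with the largest minimum pairwise Euclidean distance (the maximin criterion). This random search is an assumption for illustration, not the article's method, and the function names are hypothetical.

```python
import itertools
import math
import random


def random_lhd(n, d, rng):
    """n-run, d-factor Latin hypercube with levels 0..n-1 (each column a permutation)."""
    columns = [rng.sample(range(n), n) for _ in range(d)]
    return [tuple(col[i] for col in columns) for i in range(n)]


def min_distance(design):
    """Maximin criterion value: smallest pairwise Euclidean distance between runs."""
    return min(math.dist(a, b) for a, b in itertools.combinations(design, 2))


def best_of_random_lhds(n, d, tries, seed=0):
    """Naive search: keep the random LHD with the largest minimum distance."""
    rng = random.Random(seed)
    best, best_score = None, -1.0
    for _ in range(tries):
        design = random_lhd(n, d, rng)
        score = min_distance(design)
        if score > best_score:
            best, best_score = design, score
    return best, best_score
```

Because every column is a permutation, two distinct runs differ by at least 1 in each coordinate, so any such design has minimum distance at least sqrt(d); the search only improves on that floor.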
True and false discoveries with independent and sequential e-values
IF 0.8 · Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-10-21 · DOI: 10.1002/cjs.11833
Vladimir Vovk, Ruodu Wang

In this article, we use e-values in the context of multiple hypothesis testing, assuming that the base tests produce independent, or sequential, e-values. Our simulation and empirical studies, as well as theoretical considerations, suggest that, under this assumption, our new algorithms are superior to the known algorithms using independent p-values and to our recent algorithms designed for e-values without the assumption of independence.

Volume 52, Issue 4
Citations: 0
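As a point of reference for the multiple-testing setting in the abstract above, the sketch below implements the standard e-BH procedure, which controls the false discovery rate at level alpha for arbitrarily dependent e-values, together with the fact that a product of independent e-values is again an e-value. These are baselines, not the new algorithms proposed by Vovk and Wang; the function names are hypothetical.

```python
import math


def e_bh(e_values, alpha):
    """e-BH: with e_(1) >= e_(2) >= ..., reject the k* hypotheses with the
    largest e-values, where k* = max{k : e_(k) >= m / (k * alpha)} (0 if none)."""
    m = len(e_values)
    order = sorted(range(m), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k in range(1, m + 1):
        if e_values[order[k - 1]] >= m / (k * alpha):
            k_star = k
    return sorted(order[:k_star])  # indices of rejected hypotheses


def combine_independent(e_values):
    """The product of independent e-values is again an e-value."""
    return math.prod(e_values)
```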
Robust causal inference for point exposures with missing confounders
IF 0.8 · Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-09-19 · DOI: 10.1002/cjs.11832
Alexander W. Levis, Rajarshi Mukherjee, Rui Wang, Sebastien Haneuse

Large observational databases are often subject to missing data. As such, methods for causal inference must simultaneously handle confounding and missingness; surprisingly little work has been done at this intersection. Motivated by this, we propose an efficient and robust estimator of the causal average treatment effect from cohort studies when confounders are missing at random. The approach is based on a novel factorization of the likelihood that, unlike alternative methods, facilitates flexible modelling of nuisance functions (e.g., with state-of-the-art machine learning methods) while maintaining nominal convergence rates of the final estimators. Simulated data, derived from an electronic health record-based study of the long-term effects of bariatric surgery on weight outcomes, verify the robustness properties of the proposed estimators in finite samples. Our approach may serve as a theoretical benchmark against which ad hoc methods may be assessed.

Volume 53, Issue 2
Citations: 0
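The complete-data version of the target in the abstract above can be sketched with the standard augmented inverse-probability-weighted (AIPW) estimator of the average treatment effect. It assumes fully observed confounders, so it does not address the missing-confounder problem the article solves; the simulated data-generating process and all function names are hypothetical.

```python
import math
import random


def logistic(t):
    """Logistic link, used here for a propensity model P(A = 1 | X = x)."""
    return 1.0 / (1.0 + math.exp(-t))


def aipw_ate(y, a, x, propensity, mu1, mu0):
    """Augmented inverse-probability-weighted (doubly robust) ATE estimate.

    propensity(x) estimates P(A = 1 | X = x); mu1(x) and mu0(x) estimate
    E[Y | A = 1, X = x] and E[Y | A = 0, X = x]. Requires fully observed x.
    """
    total = 0.0
    for yi, ai, xi in zip(y, a, x):
        p, m1, m0 = propensity(xi), mu1(xi), mu0(xi)
        total += (m1 - m0
                  + ai * (yi - m1) / p
                  - (1 - ai) * (yi - m0) / (1 - p))
    return total / len(y)


def simulate_confounded(n, seed=0):
    """Hypothetical data: X ~ N(0, 1), A ~ Bernoulli(logistic(X)),
    Y = 2A + X + N(0, 1) noise, so the true ATE is 2."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    a = [1 if rng.random() < logistic(xi) else 0 for xi in x]
    y = [2.0 * ai + xi + rng.gauss(0.0, 1.0) for ai, xi in zip(a, x)]
    return y, a, x
```

The estimator stays consistent if either the propensity model or the outcome models are correct, which is the sense of "robust" usually meant for AIPW.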
Distributed learning for kernel mode–based regression
IF 0.8 · Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-09-03 · DOI: 10.1002/cjs.11831
Tao Wang

We propose a parametric kernel mode–based regression built on the mode value, which provides robust and efficient estimators for datasets containing outliers or heavy-tailed distributions. To address the challenges posed by massive datasets, we integrate this regression method with distributed statistical learning techniques, which greatly reduces the required amount of primary memory and simultaneously accommodates heterogeneity in the estimation process. By approximating the local kernel objective function with a least squares format, we are able to preserve compact statistics for each worker machine, facilitating the reconstruction of estimates for the entire dataset with minimal asymptotic approximation error. Additionally, we explore shrinkage estimation through local quadratic approximation, showcasing that the resulting estimator possesses the oracle property through an adaptive LASSO approach. The finite-sample performance of the developed method is illustrated using simulations and real data analysis.

Volume 53, Issue 2
Citations: 0
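A minimal sketch of the distributed idea in the abstract above for a simple linear model, assuming a Gaussian kernel with fixed bandwidth `h` and iteratively reweighted least-squares updates of the kind commonly used for modal linear regression. Each worker sends only five weighted sums per iteration to the coordinator. This is not the article's least-squares approximation of the local kernel objective, the adaptive LASSO step is omitted, and all function names are hypothetical.

```python
import math


def local_stats(chunk, intercept, slope, h):
    """Five weighted sums one worker sends to the coordinator.

    Each point (x, y) is weighted by a Gaussian kernel of its residual under
    the current fit; h = math.inf gives weight 1 to every point (plain OLS).
    """
    s0 = sx = sy = sxx = sxy = 0.0
    for x, y in chunk:
        r = y - (intercept + slope * x)
        w = math.exp(-0.5 * (r / h) ** 2)
        s0 += w
        sx += w * x
        sy += w * y
        sxx += w * x * x
        sxy += w * x * y
    return s0, sx, sy, sxx, sxy


def combine(stats_iter):
    """Coordinator step: add the workers' sums component-wise."""
    return tuple(map(sum, zip(*stats_iter)))


def solve_weighted_ls(stats):
    """Weighted least-squares intercept and slope from the pooled sums."""
    s0, sx, sy, sxx, sxy = stats
    slope = (s0 * sxy - sx * sy) / (s0 * sxx - sx * sx)
    return (sy - slope * sx) / s0, slope


def distributed_modal_regression(chunks, h, iters=50):
    """Start from pooled OLS, then iterate kernel-weighted least squares."""
    intercept, slope = solve_weighted_ls(
        combine(local_stats(c, 0.0, 0.0, math.inf) for c in chunks))
    for _ in range(iters):
        intercept, slope = solve_weighted_ls(
            combine(local_stats(c, intercept, slope, h) for c in chunks))
    return intercept, slope
```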
Efficient semiparametric estimation in two-sample comparison via semisupervised learning
IF 0.8 · Zone 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2024-09-03 · DOI: 10.1002/cjs.11813
Tao Tan, Shuyi Zhang, Yong Zhou

We develop a general semisupervised framework for statistical inference in the two-sample comparison setting. Although the supervised Mann–Whitney statistic outperforms many estimators in the two-sample problem for nonnormally distributed responses, it is excessively inefficient because it ignores large amounts of unlabelled information. To borrow strength from unlabelled data, we propose a class of efficient and adaptive estimators that use two-step semiparametric imputation. The probabilistic index model is adopted primarily to achieve dimension reduction for multivariate covariates, and a follow-up reweighting step balances the contributions of labelled and unlabelled data. The asymptotic properties of our estimator are derived with variance comparison through a phase diagram. Efficiency theory shows our estimators achieve the semiparametric variance lower bound if the probabilistic index model is correctly specified, and are more efficient than their supervised counterpart when the model is not degenerate. The asymptotic variance is estimated through a two-step perturbation resampling procedure. To gauge the finite sample performance, we conducted extensive simulation studies which verify the adaptive nature of our methods with respect to model misspecification. To illustrate the merits of our proposed method, we analyze a dataset concerning homelessness in Los Angeles.

Volume 53, Issue 2
Citations: 0
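The supervised baseline mentioned in the abstract above can be sketched as the Mann–Whitney estimate of the probabilistic index P(Y1 < Y2) + 0.5 P(Y1 = Y2), computed from labelled responses only. The article's semisupervised estimators also exploit unlabelled data, which this sketch ignores; the function name is hypothetical.

```python
def mann_whitney_index(sample1, sample2):
    """Mann–Whitney estimate of P(Y1 < Y2) + 0.5 * P(Y1 = Y2).

    Averages the pairwise comparison kernel over all (y1, y2) pairs,
    counting ties as one half.
    """
    total = 0.0
    for y1 in sample1:
        for y2 in sample2:
            if y1 < y2:
                total += 1.0
            elif y1 == y2:
                total += 0.5
    return total / (len(sample1) * len(sample2))
```

The double loop costs O(nm) comparisons; a rank-based computation is the usual choice for large samples.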