
Latest publications in Canadian Journal of Statistics – Revue Canadienne De Statistique

A precision trial case study for heterogeneous treatment effects in obstructive sleep apnea
IF 1.0 · CAS Region 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2025-11-18 · DOI: 10.1002/cjs.70028
Lara Maleyeff, Shirin Golchi, Erica E. M. Moodie, R. John Kimoff

Precision medicine tailors treatments to individual patient characteristics, which is especially valuable for obstructive sleep apnea (OSA), where treatment responses vary widely. Traditional trials often overlook subgroup differences, leading to suboptimal recommendations. Current approaches rely on prespecified thresholds, which may be misspecified. This case study compares prespecified thresholds to two Bayesian methods: the established FK-BMA (free-knot Bayesian model averaging) method and its novel variant, FK. The FK approach retains the flexibility of free-knot splines but omits variable selection, providing stable, interpretable models. Using biomarker data from large studies, this design identifies subgroups dynamically, allowing early trial termination or enrollment adjustments. Simulation results—motivated by real-world biomarker distributions and clinical constraints—show that under conditions of limited signal-to-noise ratio and limited candidate biomarkers, FK improves efficiency and subgroup detection.
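As a rough illustration of the spline-interaction idea behind FK, the sketch below fits a treatment-by-spline interaction on simulated data and recovers a threshold-like treatment effect. It is only an analogy: fixed knots stand in for the paper's free knots, plain least squares stands in for the Bayesian fit, and all data and parameter values are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(0, 1, n)                 # biomarker
a = rng.integers(0, 2, n)                # treatment arm (0/1)
# true treatment effect: zero below a biomarker threshold, then increasing
effect = np.where(x > 0.6, 2.0 * (x - 0.6), 0.0)
y = 1.0 + 0.5 * x + a * effect + rng.normal(0.0, 0.5, n)

def spline_basis(x, knots):
    # truncated-linear basis: 1, x, and (x - k)_+ for each knot
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

knots = [0.25, 0.5, 0.75]                # fixed grid standing in for free knots
B = spline_basis(x, knots)
X = np.column_stack([B, a[:, None] * B]) # main effects + treatment interactions
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# estimated conditional treatment effect across the biomarker range
grid = np.linspace(0.0, 1.0, 101)
tau_hat = spline_basis(grid, knots) @ beta_hat[B.shape[1]:]
```

Regions where the fitted effect curve departs from zero are the candidate subgroups that a design like this would act on.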

Citations: 0
Receiver operating characteristic curve analysis with non-ignorable missing disease status
IF 1.0 · CAS Region 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2025-11-13 · DOI: 10.1002/cjs.70025
Dingding Hu, Tao Yu, Pengfei Li

This article considers receiver operating characteristic (ROC) curve analysis for medical data with non-ignorable missingness in the disease status. In the framework of the logistic regression models for both the disease status and the verification status, we first establish the identifiability of model parameters, and then propose a likelihood method to estimate the model parameters, the ROC curve, and the area under the ROC curve for the biomarker. The asymptotic distributions of these estimators are established. Via extensive simulation studies, we compare our method with competing methods of point estimation and assess the accuracy of confidence interval estimation under various scenarios. To illustrate the use of our proposed approach in a practical setting, we apply our method to the Alzheimer's disease dataset from the National Alzheimer's Coordinating Center.
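To see the verification-bias problem this paper targets, the sketch below simulates a marker whose verification probability depends non-ignorably on disease status, then compares the naive complete-case AUC with the (practically unavailable) oracle AUC computed from all statuses. The paper's likelihood-based correction is not reproduced here; the simulation setup and all parameter values are invented for illustration.

```python
import numpy as np

def empirical_auc(marker, disease):
    # Mann-Whitney form of the AUC: P(case marker > control marker) + 0.5 P(tie)
    cases = marker[disease == 1][:, None]
    controls = marker[disease == 0][None, :]
    return np.mean(cases > controls) + 0.5 * np.mean(cases == controls)

rng = np.random.default_rng(0)
n = 2000
d = rng.integers(0, 2, n)                       # true disease status
m = rng.normal(loc=1.0 * d, scale=1.0, size=n)  # biomarker, shifted for cases
# verification depends on the marker AND on disease itself (non-ignorable)
p_verify = 1.0 / (1.0 + np.exp(-(0.5 * m + 1.0 * d - 0.5)))
v = rng.random(n) < p_verify                    # who gets disease status verified

auc_oracle = empirical_auc(m, d)        # all statuses known (unavailable in practice)
auc_cc = empirical_auc(m[v], d[v])      # complete-case estimate, biased here
```

The gap between `auc_cc` and `auc_oracle` is exactly what a method accounting for the verification mechanism aims to remove.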

Citations: 0
Robust multitask feature learning with adaptive Huber regressions
IF 1.0 · CAS Region 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2025-09-10 · DOI: 10.1002/cjs.70022
Yuan Zhong, Xin Gao, Wei Xu

When data from multiple tasks have outlier contamination, existing multitask learning methods perform less efficiently. To address this issue, we propose a robust multitask feature learning method by combining the adaptive Huber regression tasks with mixed regularization. The robustification parameters can be chosen to adapt to the sample size, model dimension, and moments of the error distribution while striking a balance between unbiasedness and robustness. We consider heavy-tailed distributions for multiple datasets that have a bounded (1+ω)th moment for any ω > 0. Our method can achieve estimation and sign recovery consistency. Additionally, we propose a robust information criterion to conduct joint inference on related tasks, which can be used for consistent model selection. Through different simulation studies and real data applications, we illustrate that the proposed model can provide smaller estimation errors and higher feature selection accuracy than non-robust multitask learning and robust single-task methods.
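A minimal single-task sketch of the adaptive-Huber idea, under invented data and a simplified choice of the robustification parameter: τ is scaled with the sample size, dimension, and a robust residual scale, so that gross outliers are clipped while ordinary errors are left untouched. The multitask coupling and mixed regularization of the paper are omitted.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 500, 5
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ beta_true + rng.normal(0.0, 1.0, n)
outliers = rng.choice(n, 25, replace=False)
y[outliers] += 50.0                       # 5% gross contamination

# robustification parameter adapted to n, d, and a robust residual scale
r0 = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]       # pilot OLS residuals
sigma_hat = 1.4826 * np.median(np.abs(r0 - np.median(r0)))
tau = sigma_hat * np.sqrt(n / (d + np.log(n)))

beta_hat = np.zeros(d)
for _ in range(1000):                     # gradient descent on the Huber loss
    psi = np.clip(y - X @ beta_hat, -tau, tau)   # clipped (Huber) score
    beta_hat += 0.5 * X.T @ psi / n

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
err_huber = np.linalg.norm(beta_hat - beta_true)
err_ols = np.linalg.norm(beta_ols - beta_true)
```

With contamination, the clipped score caps each outlier's influence at τ, so the Huber fit stays close to the truth while ordinary least squares drifts.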

Citations: 0
A probabilistic diagnostic for Laplace approximations: Introduction and experimentation
IF 1.0 · CAS Region 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2025-08-26 · DOI: 10.1002/cjs.70019
Shaun McDonald, Dave Campbell

Many models require integrals of high-dimensional functions: for instance, to obtain marginal likelihoods. Such integrals may be intractable, or too expensive to compute numerically. Instead, we can use the Laplace approximation (LA). The LA is exact if the function is proportional to a normal density; its effectiveness therefore depends on the function's true shape. Here, we propose the use of the probabilistic numerical framework to develop a diagnostic for the LA and its underlying shape assumptions, modelling the function and its integral as a Gaussian process and devising a "test" by conditioning on a finite number of function values. The test is decidedly non-asymptotic and is not intended as a full substitute for numerical integration; rather, it is simply intended to test the feasibility of the assumptions underpinning the LA with minimal computation. We discuss approaches to optimize and design the test, apply it to known sample functions, and highlight the challenges of high dimensions.
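The paper's Gaussian-process diagnostic is not reproduced here, but the sketch below shows why such a diagnostic is needed: in one dimension, the LA is exact for a Gaussian-shaped log-integrand and substantially off once a quartic term distorts the shape. The two test functions are invented for the example.

```python
import numpy as np

def laplace_approx(h_at_mode, hpp_at_mode):
    # LA for the integral of exp(h(x)) with mode x*: exp(h(x*)) * sqrt(2*pi / (-h''(x*)))
    return np.exp(h_at_mode) * np.sqrt(2.0 * np.pi / -hpp_at_mode)

x = np.linspace(-8.0, 8.0, 100001)
dx = x[1] - x[0]
rel_err = {}
for c in (0.0, 0.3):                 # c=0: exactly Gaussian; c=0.3: quartic distortion
    h = -0.5 * x**2 - c * x**4       # log-integrand on a dense grid
    truth = np.exp(h).sum() * dx     # brute-force quadrature as the reference
    la = laplace_approx(0.0, -1.0)   # mode at 0: h(0)=0 and h''(0)=-1 for both c
    rel_err[c] = abs(la - truth) / truth
```

A diagnostic of the kind the paper proposes would flag the second case from a handful of function evaluations, without the dense grid used as ground truth here.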

Citations: 0
How to measure statistical evidence and its strength: Bayes factors or relative belief ratios?
IF 1.0 · CAS Region 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2025-08-19 · DOI: 10.1002/cjs.70015
Luai Al-Labadi, Ayman Alzaatreh, Michael Evans

Both the Bayes factor and the relative belief ratio satisfy the principle of evidence and are therefore valid measures of statistical evidence. Which of these measures of evidence is more appropriate? We argue here that there are questions concerning the validity of a commonly used definition of the Bayes factor based on a mixture prior, and when all is considered, the relative belief ratio has better properties as a measure of evidence. We further show that, when a natural restriction on the mixture prior is imposed, the Bayes factor equals the relative belief ratio obtained without using the mixture prior. Even with this restriction, this still leaves open the question of how the strength of evidence is to be measured. We argue here that the current practice of using the size of the Bayes factor to measure strength is not correct and present a solution. We also discuss and address several general criticisms of these measures of evidence.
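A minimal numerical sketch of the relative belief ratio for a point hypothesis, using an invented beta-binomial example: the ratio of posterior to prior density at the hypothesized value measures whether belief in it went up or down. For a point null with a continuous prior, the Savage–Dickey identity makes this coincide with the usual Bayes factor; the paper's concerns about mixture priors and strength calibration are not reproduced here.

```python
from math import lgamma, log, exp

def beta_logpdf(t, a, b):
    # log density of the Beta(a, b) distribution at t
    return (lgamma(a + b) - lgamma(a) - lgamma(b)
            + (a - 1.0) * log(t) + (b - 1.0) * log(1.0 - t))

# y successes in n Bernoulli trials, Beta(1,1) prior on theta, point null theta0
n, y = 20, 14
a0, b0 = 1.0, 1.0
theta0 = 0.5

prior_dens = exp(beta_logpdf(theta0, a0, b0))             # = 1 for Beta(1,1)
post_dens = exp(beta_logpdf(theta0, a0 + y, b0 + n - y))  # posterior is Beta(15, 7)
rb = post_dens / prior_dens       # relative belief ratio at theta0
evidence_against = rb < 1.0       # belief in theta0 decreased after seeing the data
```

Here rb ≈ 0.78 < 1, so the data provide (mild) evidence against θ = 0.5 on this scale.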

Citations: 0
A class of directed acyclic graphs with mixed data types in mediation analysis
IF 1.0 · CAS Region 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2025-07-14 · DOI: 10.1002/cjs.70016
Wei Hao, Canyi Chen, Peter X.-K. Song

We propose a unified class of generalized structural equation models (GSEMs) with data of mixed types in mediation analysis, including continuous, categorical, and count variables. Such models extend substantially the classical linear structural equation model to accommodate many data types arising from the application of mediation analysis. Invoking the hierarchical modelling approach, we specify GSEMs by a copula joint distribution of outcome variable, mediator and exposure variable, in which marginal distributions are built upon generalized linear models (GLMs) with confounding factors. We discuss the identifiability conditions for the causal mediation effects in the counterfactual paradigm as well as the issue of mediation leakage, and develop an asymptotically efficient profile maximum likelihood estimation and inference for two key mediation estimands, namely natural direct effect and natural indirect effect, in different scenarios of mixed data types. The proposed new methodology is illustrated by a motivating epidemiological study that aims to investigate whether the tempo of reaching infancy BMI (body mass index) peak (delay or on time), an important early life growth milestone, mediates the association between prenatal exposure to phthalates and pubertal health outcomes.
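For orientation, the sketch below computes the two estimands named in the abstract, the natural direct and indirect effects, in the simplest all-continuous linear special case, where they reduce to regression coefficients and the classical product-of-coefficients formula. The simulated data, coefficient values, and the linear-model simplification are all assumptions of this example, not the paper's GSEM/copula machinery.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20000
c = rng.normal(size=n)                                   # confounder
a = rng.integers(0, 2, n)                                # binary exposure
m = 0.5 + 1.2 * a + 0.3 * c + rng.normal(0, 1, n)        # mediator model
y = 1.0 + 0.8 * a + 0.6 * m + 0.4 * c + rng.normal(0, 1, n)  # outcome model

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

one = np.ones(n)
alpha = ols(np.column_stack([one, a, c]), m)    # mediator regression
beta = ols(np.column_stack([one, a, m, c]), y)  # outcome regression

nde = beta[1]               # natural direct effect (linear case)
nie = beta[2] * alpha[1]    # natural indirect effect: product method
```

With the generating values above, nde should recover 0.8 and nie should recover 0.6 × 1.2 = 0.72; the paper's contribution is making such estimands tractable when the outcome, mediator, and exposure are of mixed types.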

Citations: 0
A deep support vector clustering algorithm for unsupervised and semi-supervised learning
IF 1.0 · CAS Region 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2025-06-01 · DOI: 10.1002/cjs.70013
Zhen Zhang, Shuyan Chen, Xin Liu

As a widely carried out task in data-driven applications, clustering relies on good data representation. Since deep neural networks are powerful tools for the analysis of clustering-friendly representations, certain combinations of clustering and deep models have been explored in the literature. Yet, only limited improvement has been achieved for real data with complex structures such as positive and unlabelled (PU) data. In this article we propose an unsupervised clustering model, called deep support vector clustering (dSVC). The method combines a deep autoencoder neural network with hinge loss, and is further extended to binary semi-supervised PU data learning. Theoretical results are established for label recovery and novelty detection using a large-margin classifier. Intensive numerical experiments on multiple datasets of both high and low dimension validate the efficiency of the proposed approach. We found that the proposed approach constructs clusters in a manner opposite to the popular generative adversarial network (GAN) model.

Citations: 0
Random discrete probability measures based on a negative binomial process
IF 1.0 · CAS Region 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2025-05-26 · DOI: 10.1002/cjs.70009
Sadegh Chegini, Mahmoud Zarepour

A distinctive functional of the Poisson point process is the negative binomial process for which the increments are not independent but are independent conditional on an underlying gamma variable. Using a new point process representation for the negative binomial process, we generalize the Poisson–Kingman distribution and its corresponding random discrete probability measure. This new proposed family of discrete random probability measures, which is defined by normalizing the points of the negative binomial process, provides a new set of useful priors for Bayesian nonparametric models with more flexibility than the random discrete probability measure which are obtained by normalizing the points of a Poisson point process. We illustrate how this family of random discrete probability measures contains the nonparametric Bayesian priors such as the Dirichlet process, the normalized positive α-stable process, the Poisson–Dirichlet process (PDP), and others. With the same gamma Lévy measure, we derive an extension of the Dirichlet process and its almost sure approximation. Using our representation for the negative binomial process, we develop a new series representation for the PDP. We demonstrate through simulations how using priors from this family can enhance the accuracy of Bayesian nonparametric hierarchical models.
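The Dirichlet process mentioned above is the best-known member of this family, and the sketch below simulates one via Sethuraman's classical stick-breaking representation (weights w_k = v_k ∏_{j<k}(1 − v_j) with v_k ~ Beta(1, α)). This is the standard construction, not the paper's new negative-binomial point-process representation; the standard-normal base measure and α = 5 are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(4)

def dirichlet_process(alpha, base_sampler, tol=1e-8):
    # stick-breaking: break off a Beta(1, alpha) fraction of the remaining mass
    # at each step and attach it to a fresh atom drawn from the base measure
    weights, atoms, remaining = [], [], 1.0
    while remaining > tol:
        v = rng.beta(1.0, alpha)
        weights.append(remaining * v)
        atoms.append(base_sampler())
        remaining *= 1.0 - v
    return np.array(weights), np.array(atoms)

w, atoms = dirichlet_process(alpha=5.0, base_sampler=lambda: rng.normal())
mean_dp = np.sum(w * atoms)   # mean of one realized random discrete measure
```

Each call yields one random discrete probability measure: a countable set of atoms with weights summing to one, the kind of object the paper's negative-binomial normalization generalizes.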

Citations: 0
Functional regression with intensively measured longitudinal outcomes: a new lens through data partitioning
IF 1.0 · CAS Region 4 (Mathematics) · Q3 STATISTICS & PROBABILITY · Pub Date: 2025-05-06 · DOI: 10.1002/cjs.70011
Cole Manschot, Emily C. Hector

Modern longitudinal data from wearable devices consist of biological signals at high-frequency time points. Distributed statistical methods have emerged as a powerful tool to overcome the computational burden of estimation and inference with large data, but methodology for distributed functional regression remains limited. We propose a distributed estimation and inference procedure that efficiently estimates both functional and scalar parameters with intensively measured longitudinal outcomes. The procedure overcomes computational difficulties through a scalable divide-and-conquer algorithm that partitions the outcomes into smaller sets. We circumvent traditional basis selection problems by analyzing data using quadratic inference functions in smaller subsets such that the basis functions have a low dimension. To address the challenges of combining estimates from dependent subsets, we propose a statistically efficient one-step estimator derived from a constrained generalized method of moments objective function with a smoothing penalty. We show theoretically and numerically that the proposed estimator is as statistically efficient as non-distributed alternative approaches and more efficient computationally. We demonstrate the practicality of our approach with the analysis of accelerometer data from the National Health and Nutrition Examination Survey.
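The divide-and-conquer pattern can be sketched in its simplest form: partition the data, fit each block separately, and recombine the block estimates with information (inverse-variance) weights. For plain linear least squares, shown below on invented data, this recombination reproduces the full-data fit exactly; the paper's one-step GMM estimator plays the analogous role for quadratic inference functions with dependent subsets, which this sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, n_blocks = 100000, 3, 10
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -0.5, 2.0])
y = X @ beta_true + rng.normal(0.0, 1.0, n)

# divide: fit each block separately, keeping its estimate and information matrix
estimates, infos = [], []
for Xb, yb in zip(np.array_split(X, n_blocks), np.array_split(y, n_blocks)):
    info = Xb.T @ Xb                 # block information (up to the noise variance)
    estimates.append(np.linalg.solve(info, Xb.T @ yb))
    infos.append(info)

# conquer: information-weighted combination of the block estimates
total_info = sum(infos)
beta_combined = np.linalg.solve(
    total_info, sum(I @ b for I, b in zip(infos, estimates)))

beta_full = np.linalg.lstsq(X, y, rcond=None)[0]   # all-data fit for comparison
```

The attraction is computational: no block ever touches the full design matrix, yet no statistical efficiency is lost in this linear case.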

Citations: 0
A multivariate Poisson model based on a triangular comonotonic shock construction
IF 1.0 · Mathematics (CAS Tier 4) · Q3 STATISTICS & PROBABILITY · Pub Date: 2025-04-22 · DOI: 10.1002/cjs.70010
Orla A. Murphy, Juliana Schulz

Multi-dimensional data frequently occur in many different fields, including risk management, insurance, biology, environmental sciences, and many more. In analyzing multivariate data, it is imperative that the underlying modelling assumptions adequately reflect both the marginal behaviour and the associations between components. This article focuses specifically on developing a new multivariate Poisson model appropriate for multi-dimensional count data. The proposed formulation is based on convolutions of comonotonic shock vectors with Poisson-distributed components and allows for flexibility in capturing different degrees of positive dependence. In this article, we will present the general model framework along with various distributional properties. Several estimation techniques will be explored and assessed both through simulations and in a real data application involving extreme rainfall events.
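The core construction — comonotonic Poisson shocks convolved with independent Poisson components — can be sketched in a few lines. This is a simplified bivariate illustration, not the triangular model of the article; the shock intensities and the plain CDF-inversion quantile function below are assumptions made for the example.

```python
import numpy as np

def poisson_ppf(u, lam, max_k=200):
    """Poisson quantile function F^{-1}(u) = min{k : F(k) >= u} by CDF inversion."""
    k = np.arange(1, max_k + 1)
    pmf = np.exp(-lam) * np.cumprod(np.concatenate(([1.0], lam / k)))
    return np.searchsorted(np.cumsum(pmf), u)

rng = np.random.default_rng(1)
n = 100_000

# Comonotonic shock vector: both components are increasing transforms of the
# SAME uniform draw, which induces positive dependence between the counts.
u = rng.uniform(size=n)
z1 = poisson_ppf(u, 1.5)
z2 = poisson_ppf(u, 2.0)

# Convolve with independent Poisson components: the marginals remain Poisson.
y1 = z1 + rng.poisson(1.0, size=n)   # Y1 ~ Poisson(1.5 + 1.0)
y2 = z2 + rng.poisson(3.0, size=n)   # Y2 ~ Poisson(2.0 + 3.0)

corr = float(np.corrcoef(y1, y2)[0, 1])
```

Because each count is a sum of independent Poisson variables, the marginal distributions stay Poisson, while the shared shock component controls the strength of the positive association — the flexibility the abstract refers to.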

Orla A. Murphy & Juliana Schulz (2025). A multivariate Poisson model based on a triangular comonotonic shock construction. Canadian Journal of Statistics, 53(3). DOI: 10.1002/cjs.70010
Citations: 0