
Biometrika: Latest Publications

More Efficient Exact Group Invariance Testing: using a Representative Subgroup
IF 2.7, CAS Zone 2 (Mathematics), Q1 Mathematics, Pub Date: 2023-09-01, DOI: 10.1093/biomet/asad050
N. W. Koning, J. Hemerik
We consider testing invariance of a distribution under an algebraic group of transformations, such as permutations or sign-flips. As such groups are typically huge, tests based on the full group are often computationally infeasible. Hence, it is standard practice to use a random subset of transformations. We improve upon this by replacing the random subset with a strategically chosen, fixed subgroup of transformations. In a generalized location model, we show that the resulting tests are often consistent for lower signal-to-noise ratios. Moreover, we establish an analogy between the power improvement and switching from a t-test to a Z-test under normality. Importantly, in permutation-based multiple testing, the efficiency gain with our approach can be huge, since we attain the same power with much fewer permutations.
Citations: 0
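To make the subgroup idea concrete, here is a minimal Python (NumPy) sketch — my own illustrative construction, not the authors' representative subgroup — that tests symmetry about zero by sign-flipping. It compares a test based on a genuine subgroup of {−1,+1}^n generated by block-wise flips with the usual random-subset approach; the block generators, sample sizes and function names are assumptions made for this example.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

def subgroup_signflip_pvalue(x, n_blocks=10):
    """One-sided test of symmetry about 0 using the subgroup of {-1,+1}^n
    generated by block-wise sign flips (2**n_blocks elements in total)."""
    n = len(x)
    blocks = np.array_split(np.arange(n), n_blocks)
    t_obs = x.mean()
    count, total = 0, 0
    for bits in product([0, 1], repeat=n_blocks):   # enumerate the subgroup
        g = np.ones(n)
        for b, block in zip(bits, blocks):
            if b:
                g[block] = -1.0
        count += (g * x).mean() >= t_obs - 1e-12
        total += 1
    return count / total

def random_signflip_pvalue(x, n_draws=1024):
    """Same test based on a random subset of sign-flip vectors (plus identity)."""
    t_obs = x.mean()
    flips = rng.choice([-1.0, 1.0], size=(n_draws, len(x)))
    t_rand = (flips * x).mean(axis=1)
    return (1 + np.sum(t_rand >= t_obs - 1e-12)) / (n_draws + 1)

x = rng.normal(loc=0.3, scale=1.0, size=200)   # small positive shift
print("subgroup p-value:     ", subgroup_signflip_pvalue(x))
print("random-subset p-value:", random_signflip_pvalue(x))
```

Because the block-wise flips are closed under composition, the subgroup p-value is exact under the null, just like the full-group test; the paper's contribution lies in choosing the subgroup strategically so that power is retained with far fewer transformations.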
Spectral adjustment for spatial confounding.
IF 2.7, CAS Zone 2 (Mathematics), Q1 Mathematics, Pub Date: 2023-09-01, Epub Date: 2022-12-21, DOI: 10.1093/biomet/asac069
Yawen Guan, Garritt L Page, Brian J Reich, Massimo Ventrucci, Shu Yang

Adjusting for an unmeasured confounder is generally an intractable problem, but in the spatial setting it may be possible under certain conditions. We derive necessary conditions on the coherence between the exposure and the unmeasured confounder that ensure the effect of exposure is estimable. We specify our model and assumptions in the spectral domain to allow for different degrees of confounding at different spatial resolutions. One assumption that ensures identifiability is that confounding present at global scales dissipates at local scales. We show that this assumption in the spectral domain is equivalent to adjusting for global-scale confounding in the spatial domain by adding a spatially smoothed version of the exposure to the mean of the response variable. Within this general framework, we propose a sequence of confounder adjustment methods that range from parametric adjustments based on the Matérn coherence function to more robust semiparametric methods that use smoothing splines. These ideas are applied to areal and geostatistical data for both simulated and real datasets.

Citations: 0
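The spatial-domain reading of the identification assumption — add a spatially smoothed copy of the exposure to the mean of the response — can be illustrated with a toy one-dimensional simulation. The sketch below is a caricature under stated assumptions (Gaussian smoothing via scipy.ndimage with an arbitrary bandwidth, a hand-built confounder), not the paper's Matérn- or spline-based adjustments.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(1)
n, beta = 2000, 1.0

# Smooth (large-scale) unmeasured confounder on a 1-D spatial grid.
u = gaussian_filter1d(rng.normal(size=n), sigma=50)
u = u / u.std()

# Exposure shares the confounder's large-scale component plus local variation.
x = u + 0.5 * rng.normal(size=n)
y = beta * x + 2.0 * u + rng.normal(size=n)

def ols(design, y):
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Naive regression: y ~ 1 + x (confounded by u).
X_naive = np.column_stack([np.ones(n), x])
print("naive estimate of beta:   ", ols(X_naive, y)[1])

# Adjusted regression: y ~ 1 + x + smooth(x), where smooth(x) proxies the
# global-scale confounding (smoothing bandwidth chosen ad hoc here).
x_smooth = gaussian_filter1d(x, sigma=50)
X_adj = np.column_stack([np.ones(n), x, x_smooth])
print("adjusted estimate of beta:", ols(X_adj, y)[1])
```

In this toy setting the naive slope is badly biased by the shared large-scale component, while including the smoothed exposure typically moves the estimate much closer to the true β = 1.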
Marginal proportional hazards models for multivariate interval-censored data.
IF 2.4, CAS Zone 2 (Mathematics), Q2 Biology, Pub Date: 2023-09-01, Epub Date: 2022-11-02, DOI: 10.1093/biomet/asac059
Yangjianchen Xu, Donglin Zeng, D Y Lin

Multivariate interval-censored data arise when there are multiple types of events or clusters of study subjects, such that the event times are potentially correlated and when each event is only known to occur over a particular time interval. We formulate the effects of potentially time-varying covariates on the multivariate event times through marginal proportional hazards models while leaving the dependence structures of the related event times unspecified. We construct the nonparametric pseudolikelihood under the working assumption that all event times are independent, and we provide a simple and stable EM-type algorithm. The resulting nonparametric maximum pseudolikelihood estimators for the regression parameters are shown to be consistent and asymptotically normal, with a limiting covariance matrix that can be consistently estimated by a sandwich estimator under arbitrary dependence structures for the related event times. We evaluate the performance of the proposed methods through extensive simulation studies and present an application to data from the Atherosclerosis Risk in Communities Study.

Citations: 0
ASSESSING TIME-VARYING CAUSAL EFFECT MODERATION IN THE PRESENCE OF CLUSTER-LEVEL TREATMENT EFFECT HETEROGENEITY AND INTERFERENCE.
IF 2.7, CAS Zone 2 (Mathematics), Q1 Mathematics, Pub Date: 2023-09-01, DOI: 10.1093/biomet/asac065
Jieru Shi, Zhenke Wu, Walter Dempsey

The micro-randomized trial (MRT) is a sequential randomized experimental design to empirically evaluate the effectiveness of mobile health (mHealth) intervention components that may be delivered at hundreds or thousands of decision points. MRTs have motivated a new class of causal estimands, termed "causal excursion effects", for which semiparametric inference can be conducted via a weighted, centered least squares criterion (Boruvka et al., 2018). Existing methods assume between-subject independence and non-interference. Deviations from these assumptions often occur. In this paper, causal excursion effects are revisited under potential cluster-level treatment effect heterogeneity and interference, where the treatment effect of interest may depend on cluster-level moderators. Utility of the proposed methods is shown by analyzing data from a multi-institution cohort of first year medical residents in the United States.

Citations: 2
Deep Kronecker Network
CAS Zone 2 (Mathematics), Q1 Mathematics, Pub Date: 2023-08-31, DOI: 10.1093/biomet/asad049
Long Feng, Guang Yang
Summary: We develop a novel framework named Deep Kronecker Network for the analysis of medical imaging data, including magnetic resonance imaging (MRI), functional MRI, computed tomography, and more. Medical imaging data differs from general images in two main aspects: i) the sample size is often considerably smaller, and ii) the interpretation of the model is usually more crucial than predicting the outcome. As a result, standard methods such as convolutional neural networks cannot be directly applied to medical imaging analysis. Therefore, we propose the Deep Kronecker Network, which can adapt to the low sample size constraint and offer the desired model interpretation. Our approach is versatile, as it works for both matrix and tensor represented image data and can be applied to discrete and continuous outcomes. The Deep Kronecker network is built upon a Kronecker product structure, which implicitly enforces a piecewise smooth property on coefficients. Moreover, our approach resembles a fully convolutional network as the Kronecker structure can be expressed in a convolutional form. Interestingly, our approach also has strong connections to the tensor regression framework proposed by Zhou et al. (2013), which imposes a canonical low-rank structure on tensor coefficients. We conduct both classification and regression analyses using real MRI data from the Alzheimer’s Disease Neuroimaging Initiative to demonstrate the effectiveness of our approach.
Citations: 0
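As a rough illustration of the Kronecker-product coefficient structure only — not the paper's deep, convolution-linked architecture — the sketch below fits a scalar-on-image regression whose coefficient matrix is constrained to be kron(B1, B2), estimating the two factors by alternating least squares. The dimensions, simulation design and helper names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Dimensions: coefficient B = kron(B1, B2); images are (p1*p2) x (q1*q2).
p1, q1, p2, q2 = 4, 4, 8, 8
n = 500

# Ground truth Kronecker coefficient (for simulation only).
B1_true = rng.normal(size=(p1, q1))
B2_true = rng.normal(size=(p2, q2))
B_true = np.kron(B1_true, B2_true)

X = rng.normal(size=(n, p1 * p2, q1 * q2))
y = np.einsum('nij,ij->n', X, B_true) + 0.1 * rng.normal(size=n)

def blocks(Xk):
    """Partition an image into the (p1 x q1) grid of (p2 x q2) blocks."""
    return Xk.reshape(p1, p2, q1, q2).transpose(0, 2, 1, 3)  # (p1, q1, p2, q2)

def fit_kronecker(X, y, n_iter=30):
    """Alternating least squares for y ~ <kron(B1, B2), X>."""
    B2 = rng.normal(size=(p2, q2))
    Xb = np.stack([blocks(Xk) for Xk in X])          # (n, p1, q1, p2, q2)
    for _ in range(n_iter):
        # With B2 fixed, the feature for B1[a, b] is <B2, block_ab>.
        M = np.einsum('nabij,ij->nab', Xb, B2).reshape(n, -1)
        B1 = np.linalg.lstsq(M, y, rcond=None)[0].reshape(p1, q1)
        # With B1 fixed, the feature for B2[i, j] is sum_ab B1[a, b] * block_ab[i, j].
        N = np.einsum('nabij,ab->nij', Xb, B1).reshape(n, -1)
        B2 = np.linalg.lstsq(N, y, rcond=None)[0].reshape(p2, q2)
    return B1, B2

B1_hat, B2_hat = fit_kronecker(X, y)
B_hat = np.kron(B1_hat, B2_hat)
print("relative error of recovered coefficient:",
      np.linalg.norm(B_hat - B_true) / np.linalg.norm(B_true))
```

The Kronecker constraint reduces the number of free parameters from (p1·p2)(q1·q2) = 1024 to p1·q1 + p2·q2 = 80, which is what makes estimation feasible at the small sample sizes typical of medical imaging.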
Kernel interpolation generalizes poorly
CAS Zone 2 (Mathematics), Q1 Mathematics, Pub Date: 2023-08-07, DOI: 10.1093/biomet/asad048
Yicheng Li, Haobo Zhang, Qian Lin
Summary: One of the most interesting problems in the recent renaissance of the studies in kernel regression might be whether kernel interpolation can generalize well, since it may help us understand the ‘benign overfitting phenomenon’ reported in the literature on deep networks. In this paper, under mild conditions, we show that, for any ε>0, the generalization error of kernel interpolation is lower bounded by Ω(n−ε). In other words, the kernel interpolation generalizes poorly for a large class of kernels. As a direct corollary, we can show that overfitted wide neural networks defined on the sphere generalize poorly.
Citations: 5
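A quick numerical caricature of the claim (a toy one-dimensional experiment, not the paper's Ω(n^−ε) lower bound): kernel ridge regression with the ridge parameter pushed to essentially zero interpolates the noisy training data, and its test error against the noiseless target is far larger than that of a modestly regularized fit. The kernel, bandwidth and sample sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def rbf_kernel(a, b, bandwidth=0.2):
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2 * bandwidth ** 2))

def kernel_ridge_predict(x_train, y_train, x_test, lam):
    """Kernel ridge regression; lam -> 0 approaches kernel interpolation."""
    K = rbf_kernel(x_train, x_train)
    alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)
    return rbf_kernel(x_test, x_train) @ alpha

n, sigma = 200, 0.5
f = lambda x: np.sin(2 * np.pi * x)
x_train = rng.uniform(0, 1, n)
y_train = f(x_train) + sigma * rng.normal(size=n)
x_test = rng.uniform(0, 1, 2000)

for lam in [1e-10, 1e-1]:          # ~interpolation vs. proper regularization
    pred = kernel_ridge_predict(x_train, y_train, x_test, lam)
    mse = np.mean((pred - f(x_test)) ** 2)
    print(f"lambda = {lam:g}:  test MSE against the noiseless target = {mse:.3f}")
```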
τ-censored weighted Benjamini-Hochberg procedures under independence
IF 2.7, CAS Zone 2 (Mathematics), Q1 Mathematics, Pub Date: 2023-08-02, DOI: 10.1093/biomet/asad047
Haibing Zhao, Huijuan Zhou
In the field of multiple hypothesis testing, auxiliary information can be leveraged to enhance the efficiency of test procedures. A common way to make use of auxiliary information is by weighting p-values. However, when the weights are learned from data, controlling the finite-sample false discovery rate becomes challenging, and most existing weighted procedures only guarantee false discovery rate control in an asymptotic limit. In a recent study conducted by Ignatiadis & Huber (2021), a novel τ-censored weighted Benjamini-Hochberg procedure was proposed to control the finite-sample false discovery rate. The authors employed the cross-weighting approach to learn weights for the p-values. This approach randomly splits the data into several folds and constructs a weight for each p-value Pi using the p-values outside the fold containing Pi. Cross-weighting does not exploit the p-value information inside the fold and only balances the weights within each fold, which may result in a loss of power. In this article, we introduce two methods for constructing data-driven weights for τ-censored weighted Benjamini-Hochberg procedures under independence. They provide new insight into masking p-values to prevent overfitting in multiple testing. The first method utilizes a leave-one-out technique, where all but one of the p-values are used to learn a weight for each p-value. This technique masks the information of a p-value in its weight by calculating the infimum of the weight with respect to the p-value. The second method uses partial information from each p-value to construct weights and utilizes the conditional distributions of the null p-values to establish false discovery rate control. Additionally, we propose two methods for estimating the null proportion and demonstrate how to integrate null-proportion adaptivity into the proposed weights to improve power.
Citations: 1
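For orientation, the sketch below implements the plain weighted Benjamini-Hochberg step with externally supplied, mean-one weights — the building block that the τ-censored, leave-one-out and conditional constructions in the paper refine so that data-driven weights still control the finite-sample false discovery rate. The simulated p-values and the fixed "informative" weights are invented for the example and do not implement the paper's data-driven weighting.

```python
import numpy as np

def weighted_bh(pvals, weights, alpha=0.05):
    """Weighted Benjamini-Hochberg step-up: reject where p_i / w_i is small,
    with the weights normalized to average one."""
    p = np.asarray(pvals, dtype=float)
    w = np.asarray(weights, dtype=float)
    m = len(p)
    w = w * m / w.sum()                              # enforce mean-one weights
    q = np.where(w > 0, p / np.maximum(w, 1e-300), np.inf)
    order = np.argsort(q)
    passed = q[order] <= alpha * np.arange(1, m + 1) / m
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

# Toy data: the last 100 hypotheses are non-null; side information up-weights them.
rng = np.random.default_rng(4)
m, m1 = 1000, 100
p = np.concatenate([rng.uniform(size=m - m1),        # true nulls
                    rng.uniform(size=m1) ** 10])     # non-nulls: small p-values
w_informative = np.concatenate([np.full(m - m1, 0.8), np.full(m1, 2.8)])
print("unweighted BH rejections:", weighted_bh(p, np.ones(m)).sum())
print("weighted BH rejections:  ", weighted_bh(p, w_informative).sum())
```

With weights that up-weight the true signals, the weighted step typically rejects more hypotheses at the same nominal level; the difficulty the paper addresses is keeping this guarantee when the weights themselves are learned from the p-values.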
A linear adjustment-based approach to posterior drift in transfer learning.
IF 2.4, CAS Zone 2 (Mathematics), Q2 Biology, Pub Date: 2023-07-27, eCollection Date: 2024-03-01, DOI: 10.1093/biomet/asad029
Subha Maity, Diptavo Dutta, Jonathan Terhorst, Yuekai Sun, Moulinath Banerjee

We present new models and methods for the posterior drift problem where the regression function in the target domain is modelled as a linear adjustment, on an appropriate scale, of that in the source domain, and study the theoretical properties of our proposed estimators in the binary classification problem. The core idea of our model inherits the simplicity and the usefulness of generalized linear models and accelerated failure time models from the classical statistics literature. Our approach is shown to be flexible and applicable in a variety of statistical settings, and can be adopted for transfer learning problems in various domains including epidemiology, genetics and biomedicine. As concrete applications, we illustrate the power of our approach (i) through mortality prediction for British Asians by borrowing strength from similar data from the larger pool of British Caucasians, using the UK Biobank data, and (ii) in overcoming a spurious correlation present in the source domain of the Waterbirds dataset.

Citations: 0
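The linear-adjustment idea can be sketched in two steps: fit a classifier on the large source sample, then refit only an intercept and a slope on the source log-odds using the small labelled target sample. The code below uses scikit-learn's LogisticRegression for convenience and a simulation design of my own; it illustrates the general strategy rather than the authors' estimator or theory.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)

d, n_source, n_target = 10, 5000, 150
beta = rng.normal(size=d)

def simulate(n, a, b):
    """Labels follow a logistic model whose log-odds is a + b * (x @ beta)."""
    X = rng.normal(size=(n, d))
    logits = a + b * (X @ beta)
    y = rng.uniform(size=n) < 1 / (1 + np.exp(-logits))
    return X, y.astype(int)

# Source and target share the same direction beta, but the target log-odds is a
# shifted and rescaled version of the source log-odds (posterior drift).
X_s, y_s = simulate(n_source, a=0.0, b=1.0)
X_t, y_t = simulate(n_target, a=-1.0, b=2.0)
X_test, y_test = simulate(20000, a=-1.0, b=2.0)

# Step 1: fit the source classifier on the large source sample.
src = LogisticRegression(max_iter=1000).fit(X_s, y_s)

# Step 2: on the small target sample, fit only a one-dimensional linear
# adjustment (intercept + slope) of the source log-odds.
score_t = src.decision_function(X_t).reshape(-1, 1)
adj = LogisticRegression(max_iter=1000).fit(score_t, y_t)

# Evaluate on target test data against two baselines.
score_test = src.decision_function(X_test).reshape(-1, 1)
acc_source_only = np.mean(src.predict(X_test) == y_test)
acc_adjusted = np.mean(adj.predict(score_test) == y_test)
acc_target_only = np.mean(
    LogisticRegression(max_iter=1000).fit(X_t, y_t).predict(X_test) == y_test)
print("source model only:       ", round(acc_source_only, 3))
print("target-only (n = 150):   ", round(acc_target_only, 3))
print("linear-adjusted transfer:", round(acc_adjusted, 3))
```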
Online Inference with Debiased Stochastic Gradient Descent
IF 2.7, CAS Zone 2 (Mathematics), Q1 Mathematics, Pub Date: 2023-07-27, DOI: 10.1093/biomet/asad046
Ruijian Han, Lan Luo, Yuanyuan Lin, Jian Huang
We propose a debiased stochastic gradient descent algorithm for online statistical inference with high-dimensional data. Our approach combines the debiasing technique developed in high-dimensional statistics with the stochastic gradient descent algorithm. It can be used for efficiently constructing confidence intervals in an online fashion. Our proposed algorithm has several appealing aspects: first, as a one-pass algorithm, it reduces the time complexity; in addition, each update step requires only the current data together with the previous estimate, which reduces the space complexity. We establish the asymptotic normality of the proposed estimator under mild conditions on the sparsity level of the parameter and the data distribution. We conduct numerical experiments to demonstrate the proposed debiased stochastic gradient descent algorithm reaches nominal coverage probability. Furthermore, we illustrate our method with a high-dimensional text dataset.
Citations: 3
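A much-simplified, low-dimensional cousin of the procedure is ordinary averaged (Polyak-Ruppert) stochastic gradient descent with plug-in confidence intervals; the paper's algorithm additionally debiases the iterates to handle high-dimensional sparse models. The sketch below only conveys the one-pass, online flavour: each observation is touched once, and interval estimates are available at any time from running summaries. All tuning constants are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(6)

d, n = 5, 100_000
theta_true = np.linspace(-1.0, 1.0, d)
sigma = 1.0

theta = np.zeros(d)          # current SGD iterate
theta_bar = np.zeros(d)      # Polyak-Ruppert average
A_hat = np.zeros((d, d))     # running estimate of E[x x^T]
rss = 0.0                    # running (prequential) residual sum of squares

for t in range(1, n + 1):
    x = rng.normal(size=d)
    y = x @ theta_true + sigma * rng.normal()
    resid = y - x @ theta
    lr = 0.2 * t ** -0.51                      # Robbins-Monro step size
    theta += lr * resid * x                    # one SGD step, one pass over data
    theta_bar += (theta - theta_bar) / t       # online averaging
    A_hat += (np.outer(x, x) - A_hat) / t      # online second-moment estimate
    rss += resid ** 2

sigma2_hat = rss / n
cov = sigma2_hat * np.linalg.inv(A_hat) / n    # plug-in asymptotic covariance
half = 1.96 * np.sqrt(np.diag(cov))
for j in range(d):
    print(f"theta[{j}] true={theta_true[j]:+.2f}  "
          f"CI=({theta_bar[j]-half[j]:+.3f}, {theta_bar[j]+half[j]:+.3f})")
```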
An anomaly arising in the analysis of processes with more than one source of variability
IF 2.7, CAS Zone 2 (Mathematics), Q1 Mathematics, Pub Date: 2023-07-18, DOI: 10.1093/biomet/asad044
H. Battey, P. McCullagh
It is frequently observed in practice that the Wald statistic gives a poor assessment of the statistical significance of a variance component. This paper provides detailed analytic insight into the phenomenon by way of two simple models, which point to an atypical geometry as the source of the aberration. The latter can in principle be checked numerically to cover situations of arbitrary complexity, such as those arising from elaborate forms of blocking in an experimental context, or models for longitudinal or clustered data. The salient point, echoing Dickey (2020), is that a suitable likelihood-ratio test should always be used for the assessment of variance components.
Citations: 0
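To see the kind of comparison at issue, the sketch below simulates a balanced one-way random-effects model and computes both a Wald-type assessment of the variance component (the ANOVA estimator divided by its estimated standard error) and the likelihood-ratio test referred to the boundary mixture 0.5·χ²₀ + 0.5·χ²₁, using NumPy and scipy.stats. The design and settings are arbitrary; the code illustrates the general recommendation rather than reproducing the paper's analysis.

```python
import numpy as np
from scipy.stats import chi2, norm

rng = np.random.default_rng(7)

def one_way_stats(y):
    """Sums of squares for a balanced one-way random-effects layout y of shape (k, n)."""
    k, n = y.shape
    group_means = y.mean(axis=1)
    grand_mean = y.mean()
    ssa = n * np.sum((group_means - grand_mean) ** 2)
    sse = np.sum((y - group_means[:, None]) ** 2)
    return k, n, ssa, sse

def wald_pvalue(y):
    """One-sided Wald test of H0: sigma_b^2 = 0 based on the ANOVA estimator."""
    k, n, ssa, sse = one_way_stats(y)
    msa, mse = ssa / (k - 1), sse / (k * (n - 1))
    sb2_hat = (msa - mse) / n
    var_hat = 2 * (msa ** 2 / (k - 1) + mse ** 2 / (k * (n - 1))) / n ** 2
    return norm.sf(sb2_hat / np.sqrt(var_hat))

def lrt_pvalue(y):
    """Likelihood-ratio test of H0: sigma_b^2 = 0 with the 0.5*chi2_0 + 0.5*chi2_1 reference."""
    k, n, ssa, sse = one_way_stats(y)
    N = k * n
    # ML under H0: observations iid N(mu, sigma^2).
    s0 = (ssa + sse) / N
    ll0 = -0.5 * N * (np.log(2 * np.pi * s0) + 1)
    # ML under H1 in terms of sigma^2 and lam = sigma^2 + n * sigma_b^2 >= sigma^2.
    s1, lam = sse / (k * (n - 1)), ssa / k
    if lam < s1:                                    # boundary: fit collapses to H0
        return 1.0
    ll1 = -0.5 * (N * np.log(2 * np.pi) + k * (n - 1) * np.log(s1)
                  + k * np.log(lam) + sse / s1 + ssa / lam)
    T = 2 * (ll1 - ll0)
    return 0.5 * chi2.sf(T, df=1)

# Simulate a modest variance component and compare the two assessments.
k, n, sigma_b = 15, 6, 0.6
b = sigma_b * rng.normal(size=(k, 1))
y = b + rng.normal(size=(k, n))
print("Wald p-value:", round(wald_pvalue(y), 4))
print("LRT  p-value:", round(lrt_pvalue(y), 4))
```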