
Biometrika: Latest Articles

With random regressors, least squares inference is robust to correlated errors with unknown correlation structure.
IF 2.8 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2025-01-01 | Epub Date: 2024-10-17 | DOI: 10.1093/biomet/asae054
Zifeng Zhang, Peng Ding, Wen Zhou, Haonan Wang

Linear regression is arguably the most widely used statistical method. With fixed regressors and correlated errors, the conventional wisdom is to modify the variance-covariance estimator to accommodate the known correlation structure of the errors. We depart from existing literature by showing that with random regressors, linear regression inference is robust to correlated errors with unknown correlation structure. The existing theoretical analyses for linear regression are no longer valid because even the asymptotic normality of the least squares coefficients breaks down in this regime. We first prove the asymptotic normality of the t statistics by establishing their Berry-Esseen bounds based on a novel probabilistic analysis of self-normalized statistics. We then study the local power of the corresponding t tests and show that, perhaps surprisingly, error correlation can even enhance power in the regime of weak signals. Overall, our results show that linear regression is applicable more broadly than the conventional theory suggests, and they further demonstrate the value of randomization for ensuring robustness of inference.
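
The robustness claim in this abstract can be explored numerically. The following is a minimal simulation sketch, not the authors' code. It assumes i.i.d. standard normal regressors drawn independently of stationary AR(1) errors; the function names and the values of `n`, `rho` and `reps` are illustrative choices. It estimates how often the conventional OLS t test rejects a true null slope at the nominal 5% level.

```python
import numpy as np

def ols_t_stat(x, y):
    """Conventional OLS t statistic for the slope in y = a + b*x + e."""
    n = len(x)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - 2)          # usual homoskedastic variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # no correction for error correlation
    return beta[1] / np.sqrt(cov[1, 1])

def ar1_errors(n, rho, rng):
    """Stationary AR(1) errors with unit marginal variance (assumed error model)."""
    e = np.empty(n)
    e[0] = rng.standard_normal()
    innov = rng.standard_normal(n) * np.sqrt(1.0 - rho**2)
    for t in range(1, n):
        e[t] = rho * e[t - 1] + innov[t]
    return e

def rejection_rate(n=200, rho=0.8, reps=1000, seed=0):
    """Share of replications in which |t| > 1.96 although the true slope is 0."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x = rng.standard_normal(n)            # random regressor, independent of errors
        y = 1.0 + ar1_errors(n, rho, rng)     # intercept 1, slope 0
        if abs(ols_t_stat(x, y)) > 1.96:
            rejections += 1
    return rejections / reps

if __name__ == "__main__":
    print(rejection_rate())
```

Under these assumptions the rejection rate should sit near the nominal level even though the variance estimator ignores the error correlation. Replacing the random `x` with a fixed, smooth regressor (a time trend, say) is a natural contrast to try.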

Citations: 0
Improving randomized controlled trial analysis via data-adaptive borrowing.
IF 2.8 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2024-12-17 | eCollection Date: 2025-01-01 | DOI: 10.1093/biomet/asae069
Chenyin Gao, Shu Yang, Mingyang Shan, Wenyu Ye, Ilya Lipkovich, Douglas Faries

In recent years, real-world external controls have grown in popularity as a tool to empower randomized placebo-controlled trials, particularly in rare diseases or cases where balanced randomization is unethical or impractical. However, as external controls are not always comparable to the trials, direct borrowing without scrutiny may heavily bias the treatment effect estimator. Our paper proposes a data-adaptive integrative framework capable of preventing unknown biases of the external controls. The adaptive nature is achieved by dynamically sorting out a comparable subset of external controls via bias penalization. Our proposed method can simultaneously achieve (a) the semiparametric efficiency bound when the external controls are comparable and (b) selective borrowing that mitigates the impact of the existence of incomparable external controls. Furthermore, we establish statistical guarantees, including consistency, asymptotic distribution and inference, providing Type-I error control and good power. Extensive simulations and two real-data applications show that the proposed method leads to improved performance over the trial-only estimator across various bias-generating scenarios.

Citations: 0
Radial Neighbors for Provably Accurate Scalable Approximations of Gaussian Processes.
IF 2.4 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2024-12-01 | Epub Date: 2024-06-14 | DOI: 10.1093/biomet/asae029
Yichen Zhu, Michele Peruzzi, Cheng Li, David B Dunson

In geostatistical problems with massive sample size, Gaussian processes can be approximated using sparse directed acyclic graphs to achieve scalable O(n) computational complexity. In these models, data at each location are typically assumed conditionally dependent on a small set of parents which usually include a subset of the nearest neighbors. These methodologies often exhibit excellent empirical performance, but the lack of theoretical validation leads to unclear guidance in specifying the underlying graphical model and sensitivity to graph choice. We address these issues by introducing radial neighbors Gaussian processes (RadGP), a class of Gaussian processes based on directed acyclic graphs in which directed edges connect every location to all of its neighbors within a predetermined radius. We prove that any radial neighbors Gaussian process can accurately approximate the corresponding unrestricted Gaussian process in Wasserstein-2 distance, with an error rate determined by the approximation radius, the spatial covariance function, and the spatial dispersion of samples. We offer further empirical validation of our approach via applications on simulated and real-world data showing excellent performance in both prior and posterior approximations to the original Gaussian process.
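
The graph construction described in this abstract can be illustrated with a short sketch. It is an assumption-laden illustration rather than the RadGP implementation: the abstract does not say how locations are ordered, so the sketch takes the row order of `coords` as the ordering and makes each location's parents the earlier locations within `radius`, which keeps the graph acyclic. The function name `radial_parent_sets` is hypothetical.

```python
import numpy as np

def radial_parent_sets(coords, radius):
    """For each location i, list the earlier-ordered locations within `radius`.

    Assumption: the row order of `coords` is the topological order of the DAG.
    Edges run from each parent j < i to child i.
    """
    coords = np.asarray(coords, dtype=float)
    parents = []
    for i in range(coords.shape[0]):
        # Euclidean distances from location i to every earlier location
        dist = np.linalg.norm(coords[:i] - coords[i], axis=1)
        parents.append(np.flatnonzero(dist <= radius).tolist())
    return parents

if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.5, 0.0], [2.0, 0.0], [2.4, 0.0]]
    print(radial_parent_sets(pts, radius=1.0))
```

This naive loop computes all pairwise distances and so costs O(n^2) time; a spatial index would be needed to build the graph at the scale the abstract has in mind.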

Citations: 0
Using negative controls to identify causal effects with invalid instrumental variables.
IF 2.8 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2024-11-22 | eCollection Date: 2025-01-01 | DOI: 10.1093/biomet/asae064
O Dukes, D B Richardson, Z Shahn, J M Robins, E J Tchetgen Tchetgen

Many proposals for the identification of causal effects require an instrumental variable that satisfies strong, untestable unconfoundedness and exclusion restriction assumptions. In this paper, we show how one can potentially identify causal effects under violations of these assumptions by harnessing a negative control population or outcome. This strategy allows one to leverage subpopulations for whom the exposure is degenerate, and requires that the instrument-outcome association satisfies a certain parallel trend condition. We develop semiparametric efficiency theory for a general instrumental variable model, and obtain a multiply robust, locally efficient estimator of the average treatment effect in the treated. The utility of the estimators is demonstrated in simulation studies and an analysis of the Life Span Study.

Citations: 0
Assessing variable importance in survival analysis using machine learning.
IF 2.8 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2024-11-04 | eCollection Date: 2025-01-01 | DOI: 10.1093/biomet/asae061
C J Wolock, P B Gilbert, N Simon, M Carone

Given a collection of features available for inclusion in a predictive model, it may be of interest to quantify the relative importance of a subset of features for the prediction task at hand. For example, in HIV vaccine trials, participant baseline characteristics are used to predict the probability of HIV acquisition over the intended follow-up period, and investigators may wish to understand how much certain types of predictors, such as behavioural factors, contribute to overall predictiveness. Time-to-event outcomes such as time to HIV acquisition are often subject to right censoring, and existing methods for assessing variable importance are typically not intended to be used in this setting. We describe a broad class of algorithm-agnostic variable importance measures for prediction in the context of survival data. We propose a nonparametric efficient estimation procedure that incorporates flexible learning of nuisance parameters, yields asymptotically valid inference and enjoys double robustness. We assess the performance of our proposed procedure via numerical simulations and analyse data from the HVTN 702 vaccine trial to inform enrolment strategies for future HIV vaccine trials.

Citations: 0
Functional principal component analysis with informative observation times.
IF 2.8 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2024-10-17 | eCollection Date: 2025-01-01 | DOI: 10.1093/biomet/asae055
Peijun Sang, Dehan Kong, Shu Yang

Functional principal component analysis has been shown to be invaluable for revealing variation modes of longitudinal outcomes, which serve as important building blocks for forecasting and model building. Decades of research have advanced methods for functional principal component analysis, often assuming independence between the observation times and longitudinal outcomes. Yet such assumptions are fragile in real-world settings where observation times may be driven by outcome-related processes. Rather than ignoring the informative observation time process, we explicitly model the observational times by a general counting process dependent on time-varying prognostic factors. Identification of the mean, covariance function and functional principal components ensues via inverse intensity weighting. We propose using weighted penalized splines for estimation and establish consistency and convergence rates for the weighted estimators. Simulation studies demonstrate that the proposed estimators are substantially more accurate than the existing ones in the presence of a correlation between the observation time process and the longitudinal outcome process. We further examine the finite-sample performance of the proposed method using the Acute Infection and Early Disease Research Program study.

Citations: 0
Local Bootstrap for Network Data
IF 2.7 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2024-09-09 | DOI: 10.1093/biomet/asae046
Tianhai Zu, Yichen Qin
In network analysis, we frequently need to conduct inference for network parameters based on one observed network. Since the sampling distribution of the statistic is often unknown, we need to rely on the bootstrap. However, due to the complex dependence structure among vertices, existing bootstrap methods often yield unsatisfactory performance, especially under small or moderate sample sizes. To this end, we propose a new network bootstrap procedure, termed local bootstrap, to estimate the standard errors of network statistics. We propose to resample the observed vertices along with their neighbor sets, and reconstruct the edges between the resampled vertices by drawing from the set of edges connecting their neighbor sets. We justify the proposed method theoretically with desirable asymptotic properties for statistics such as motif density, and demonstrate its excellent numerical performance in small and moderate sample sizes. Our method includes several existing methods, such as the empirical graphon bootstrap, as special cases. We investigate the advantages of the proposed methods over the existing methods through the lens of edge randomness, vertex heterogeneity and neighbor set size, which sheds some light on the complex issue of network bootstrapping.
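
As a point of reference, the sketch below illustrates the empirical graphon bootstrap, which the abstract names as a special case; it is not the local bootstrap itself, whose neighbor-set resampling step is not specified in enough detail here to reproduce. The sketch resamples vertices with replacement and takes the induced adjacency matrix, then reports the bootstrap standard error of a statistic. The function names and the choice of triangle density as the motif statistic are illustrative assumptions.

```python
import numpy as np

def triangle_density(adj):
    """Number of triangles divided by the number of vertex triples."""
    n = adj.shape[0]
    a = adj.astype(float)
    n_triangles = np.trace(a @ a @ a) / 6.0   # each triangle gives 6 closed 3-walks
    return n_triangles / (n * (n - 1) * (n - 2) / 6.0)

def graphon_bootstrap_se(adj, statistic, n_boot=200, seed=0):
    """Bootstrap standard error from vertex resampling with replacement.

    Each replicate uses A*[i, j] = A[v_i, v_j]; a vertex drawn twice is not
    linked to its own copy because the diagonal of A is zero.
    """
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    reps = np.empty(n_boot)
    for b in range(n_boot):
        v = rng.integers(0, n, size=n)
        reps[b] = statistic(adj[np.ix_(v, v)])
    return reps.std(ddof=1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    upper = np.triu(rng.random((60, 60)) < 0.2, k=1)   # toy random graph
    adj = (upper | upper.T).astype(int)
    print(triangle_density(adj), graphon_bootstrap_se(adj, triangle_density))
```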
Citations: 0
A Simple Bootstrap for Chatterjee's Rank Correlation
IF 2.7 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2024-08-26 | DOI: 10.1093/biomet/asae045
H Dette, M Kroll
We prove that an m out of n bootstrap procedure for Chatterjee's rank correlation is consistent whenever asymptotic normality of Chatterjee's rank correlation can be established. In particular, we prove that m out of n bootstrap works for continuous as well as for discrete data with independent coordinates; furthermore, simulations indicate that it also performs well for discrete data with dependent coordinates, and that it outperforms alternative estimation methods. Consistency of the bootstrap is proved in the Kolmogorov as well as in the Wasserstein distance.
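
A minimal sketch of the idea, not the authors' code: the first function computes Chatterjee's rank correlation using the general formula that allows ties (ties in x are broken at random), and the second draws `m` observations with replacement from the `n` available and rescales the spread of the replicates by sqrt(m/n) to estimate a standard error. The function names, the with-replacement scheme and any particular choice of `m` are assumptions made for illustration.

```python
import numpy as np

def chatterjee_xi(x, y, rng):
    """Chatterjee's rank correlation, valid with ties in y.

    xi = 1 - n * sum|r_{i+1} - r_i| / (2 * sum l_i (n - l_i)), with the data
    sorted by x, r_i = #{j: y_j <= y_i} and l_i = #{j: y_j >= y_i}.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    perm = rng.permutation(n)                       # random tie-breaking in x
    order = perm[np.argsort(x[perm], kind="stable")]
    y_by_x = y[order]
    y_all_sorted = np.sort(y)
    r = np.searchsorted(y_all_sorted, y_by_x, side="right")
    l = n - np.searchsorted(y_all_sorted, y_by_x, side="left")
    denom = 2.0 * np.sum(l * (n - l))
    if denom == 0:                                  # y is constant: xi undefined
        return np.nan
    return 1.0 - n * np.sum(np.abs(np.diff(r))) / denom

def m_out_of_n_bootstrap_se(x, y, m, n_boot=500, seed=0):
    """Standard error of xi_n estimated from resamples of size m < n."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    y = np.asarray(y)
    n = len(x)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=m)            # m draws with replacement
        reps[b] = chatterjee_xi(x[idx], y[idx], rng)
    # sqrt(m) * (xi*_m - xi_n) mimics sqrt(n) * (xi_n - xi), hence the rescaling
    return np.nanstd(reps, ddof=1) * np.sqrt(m / n)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    xs, ys = rng.standard_normal(400), rng.standard_normal(400)
    print(chatterjee_xi(xs, ys, rng), m_out_of_n_bootstrap_se(xs, ys, m=20))
```

Resampling with replacement creates ties even in continuous data, which is why the sketch uses the tie-aware formula rather than the simpler no-ties version.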
Citations: 0
Sensitivity models and bounds under sequential unmeasured confounding in longitudinal studies
IF 2.7 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2024-08-20 | DOI: 10.1093/biomet/asae044
Zhiqiang Tan
Consider sensitivity analysis for causal inference in a longitudinal study with time-varying treatments and covariates. It is of interest to assess the worst-case possible values of counterfactual-outcome means and average treatment effects under sequential unmeasured confounding. We formulate several multi-period sensitivity models to relax the corresponding versions of the assumption of sequential non-confounding. The primary sensitivity model involves only counterfactual outcomes, whereas the joint and product sensitivity models involve both counterfactual covariates and outcomes. We establish and compare explicit representations for the sharp and conservative bounds at the population level through convex optimization, depending only on the observed data. These results provide for the first time a satisfactory generalization from the marginal sensitivity model in the cross-sectional setting.
Citations: 0
Studies in the history of probability and statistics, LI: the first conditional logistic regression
IF 2.7 Zone 2 Mathematics Q2 BIOLOGY Pub Date: 2024-08-09 DOI: 10.1093/biomet/asae038
J A Hanley
Statisticians and epidemiologists generally cite the publications by Prentice & Breslow and by Breslow et al. in 1978 as the first description and use of conditional logistic regression, while economists cite the 1973 book chapter by Nobel laureate McFadden. We describe the until-now-unrecognized use of, and way of fitting, this model in 1934 by Lionel Penrose and Ronald Fisher.
Citations: 0