Biometrika: Latest Articles

A note on minimax robustness of designs against correlated or heteroscedastic responses
IF 2.7; Zone 2 (Mathematics); Q1 Mathematics; Pub Date: 2024-01-20; DOI: 10.1093/biomet/asae001
D P Wiens
Summary: We present a result according to which certain functions of covariance matrices are maximized at scalar multiples of the identity matrix. This is used to show that experimental designs that are optimal under an assumption of independent, homoscedastic responses can be minimax robust, in broad classes of alternate covariance structures. In particular it can justify the common practice of disregarding possible dependence, or heteroscedasticity, at the design stage of an experiment.
Citations: 0
Efficient nonparametric estimation of Toeplitz covariance matrices
IF 2.7; Zone 2 (Mathematics); Q1 Mathematics; Pub Date: 2024-01-17; DOI: 10.1093/biomet/asae002
K Klockmann, T Krivobokova
A new efficient nonparametric estimator for Toeplitz covariance matrices is proposed. This estimator is based on a data transformation that translates the problem of Toeplitz covariance matrix estimation to the problem of mean estimation in an approximate Gaussian regression. The resulting Toeplitz covariance matrix estimator is positive definite by construction, fully data-driven and computationally very fast. Moreover, this estimator is shown to be minimax optimal under the spectral norm for a large class of Toeplitz matrices. These results are readily extended to estimation of inverses of Toeplitz covariance matrices. Also, an alternative version of the Whittle likelihood for the spectral density based on the discrete cosine transform is proposed. The method is implemented in the R package vstdct that accompanies the paper.
Citations: 0
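As a rough illustration of the Toeplitz structure that the Klockmann and Krivobokova abstract above refers to, the sketch below builds a Toeplitz covariance matrix from plain sample autocovariances. This is not the estimator proposed in the paper (which relies on a data transformation and is implemented in the R package vstdct); the function names `sample_autocovariances` and `toeplitz_from_autocovariances` are hypothetical and chosen only for this example.

```python
def sample_autocovariances(x, max_lag):
    """Biased sample autocovariances gamma_hat(0), ..., gamma_hat(max_lag)."""
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]
    # Dividing by n (not n - k) is the usual "biased" convention.
    return [
        sum(centred[t] * centred[t + k] for t in range(n - k)) / n
        for k in range(max_lag + 1)
    ]


def toeplitz_from_autocovariances(gamma):
    """Symmetric Toeplitz matrix whose (i, j) entry is gamma[|i - j|]."""
    p = len(gamma)
    return [[gamma[abs(i - j)] for j in range(p)] for i in range(p)]


# Toy series, assumed to be stationary for the purpose of the illustration.
series = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 2.0]
gamma_hat = sample_autocovariances(series, max_lag=2)
sigma_hat = toeplitz_from_autocovariances(gamma_hat)

for row in sigma_hat:
    print(row)
```

Every diagonal of `sigma_hat` is constant, which is the defining property of a Toeplitz matrix: the whole matrix is determined by its first row.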
On Selecting and Conditioning in Multiple Testing and Selective Inference
IF 2.7; Zone 2 (Mathematics); Q1 Mathematics; Pub Date: 2023-12-22; DOI: 10.1093/biomet/asad078
Jelle J Goeman, Aldo Solari
We investigate a class of methods for selective inference that condition on a selection event. Such methods follow a two-stage process. First, a data-driven collection of hypotheses is chosen from some large universe of hypotheses. Subsequently, inference takes place within this data-driven collection, conditioned on the information that was used for the selection. Examples of such methods include basic data splitting, as well as modern data carving methods and post-selection inference methods for lasso coefficients based on the polyhedral lemma. In this paper, we adopt a holistic view on such methods, considering the selection, conditioning, and final error control steps together as a single method. From this perspective, we demonstrate that multiple testing methods defined directly on the full universe of hypotheses are always at least as powerful as selective inference methods based on selection and conditioning. This result holds true even when the universe is potentially infinite and only implicitly defined, such as in the case of data splitting. We give general theory and intuitions before investigating in detail several case studies where a shift to a non-selective or unconditional perspective can yield a power gain.
Citations: 0
Central limit theorems for local network statistics
IF 2.7; Zone 2 (Mathematics); Q1 Mathematics; Pub Date: 2023-12-22; DOI: 10.1093/biomet/asad080
P A Maugis
Summary: Subgraph counts, in particular the number of occurrences of small shapes such as triangles, characterize properties of random networks. As a result, they have seen wide use as network summary statistics. Subgraphs are typically counted globally, making existing approaches unable to describe vertex-specific characteristics. In contrast, rooted subgraphs focus on vertex neighbourhoods, and are fundamental descriptors of local network properties. We derive the asymptotic joint distribution of rooted subgraph counts in inhomogeneous random graphs, a model which generalizes most statistical network models. This result enables a shift in the statistical analysis of graphs, from estimating network summaries, to estimating models linking local network structure and vertex-specific covariates. As an example, we consider a school friendship network and show that gender and race are significant predictors of local friendship patterns.
Citations: 0
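To make the distinction between global and rooted subgraph counts in the Maugis abstract above concrete, here is a minimal sketch that counts, for each vertex, the triangles that contain it. The function name `rooted_triangle_counts` and the toy edge list are assumptions made for this example; the paper's asymptotic theory is not reproduced.

```python
from itertools import combinations


def rooted_triangle_counts(edges):
    """Return {vertex: number of triangles that contain that vertex}."""
    neighbours = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    counts = {}
    for vertex, nbrs in neighbours.items():
        # A triangle rooted at `vertex` is a pair of its neighbours
        # that are themselves connected by an edge.
        counts[vertex] = sum(
            1 for a, b in combinations(sorted(nbrs), 2) if b in neighbours[a]
        )
    return counts


# Toy undirected graph: one triangle (a, b, c) plus a pendant vertex d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
counts = rooted_triangle_counts(edges)

# Each triangle is counted once at each of its three vertices.
global_count = sum(counts.values()) // 3
print(counts, global_count)
```

The per-vertex counts are the kind of local statistic that could be related to vertex-level covariates, whereas `global_count` collapses them into a single network summary.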
The state of cumulative sum sequential change point testing seventy years after Page
IF 2.7; Zone 2 (Mathematics); Q1 Mathematics; Pub Date: 2023-12-21; DOI: 10.1093/biomet/asad079
Alexander Aue, Claudia Kirch
Quality control charts aim at raising an alarm as soon as sequentially obtained observations of an underlying random process no longer seem to be within stochastic fluctuations prescribed by an ‘in-control’ scenario. Such random processes can often be modelled using the concept of stationarity, or even independence as in most classical works. An important out-of-control scenario is the changepoint alternative, for which the distribution of the process changes at an unknown point in time. In his seminal 1954 Biometrika paper, E. S. Page introduced the famous cumulative sum control charts for changepoint monitoring. Innovatively, decision rules based on cumulative sum procedures took the full history of the process into account, whereas previous procedures were based only on a fixed and typically small number of the most recent observations. The extreme case of using only the most recent observation, often referred to as the Shewhart chart, is more akin to serial outlier than changepoint detection. Page’s cumulative sum approach, introduced seven decades ago, is ubiquitous in modern changepoint analysis, and his original paper has led to a multitude of follow-up papers in different research communities. This review is focused on a particular subfield of this research, namely nonparametric sequential, or online, changepoint tests which are constructed to maintain a desired Type 1 error as opposed to the more traditional approach seeking to minimize the average run length of the procedures. Such tests have originated at the intersection of econometrics and statistics. We trace the development of these tests and highlight their properties, mostly using a simple location model for clarity of exposition, but also review more complex situations such as regression and time series models.
Citations: 0
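The Aue and Kirch abstract above contrasts cumulative sum charts, which use the full history of the process, with charts based only on recent observations. A minimal sketch of the classical one-sided CUSUM recursion for an upward mean shift is given below. The function name `cusum_alarm_time` and the parameter names `target_mean`, `slack` and `threshold` are assumptions for this illustration, and the sketch is the textbook control chart rather than the Type 1 error controlling sequential tests that the review focuses on.

```python
def cusum_alarm_time(observations, target_mean, slack, threshold):
    """One-sided CUSUM chart for an upward shift in the mean.

    Returns the 1-based time of the first alarm, or None if no alarm is raised.
    """
    statistic = 0.0
    for t, x in enumerate(observations, start=1):
        # Accumulate evidence of an upward shift; reset at zero so that
        # long in-control stretches do not mask a later change.
        statistic = max(0.0, statistic + (x - target_mean - slack))
        if statistic > threshold:
            return t
    return None


# Toy data: five in-control observations followed by a shift of about +1.
in_control = [0.1, -0.2, 0.0, 0.3, -0.1]
shifted = [1.2, 0.9, 1.1, 1.0, 1.3]

alarm = cusum_alarm_time(
    in_control + shifted, target_mean=0.0, slack=0.5, threshold=1.5
)
print(alarm)
```

Because the statistic carries over from one observation to the next, several moderately large observations in a row trigger the alarm even though none of them would exceed the threshold on its own.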
Correction to: ‘A cross-validation-based statistical theory for point processes’
IF 2.7; Zone 2 (Mathematics); Q1 Mathematics; Pub Date: 2023-12-20; DOI: 10.1093/biomet/asad077
Citations: 0
Phylogenetic Association Analysis with Conditional Rank Correlation
IF 2.7; Zone 2 (Mathematics); Q1 Mathematics; Pub Date: 2023-12-01; DOI: 10.1093/biomet/asad075
Shulei Wang, Bo Yuan, T Tony Cai, Hongzhe Li
Summary: Phylogenetic association analysis plays a crucial role in investigating the correlation between microbial compositions and specific outcomes of interest in microbiome studies. However, existing methods for testing such associations have limitations related to the assumption of a linear association in high-dimensional settings and the handling of confounding effects. Therefore, there is a need for methods capable of characterizing complex associations, including nonmonotonic relationships. This paper introduces a novel phylogenetic association analysis framework and associated tests to address these challenges by employing conditional rank correlation as a measure of association. These tests account for confounders in a fully nonparametric manner, ensuring robustness against outliers and the ability to detect diverse dependencies. The proposed framework aggregates conditional rank correlations for subtrees using a weighted sum and maximum approach to capture both dense and sparse signals. The significance level of the test statistics is determined by calibrating through a nearest neighbour bootstrapping method, which is straightforward to implement and can accommodate additional datasets when available. The practical advantages of the proposed framework are demonstrated through numerical experiments utilizing both simulated and real microbiome datasets.
Citations: 1
Conformalized survival analysis with adaptive cutoffs
IF 2.7; Zone 2 (Mathematics); Q1 Mathematics; Pub Date: 2023-12-01; DOI: 10.1093/biomet/asad076
Yu Gui, Rohan Hore, Zhimei Ren, Rina Foygel Barber
Summary: This paper introduces an assumption-lean method that constructs valid and efficient lower predictive bounds (LPBs) for survival times with censored data. We build on recent work by Candès et al. (2021), whose approach first subsets the data to discard any data points with early censoring times, and then uses a reweighting technique (namely, weighted conformal inference (Tibshirani et al., 2019)) to correct for the distribution shift introduced by this subsetting procedure. For our new method, instead of constraining to a fixed threshold for the censoring time when subsetting the data, we allow for a covariate-dependent and data-adaptive subsetting step, which is better able to capture the heterogeneity of the censoring mechanism. As a result, our method can lead to LPBs that are less conservative and give more accurate information. We show that in the Type I right-censoring setting, if either of the censoring mechanism or the conditional quantile of survival time is well estimated, our proposed procedure achieves nearly exact marginal coverage, where in the latter case we additionally have approximate conditional coverage. We evaluate the validity and efficiency of our proposed algorithm in numerical experiments, illustrating its advantage when compared with other competing methods. Finally, our method is applied to a real dataset to generate LPBs for users’ active times on a mobile app.
Citations: 0
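For readers unfamiliar with lower predictive bounds, the sketch below shows the basic split-conformal construction behind the Gui et al. abstract above, in the simplest possible setting: fully observed outcomes, no censoring and no reweighting. It is therefore not the adaptive-cutoff method of the paper. The function name `conformal_lower_bound` and the toy numbers are assumptions made for this illustration.

```python
import math


def conformal_lower_bound(calib_predictions, calib_outcomes, new_prediction, alpha):
    """Split-conformal lower predictive bound for uncensored outcomes.

    Assuming calibration and test points are exchangeable, the returned bound
    lies below the new outcome with probability at least 1 - alpha.
    """
    # Score = how far the prediction overshoots the observed outcome.
    scores = sorted(p - y for p, y in zip(calib_predictions, calib_outcomes))
    n = len(scores)
    rank = math.ceil((1 - alpha) * (n + 1))
    if rank > n:
        return float("-inf")  # too few calibration points for this level
    return new_prediction - scores[rank - 1]


# Toy calibration set: a constant prediction of 10 for seven subjects.
calib_predictions = [10.0] * 7
calib_outcomes = [9.0, 12.0, 8.0, 11.0, 10.0, 7.0, 13.0]

lpb = conformal_lower_bound(
    calib_predictions, calib_outcomes, new_prediction=10.0, alpha=0.25
)
print(lpb)
```

With censored survival times the outcome is not always observed, which is why the methods discussed in the abstract subset the data by censoring time and reweight the calibration scores.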
Efficient Estimation under Data Fusion
IF 2.7; Zone 2 (Mathematics); Q1 Mathematics; Pub Date: 2023-12-01; Epub Date: 2023-02-06; DOI: 10.1093/biomet/asad007
Sijia Li, Alex Luedtke

We aim to make inferences about a smooth, finite-dimensional parameter by fusing data from multiple sources together. Previous works have studied the estimation of a variety of parameters in similar data fusion settings, including in the estimation of the average treatment effect and average reward under a policy, with the majority of them merging one historical data source with covariates, actions, and rewards and one data source of the same covariates. In this work, we consider the general case where one or more data sources align with each part of the distribution of the target population, for example, the conditional distribution of the reward given actions and covariates. We describe potential gains in efficiency that can arise from fusing these data sources together in a single analysis, which we characterize by a reduction in the semiparametric efficiency bound. We also provide a general means to construct estimators that achieve these bounds. In numerical simulations, we illustrate marked improvements in efficiency from using our proposed estimators rather than their natural alternatives. Finally, we illustrate the magnitude of efficiency gains that can be realized in vaccine immunogenicity studies by fusing data from two HIV vaccine trials.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10653189/pdf/
Citations: 7
Familial inference: Tests for hypotheses on a family of centres
IF 2.7; Zone 2 (Mathematics); Q1 Mathematics; Pub Date: 2023-11-28; DOI: 10.1093/biomet/asad074
Ryan Thompson, Catherine S Forbes, Steven N Maceachern, Mario Peruggia
Statistical hypotheses are translations of scientific hypotheses into statements about one or more distributions, often concerning their centre. Tests that assess statistical hypotheses of centre implicitly assume a specific centre, e.g., the mean or median. Yet, scientific hypotheses do not always specify a particular centre. This ambiguity leaves the possibility for a gap between scientific theory and statistical practice that can lead to rejection of a true null. In the face of replicability crises in many scientific disciplines, significant results of this kind are concerning. Rather than testing a single centre, this paper proposes testing a family of plausible centres, such as that induced by the Huber loss function. Each centre in the family generates a testing problem, and the resulting family of hypotheses constitutes a familial hypothesis. A Bayesian nonparametric procedure is devised to test familial hypotheses, enabled by a novel pathwise optimization routine to fit the Huber family. The favourable properties of the new test are demonstrated theoretically and experimentally. Two examples from psychology serve as real-world case studies.
统计假设是将科学假设转化为关于一个或多个分布的陈述,通常与它们的中心有关。评估中心的统计假设的检验隐含地假设一个特定的中心,例如,平均值或中位数。然而,科学假设并不总是指定一个特定的中心。这种模糊性使科学理论和统计实践之间存在差距的可能性,从而导致拒绝真正的零值。面对许多科学学科的可复制性危机,这类重大结果令人担忧。本文提出测试一系列似是而非的中心,例如由Huber损失函数引起的似是而非的中心。家族中的每个中心都会产生一个测试问题,由此产生的假设家族构成一个家族假设。设计了一个贝叶斯非参数过程来测试家族假设,通过一个新的路径优化程序来拟合Huber家族。理论和实验都证明了新方法的良好性能。心理学中的两个例子可以作为现实世界的案例研究。
Citations: 0