Biometrika: Latest Articles

Regression analysis of group-tested current status data
IF 2.7 Zone 2 (Mathematics) Q2 BIOLOGY Pub Date: 2024-02-08 DOI: 10.1093/biomet/asae006
Shuwei Li, Tao Hu, Lianming Wang, Christopher S McMahan, Joshua M Tebbs
Summary Group testing is an effective way to reduce the time and cost associated with conducting large-scale screening for infectious diseases. Benefits are realized through testing pools formed by combining specimens, such as blood or urine, from different individuals. In some studies, individuals are assessed only once and a time-to-event endpoint is recorded, for example, the time until infection. Combining group testing with this type of endpoint results in group-tested current status data (?). To analyse these complex data, we propose methods which estimate a proportional hazards regression model based on test outcomes from measuring the pools. A sieve maximum likelihood estimation approach is developed that approximates the cumulative baseline hazard function with a piecewise constant function. To identify the sieve estimator, a computationally efficient expectation-maximization algorithm is derived by using data augmentation. Asymptotic properties of both the parametric and nonparametric components of the sieve estimator are then established by applying modern empirical process theory. Numerical results from simulation studies show that our proposed method performs nominally and has advantages over the corresponding estimation method based on individual testing results. We illustrate our work by analysing a chlamydia dataset collected by the State Hygienic Laboratory at the University of Iowa.
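The data structure described in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' sieve estimator or EM algorithm: it assumes a perfectly accurate assay, a single binary covariate, and a constant baseline hazard (so the cumulative baseline hazard is `lam * t`). The function names `simulate_pools` and `pool_loglik` are hypothetical.

```python
import math
import random


def simulate_pools(n_pools, pool_size, beta, lam, rng):
    """Simulate group-tested current status data under a proportional hazards model.

    Each individual has a covariate x, one observation time c, and a latent
    event time t that is never observed. Only the pooled test result is kept.
    Assumptions: perfect assay, constant baseline hazard lam.
    """
    pools = []
    for _ in range(n_pools):
        members = []
        positive = 0
        for _ in range(pool_size):
            x = rng.choice([0.0, 1.0])        # binary covariate (assumption)
            c = rng.uniform(0.5, 2.0)         # the single assessment time
            rate = lam * math.exp(beta * x)   # hazard under proportional hazards
            t = rng.expovariate(rate)         # latent time until infection
            if t <= c:
                positive = 1                  # one infected member makes the pool positive
            members.append((x, c))
        pools.append((members, positive))
    return pools


def pool_loglik(beta, lam, pools):
    """Log-likelihood of the pooled test outcomes.

    A pool is negative only if every member is event-free at its own
    observation time, so P(negative) = exp(-sum_i lam * c_i * exp(beta * x_i)).
    """
    total = 0.0
    for members, positive in pools:
        h = sum(lam * c * math.exp(beta * x) for x, c in members)
        if positive:
            total += math.log1p(-math.exp(-h))   # log(1 - P(negative))
        else:
            total += -h                          # log P(negative)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    pools = simulate_pools(200, 4, beta=0.5, lam=0.3, rng=rng)
    print(pool_loglik(0.5, 0.3, pools))
```

Maximizing `pool_loglik` over `beta` and `lam` would give a crude parametric fit. The paper instead leaves the baseline hazard unspecified and approximates it with a sieve.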
Citations: 0
Explicit solutions for the asymptotically-optimal bandwidth in cross-validation
IF 2.7 Zone 2 (Mathematics) Q2 BIOLOGY Pub Date: 2024-02-08 DOI: 10.1093/biomet/asae007
Karim M Abadir, Michel Lubrano
Summary We show that least squares cross-validation methods share a common structure which has an explicit asymptotic solution, when the chosen kernel is asymptotically separable in bandwidth and data. For density estimation with a multivariate Student t(ν) kernel, the cross-validation criterion becomes asymptotically equivalent to a polynomial of only three terms. Our bandwidth formulae are simple and noniterative thus leading to very fast computations, their integrated squared-error dominates traditional cross-validation implementations, they alleviate the notorious sample variability of cross-validation, and overcome its breakdown in the case of repeated observations. We illustrate our method with univariate and bivariate applications, of density estimation and nonparametric regressions, to a large dataset of Michigan State University academic wages and experience.
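For context, the traditional iterative approach that the abstract contrasts with can be sketched as follows. This is an illustrative univariate least squares cross-validation criterion with a Gaussian kernel, minimized by grid search. It is not the paper's explicit formula and does not use the Student t(ν) kernel; the names `lscv` and `lscv_bandwidth` are hypothetical.

```python
import math
import random


def lscv(h, xs):
    """Least squares cross-validation criterion for a Gaussian kernel density estimate.

    LSCV(h) = (1 / (n^2 h)) * sum_{i,j} (K*K)((x_i - x_j) / h)
              - (2 / (n (n - 1) h)) * sum_{i != j} K((x_i - x_j) / h),
    where K is the standard normal density and K*K is the N(0, 2) density.
    """
    n = len(xs)
    conv = 0.0  # sum over all pairs (i, j) of the convolution kernel
    loo = 0.0   # sum over pairs with i != j of the kernel (leave-one-out term)
    for i in range(n):
        for j in range(n):
            u = (xs[i] - xs[j]) / h
            conv += math.exp(-u * u / 4.0) / math.sqrt(4.0 * math.pi)
            if i != j:
                loo += math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)
    return conv / (n * n * h) - 2.0 * loo / (n * (n - 1) * h)


def lscv_bandwidth(xs, grid):
    """Return the grid value that minimizes the criterion (brute-force search)."""
    return min(grid, key=lambda h: lscv(h, xs))


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(0.0, 1.0) for _ in range(100)]
    grid = [0.05 * k for k in range(1, 41)]
    print(lscv_bandwidth(sample, grid))
```

Each evaluation of the criterion costs O(n²) and the search repeats it for every grid point, which is the computational burden a closed-form bandwidth avoids.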
Citations: 0
On the failure of the bootstrap for Chatterjee's rank correlation
IF 2.7 Zone 2 (Mathematics) Q2 BIOLOGY Pub Date: 2024-02-04 DOI: 10.1093/biomet/asae004
Zhexiao Lin, Fang Han
Summary While researchers commonly use the bootstrap to quantify the uncertainty of an estimator, it has been noticed that the standard bootstrap, in general, does not work for Chatterjee's rank correlation. In this paper, we provide proof of this issue under an additional independence assumption, and complement our theory with simulation evidence for general settings. Chatterjee's rank correlation thus falls into a category of statistics that are asymptotically normal but bootstrap inconsistent. Valid inferential methods in this case are Chatterjee's original proposal for testing independence and Lin & Han (2022)'s analytic asymptotic variance estimator for more general purposes.
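For readers unfamiliar with the statistic, Chatterjee's rank correlation can be computed in a few lines. The sketch below uses the standard no-ties formula, ξ_n = 1 − 3 Σ|r_{i+1} − r_i| / (n² − 1), where r_i is the rank of the Y value paired with the i-th smallest X. The function name `chatterjee_xi` is hypothetical, and the code assumes there are no ties in either variable.

```python
def chatterjee_xi(x, y):
    """Chatterjee's rank correlation for paired data without ties.

    Sort the pairs by x, take the ranks of the corresponding y values,
    and sum the absolute differences of consecutive ranks.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    y_sorted = [y[i] for i in order]
    # Rank of each y value among all y values (1 = smallest); assumes no ties.
    rank_of = {v: r for r, v in enumerate(sorted(y_sorted), start=1)}
    ranks = [rank_of[v] for v in y_sorted]
    s = sum(abs(ranks[i + 1] - ranks[i]) for i in range(n - 1))
    return 1.0 - 3.0 * s / (n * n - 1)


if __name__ == "__main__":
    xs = list(range(10))
    print(chatterjee_xi(xs, xs))  # perfectly monotone relationship
```

For a perfectly monotone relationship the consecutive rank differences are all 1, so the value is (n − 2)/(n + 1), which approaches 1 as n grows. This sketch only computes the statistic; it does not implement any of the inferential methods mentioned in the abstract.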
Citations: 0
Asymptotically constant risk estimator of the time-average variance constant
IF 2.7 Zone 2 (Mathematics) Q2 BIOLOGY Pub Date: 2024-02-03 DOI: 10.1093/biomet/asae003
K W Chan, C Y Yau
Summary Estimation of the time-average variance constant is important for statistical analyses involving dependent data. This problem is difficult as it relies on a bandwidth parameter. Specifically, the optimal choices of the bandwidths of all existing estimators depend on the estimand itself and another unknown parameter which is very difficult to estimate. Thus, optimal variance estimation is unachievable. In this paper, we introduce a concept of converging flat-top kernels for constructing variance estimators whose optimal bandwidths are free of unknown parameters asymptotically and hence can be computed easily. We prove that the new estimator has an asymptotically constant risk and is locally asymptotically minimax.
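The role of the bandwidth is easiest to see in a classical kernel estimator. The sketch below is the textbook Bartlett-kernel estimator of the time-average variance constant, Σ_k γ(k), and is not the converging flat-top kernel estimator proposed in the paper. The names `autocov` and `tavc_bartlett` are hypothetical, and the bandwidth is left for the user to choose, which is the difficulty the abstract describes.

```python
def autocov(xs, k):
    """Sample autocovariance at lag k, using the 1/n normalization."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((xs[t] - mean) * (xs[t + k] - mean) for t in range(n - k)) / n


def tavc_bartlett(xs, bandwidth):
    """Bartlett-kernel estimate of the time-average variance constant.

    Lags k = 1, ..., bandwidth - 1 are included with weights 1 - k / bandwidth.
    """
    total = autocov(xs, 0)
    for k in range(1, bandwidth):
        total += 2.0 * (1.0 - k / bandwidth) * autocov(xs, k)
    return total


if __name__ == "__main__":
    series = [1.0, -1.0, 1.0, -1.0]
    print(tavc_bartlett(series, 1))  # lag 0 only
    print(tavc_bartlett(series, 2))  # adds lag 1 with weight 1/2
```

With `bandwidth=1` the estimate reduces to the sample variance. Larger bandwidths bring in more lags, trading bias against variance.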
Citations: 0
A note on minimax robustness of designs against correlated or heteroscedastic responses
IF 2.7 Zone 2 (Mathematics) Q2 BIOLOGY Pub Date: 2024-01-20 DOI: 10.1093/biomet/asae001
D P Wiens
Summary We present a result according to which certain functions of covariance matrices are maximized at scalar multiples of the identity matrix. This is used to show that experimental designs that are optimal under an assumption of independent, homoscedastic responses can be minimax robust, in broad classes of alternate covariance structures. In particular it can justify the common practice of disregarding possible dependence, or heteroscedasticity, at the design stage of an experiment.
Citations: 0
Efficient nonparametric estimation of Toeplitz covariance matrices
IF 2.7 Zone 2 (Mathematics) Q2 BIOLOGY Pub Date: 2024-01-17 DOI: 10.1093/biomet/asae002
K Klockmann, T Krivobokova
A new efficient nonparametric estimator for Toeplitz covariance matrices is proposed. This estimator is based on a data transformation that translates the problem of Toeplitz covariance matrix estimation to the problem of mean estimation in an approximate Gaussian regression. The resulting Toeplitz covariance matrix estimator is positive definite by construction, fully data-driven and computationally very fast. Moreover, this estimator is shown to be minimax optimal under the spectral norm for a large class of Toeplitz matrices. These results are readily extended to estimation of inverses of Toeplitz covariance matrices. Also, an alternative version of the Whittle likelihood for the spectral density based on the discrete cosine transform is proposed. The method is implemented in the R package vstdct that accompanies the paper.
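To make the object being estimated concrete, the sketch below builds a Toeplitz covariance matrix from sample autocovariances of a single stationary series. This is the naive plug-in construction, shown only for illustration. It is not the estimator proposed in the paper and does not call the `vstdct` package; the function names are hypothetical.

```python
def sample_autocov(xs, k):
    """Sample autocovariance at lag k, using the 1/n normalization."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((xs[t] - mean) * (xs[t + k] - mean) for t in range(n - k)) / n


def toeplitz_cov(xs, max_lag):
    """Return the (max_lag + 1) x (max_lag + 1) Toeplitz matrix with entry
    (i, j) equal to the sample autocovariance at lag |i - j|."""
    p = max_lag + 1
    gamma = [sample_autocov(xs, k) for k in range(p)]
    return [[gamma[abs(i - j)] for j in range(p)] for i in range(p)]


if __name__ == "__main__":
    series = [1.0, -1.0, 1.0, -1.0]
    for row in toeplitz_cov(series, 1):
        print(row)
```

Every diagonal of the result is constant, which is the defining Toeplitz property. Plug-in estimates of this kind become noisy at high lags, where few pairs of observations contribute to each autocovariance.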
Citations: 0
On Selecting and Conditioning in Multiple Testing and Selective Inference
IF 2.7 Zone 2 (Mathematics) Q2 BIOLOGY Pub Date: 2023-12-22 DOI: 10.1093/biomet/asad078
Jelle J Goeman, Aldo Solari
We investigate a class of methods for selective inference that condition on a selection event. Such methods follow a two-stage process. First, a data-driven collection of hypotheses is chosen from some large universe of hypotheses. Subsequently, inference takes place within this data-driven collection, conditioned on the information that was used for the selection. Examples of such methods include basic data splitting, as well as modern data carving methods and post-selection inference methods for lasso coefficients based on the polyhedral lemma. In this paper, we adopt a holistic view on such methods, considering the selection, conditioning, and final error control steps together as a single method. From this perspective, we demonstrate that multiple testing methods defined directly on the full universe of hypotheses are always at least as powerful as selective inference methods based on selection and conditioning. This result holds true even when the universe is potentially infinite and only implicitly defined, such as in the case of data splitting. We give general theory and intuitions before investigating in detail several case studies where a shift to a non-selective or unconditional perspective can yield a power gain.
Citations: 0
Central limit theorems for local network statistics
IF 2.7 Zone 2 (Mathematics) Q2 BIOLOGY Pub Date: 2023-12-22 DOI: 10.1093/biomet/asad080
P A Maugis
Summary Subgraph counts, in particular the number of occurrences of small shapes such as triangles, characterize properties of random networks. As a result, they have seen wide use as network summary statistics. Subgraphs are typically counted globally, making existing approaches unable to describe vertex-specific characteristics. In contrast, rooted subgraphs focus on vertex neighbourhoods, and are fundamental descriptors of local network properties. We derive the asymptotic joint distribution of rooted subgraph counts in inhomogeneous random graphs, a model which generalizes most statistical network models. This result enables a shift in the statistical analysis of graphs, from estimating network summaries, to estimating models linking local network structure and vertex-specific covariates. As an example, we consider a school friendship network and show that gender and race are significant predictors of local friendship patterns.
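The distinction between global and rooted subgraph counts can be shown with triangles. The following minimal sketch counts, for each vertex, the triangles that contain it, given an undirected graph stored as a dictionary of neighbour sets. The function name `rooted_triangle_counts` and the toy graph are illustrative assumptions.

```python
from itertools import combinations


def rooted_triangle_counts(adj):
    """Count the triangles containing each vertex.

    adj maps every vertex to the set of its neighbours (undirected, no
    self-loops). A triangle rooted at v is a pair of neighbours of v that
    are themselves adjacent.
    """
    counts = {}
    for v, nbrs in adj.items():
        counts[v] = sum(
            1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a]
        )
    return counts


if __name__ == "__main__":
    # Toy graph: a triangle on vertices 0, 1, 2 plus a pendant vertex 3.
    graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
    local = rooted_triangle_counts(graph)
    print(local)
    print(sum(local.values()) // 3)  # global triangle count
```

The global count is the sum of the rooted counts divided by three, since each triangle is counted once per vertex. The rooted counts keep the vertex-level variation that the global summary discards.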
Citations: 0
The state of cumulative sum sequential change point testing seventy years after Page
IF 2.7 Zone 2 (Mathematics) Q2 BIOLOGY Pub Date: 2023-12-21 DOI: 10.1093/biomet/asad079
Alexander Aue, Claudia Kirch
Quality control charts aim at raising an alarm as soon as sequentially obtained observations of an underlying random process no longer seem to be within stochastic fluctuations prescribed by an ‘in-control’ scenario. Such random processes can often be modelled using the concept of stationarity, or even independence as in most classical works. An important out-of-control scenario is the changepoint alternative, for which the distribution of the process changes at an unknown point in time. In his seminal 1954 Biometrika paper, E. S. Page introduced the famous cumulative sum control charts for changepoint monitoring. Innovatively, decision rules based on cumulative sum procedures took the full history of the process into account, whereas previous procedures were based only on a fixed and typically small number of the most recent observations. The extreme case of using only the most recent observation, often referred to as the Shewhart chart, is more akin to serial outlier than changepoint detection. Page’s cumulative sum approach, introduced seven decades ago, is ubiquitous in modern changepoint analysis, and his original paper has led to a multitude of follow-up papers in different research communities. This review is focused on a particular subfield of this research, namely nonparametric sequential, or online, changepoint tests which are constructed to maintain a desired Type 1 error as opposed to the more traditional approach seeking to minimize the average run length of the procedures. Such tests have originated at the intersection of econometrics and statistics. We trace the development of these tests and highlight their properties, mostly using a simple location model for clarity of exposition, but also review more complex situations such as regression and time series models.
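The idea that a cumulative sum rule uses the full history of the process can be sketched with the classical one-sided recursion S_t = max(0, S_{t−1} + x_t − μ₀ − k), which raises an alarm once S_t exceeds a threshold. This is the textbook control-chart form for an upward mean shift in a simple location model, not the nonparametric sequential tests that the review covers. The function name `page_cusum` and the parameter names `target`, `allowance` and `threshold` are assumptions made for illustration.

```python
def page_cusum(xs, target, allowance, threshold):
    """One-sided cumulative sum chart for an upward shift in the mean.

    Returns the 1-based index of the first observation at which the
    statistic exceeds the threshold, or None if no alarm is raised.
    """
    s = 0.0
    for t, x in enumerate(xs, start=1):
        # Accumulate evidence of a shift; reset to zero when evidence is negative.
        s = max(0.0, s + (x - target - allowance))
        if s > threshold:
            return t
    return None


if __name__ == "__main__":
    in_control = [0.0] * 5
    shifted = [2.0] * 5
    print(page_cusum(in_control + shifted, target=0.0, allowance=0.5, threshold=3.0))
    print(page_cusum(in_control, target=0.0, allowance=0.5, threshold=3.0))
```

In the first call the mean shifts after the fifth observation and the statistic grows by 1.5 per step, crossing the threshold at the eighth observation. A Shewhart-type rule would look at each observation in isolation instead of accumulating evidence.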
Citations: 0
Correction to: ‘A cross-validation-based statistical theory for point processes’
IF 2.7 Zone 2 (Mathematics) Q2 BIOLOGY Pub Date: 2023-12-20 DOI: 10.1093/biomet/asad077
Citations: 0