Biometrika最新文献

英文中文

On Selecting and Conditioning in Multiple Testing and Selective Inference 论多重测试和选择性推理中的选择和条件限制

IF 2.7 2区数学 Q2 BIOLOGY

Biometrika

Pub Date : 2023-12-22 DOI: 10.1093/biomet/asad078

Jelle J Goeman, Aldo Solari

We investigate a class of methods for selective inference that condition on a selection event. Such methods follow a two-stage process. First, a data-driven collection of hypotheses is chosen from some large universe of hypotheses. Subsequently, inference takes place within this data-driven collection, conditioned on the information that was used for the selection. Examples of such methods include basic data splitting, as well as modern data carving methods and post-selection inference methods for lasso coefficients based on the polyhedral lemma. In this paper, we adopt a holistic view on such methods, considering the selection, conditioning, and final error control steps together as a single method. From this perspective, we demonstrate that multiple testing methods defined directly on the full universe of hypotheses are always at least as powerful as selective inference methods based on selection and conditioning. This result holds true even when the universe is potentially infinite and only implicitly defined, such as in the case of data splitting. We give general theory and intuitions before investigating in detail several case studies where a shift to a non-selective or unconditional perspective can yield a power gain.

我们研究了一类以选择事件为条件的选择性推理方法。这类方法分为两个阶段。首先，从大量假设中选择一个数据驱动的假设集合。随后，在这个数据驱动的集合中，以用于选择的信息为条件进行推理。这类方法的例子包括基本的数据分割、现代的数据雕刻方法和基于多面体阶梯的套索系数选择后推理方法。在本文中，我们对此类方法采用了整体观点，将选择、调节和最终误差控制步骤视为一个方法。从这个角度出发，我们证明了直接定义于全部假设的多重检验方法总是至少与基于选择和条件的选择性推理方法一样强大。即使假设的范围可能是无限的，而且只是隐含定义的，例如在数据分割的情况下，这一结果也是成立的。我们先给出了一般理论和直觉，然后详细研究了几个案例，在这些案例中，转向非选择性或无条件视角可以获得更强的推理能力。

{"title":"On Selecting and Conditioning in Multiple Testing and Selective Inference","authors":"Jelle J Goeman, Aldo Solari","doi":"10.1093/biomet/asad078","DOIUrl":"https://doi.org/10.1093/biomet/asad078","url":null,"abstract":"We investigate a class of methods for selective inference that condition on a selection event. Such methods follow a two-stage process. First, a data-driven collection of hypotheses is chosen from some large universe of hypotheses. Subsequently, inference takes place within this data-driven collection, conditioned on the information that was used for the selection. Examples of such methods include basic data splitting, as well as modern data carving methods and post-selection inference methods for lasso coefficients based on the polyhedral lemma. In this paper, we adopt a holistic view on such methods, considering the selection, conditioning, and final error control steps together as a single method. From this perspective, we demonstrate that multiple testing methods defined directly on the full universe of hypotheses are always at least as powerful as selective inference methods based on selection and conditioning. This result holds true even when the universe is potentially infinite and only implicitly defined, such as in the case of data splitting. We give general theory and intuitions before investigating in detail several case studies where a shift to a non-selective or unconditional perspective can yield a power gain.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"94 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139051164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Central limit theorems for local network statistics 本地网络统计的中心极限定理

IF 2.7 2区数学 Q2 BIOLOGY

Biometrika

Pub Date : 2023-12-22 DOI: 10.1093/biomet/asad080

P A Maugis

Summary Subgraph counts, in particular the number of occurrences of small shapes such as triangles, characterize properties of random networks. As a result, they have seen wide use as network summary statistics. Subgraphs are typically counted globally, making existing approaches unable to describe vertex-specific characteristics. In contrast, rooted subgraphs focus on vertex neighbourhoods, and are fundamental descriptors of local network properties. We derive the asymptotic joint distribution of rooted subgraph counts in inhomogeneous random graphs, a model which generalizes most statistical network models. This result enables a shift in the statistical analysis of graphs, from estimating network summaries, to estimating models linking local network structure and vertex-specific covariates. As an example, we consider a school friendship network and show that gender and race are significant predictors of local friendship patterns.

摘要子图计数，尤其是三角形等小图形的出现次数，是随机网络属性的特征。因此，它们被广泛用作网络汇总统计。子图通常是全局统计的，因此现有方法无法描述特定顶点的特征。相比之下，有根子图侧重于顶点邻域，是局部网络特性的基本描述符。我们推导出了非均质随机图中有根子图计数的渐近联合分布，这一模型概括了大多数统计网络模型。这一结果使得图的统计分析从估算网络摘要转向估算连接局部网络结构和顶点特定协变量的模型。例如，我们考虑了一个学校友谊网络，结果表明性别和种族是本地友谊模式的重要预测因素。

引用次数: 0

The state of cumulative sum sequential change point testing seventy years after Page 累积和顺序变化点测试七十年后的状况 Page

IF 2.7 2区数学 Q2 BIOLOGY

Biometrika

Pub Date : 2023-12-21 DOI: 10.1093/biomet/asad079

Alexander Aue, Claudia Kirch

Quality control charts aim at raising an alarm as soon as sequentially obtained observations of an underlying random process no longer seem to be within stochastic fluctuations prescribed by an ‘in-control’ scenario. Such random processes can often be modelled using the concept of stationarity, or even independence as in most classical works. An important out-of-control scenario is the changepoint alternative, for which the distribution of the process changes at an unknown point in time. In his seminal 1954 Biometrika paper, E. S. Page introduced the famous cumulative sum control charts for changepoint monitoring. Innovatively, decision rules based on cumulative sum procedures took the full history of the process into account, whereas previous procedures were based only on a fixed and typically small number of the most recent observations. The extreme case of using only the most recent observation, often referred to as the Shewhart chart, is more akin to serial outlier than changepoint detection. Page’s cumulative sum approach, introduced seven decades ago, is ubiquitous in modern changepoint analysis, and his original paper has led to a multitude of follow-up papers in different research communities. This review is focused on a particular subfield of this research, namely nonparametric sequential, or online, changepoint tests which are constructed to maintain a desired Type 1 error as opposed to the more traditional approach seeking to minimize the average run length of the procedures. Such tests have originated at the intersection of econometrics and statistics. We trace the development of these tests and highlight their properties, mostly using a simple location model for clarity of exposition, but also review more complex situations such as regression and time series models.

质量控制图的目的是，一旦连续获得的底层随机过程的观测结果似乎不再符合 "在控 "方案所规定的随机波动范围，就会发出警报。此类随机过程通常可以使用静止概念建模，甚至可以使用大多数经典著作中的独立概念建模。一个重要的失控情景是变化点替代方案，即过程的分布在一个未知的时间点发生变化。E. S. Page 在 1954 年发表的开创性论文《Biometrika》中，提出了著名的用于变化点监控的累积和控制图。创新性的是，基于累积总和程序的决策规则考虑到了整个过程的历史，而以前的程序仅基于固定的、通常为数不多的最新观测数据。仅使用最近观测值的极端情况通常被称为休哈特图表，它更类似于序列离群值，而非变化点检测。佩奇在七十年前提出的累积和方法在现代变化点分析中无处不在，他的原始论文在不同研究领域引发了大量后续论文。本综述的重点是这一研究的一个特殊子领域，即非参数序列或在线变化点检验，其构建目的是保持理想的 1 类误差，而不是寻求最小化程序平均运行长度的传统方法。这类检验起源于计量经济学和统计学的交叉学科。我们追溯了这些检验的发展历程，并强调了它们的特性，为了论述清晰，我们主要使用了简单的位置模型，但也回顾了回归和时间序列模型等更复杂的情况。

{"title":"The state of cumulative sum sequential change point testing seventy years after Page","authors":"Alexander Aue, Claudia Kirch","doi":"10.1093/biomet/asad079","DOIUrl":"https://doi.org/10.1093/biomet/asad079","url":null,"abstract":"\u0000 Quality control charts aim at raising an alarm as soon as sequentially obtained observations of an underlying random process no longer seem to be within stochastic fluctuations prescribed by an ‘in-control’ scenario. Such random processes can often be modelled using the concept of stationarity, or even independence as in most classical works. An important out-of-control scenario is the changepoint alternative, for which the distribution of the process changes at an unknown point in time. In his seminal 1954 Biometrika paper, E. S. Page introduced the famous cumulative sum control charts for changepoint monitoring. Innovatively, decision rules based on cumulative sum procedures took the full history of the process into account, whereas previous procedures were based only on a fixed and typically small number of the most recent observations. The extreme case of using only the most recent observation, often referred to as the Shewhart chart, is more akin to serial outlier than changepoint detection. Page’s cumulative sum approach, introduced seven decades ago, is ubiquitous in modern changepoint analysis, and his original paper has led to a multitude of follow-up papers in different research communities. This review is focused on a particular subfield of this research, namely nonparametric sequential, or online, changepoint tests which are constructed to maintain a desired Type 1 error as opposed to the more traditional approach seeking to minimize the average run length of the procedures. Such tests have originated at the intersection of econometrics and statistics. We trace the development of these tests and highlight their properties, mostly using a simple location model for clarity of exposition, but also review more complex situations such as regression and time series models.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"12 3","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138951837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Correction to: ‘A cross-validation-based statistical theory for point processes’ 更正：基于交叉验证的点过程统计理论

IF 2.7 2区数学 Q2 BIOLOGY

Biometrika

Pub Date : 2023-12-20 DOI: 10.1093/biomet/asad077

引用次数: 0

Phylogenetic Association Analysis with Conditional Rank Correlation 基于条件秩相关的系统发育关联分析

IF 2.7 2区数学 Q2 BIOLOGY

Biometrika

Pub Date : 2023-12-01 DOI: 10.1093/biomet/asad075

Shulei Wang, Bo Yuan, T Tony Cai, Hongzhe Li

Summary Phylogenetic association analysis plays a crucial role in investigating the correlation between microbial compositions and specific outcomes of interest in microbiome studies. However, existing methods for testing such associations have limitations related to the assumption of a linear association in high-dimensional settings and the handling of confounding effects. Therefore, there is a need for methods capable of characterizing complex associations, including nonmonotonic relationships. This paper introduces a novel phylogenetic association analysis framework and associated tests to address these challenges by employing conditional rank correlation as a measure of association. These tests account for confounders in a fully nonparametric manner, ensuring robustness against outliers and the ability to detect diverse dependencies. The proposed framework aggregates conditional rank correlations for subtrees using a weighted sum and maximum approach to capture both dense and sparse signals. The significance level of the test statistics is determined by calibrating through a nearest neighbour bootstrapping method, which is straightforward to implement and can accommodate additional datasets when available. The practical advantages of the proposed framework are demonstrated through numerical experiments utilizing both simulated and real microbiome datasets.

在微生物组研究中，系统发育关联分析在研究微生物组成与特定结果之间的相关性方面起着至关重要的作用。然而，测试这种关联的现有方法存在与高维环境下线性关联假设和混淆效应处理相关的局限性。因此，需要能够表征复杂关联的方法，包括非单调关系。本文介绍了一种新的系统发育关联分析框架和相关测试，通过使用条件等级相关作为关联度量来解决这些挑战。这些测试以完全非参数的方式考虑混杂因素，确保对异常值的鲁棒性和检测不同依赖关系的能力。所提出的框架使用加权和和最大化方法聚合子树的条件秩相关性，以捕获密集和稀疏信号。测试统计数据的显著性水平是通过最近邻自举方法校准确定的，该方法易于实现，并且可以在可用时容纳额外的数据集。通过利用模拟和真实微生物组数据集的数值实验证明了所提出框架的实际优势。

{"title":"Phylogenetic Association Analysis with Conditional Rank Correlation","authors":"Shulei Wang, Bo Yuan, T Tony Cai, Hongzhe Li","doi":"10.1093/biomet/asad075","DOIUrl":"https://doi.org/10.1093/biomet/asad075","url":null,"abstract":"Summary Phylogenetic association analysis plays a crucial role in investigating the correlation between microbial compositions and specific outcomes of interest in microbiome studies. However, existing methods for testing such associations have limitations related to the assumption of a linear association in high-dimensional settings and the handling of confounding effects. Therefore, there is a need for methods capable of characterizing complex associations, including nonmonotonic relationships. This paper introduces a novel phylogenetic association analysis framework and associated tests to address these challenges by employing conditional rank correlation as a measure of association. These tests account for confounders in a fully nonparametric manner, ensuring robustness against outliers and the ability to detect diverse dependencies. The proposed framework aggregates conditional rank correlations for subtrees using a weighted sum and maximum approach to capture both dense and sparse signals. The significance level of the test statistics is determined by calibrating through a nearest neighbour bootstrapping method, which is straightforward to implement and can accommodate additional datasets when available. The practical advantages of the proposed framework are demonstrated through numerical experiments utilizing both simulated and real microbiome datasets.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"15 2","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138508105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Conformalized survival analysis with adaptive cutoffs 具有自适应截断的符合化生存分析

IF 2.7 2区数学 Q2 BIOLOGY

Biometrika

Pub Date : 2023-12-01 DOI: 10.1093/biomet/asad076

Yu Gui, Rohan Hore, Zhimei Ren, Rina Foygel Barber

Summary This paper introduces an assumption-lean method that constructs valid and efficient lower predictive bounds (LPBs) for survival times with censored data.We build on recent work by Candès et al. (2021), whose approach first subsets the data to discard any data points with early censoring times, and then uses a reweighting technique (namely, weighted conformal inference (Tibshirani et al., 2019)) to correct for the distribution shift introduced by this subsetting procedure. For our new method, instead of constraining to a fixed threshold for the censoring time when subsetting the data, we allow for a covariate-dependent and data-adaptive subsetting step, which is better able to capture the heterogeneity of the censoring mechanism. As a result, our method can lead to LPBs that are less conservative and give more accurate information. We show that in the Type I right-censoring setting, if either of the censoring mechanism or the conditional quantile of survival time is well estimated, our proposed procedure achieves nearly exact marginal coverage, where in the latter case we additionally have approximate conditional coverage. We evaluate the validity and efficiency of our proposed algorithm in numerical experiments, illustrating its advantage when compared with other competing methods. Finally, our method is applied to a real dataset to generate LPBs for users’ active times on a mobile app.

摘要本文介绍了一种无假设的方法，该方法在剔除数据的情况下为生存时间构造有效和高效的下预测界。我们以cand等人(2021)的最新工作为基础，他们的方法首先对数据进行子集，以丢弃具有早期审查时间的任何数据点，然后使用重加权技术(即加权共形推理(Tibshirani等人，2019))来纠正该子集过程引入的分布偏移。对于我们的新方法，在对数据进行子集设置时，我们允许协变量相关和数据自适应的子集步骤，而不是约束于固定的审查时间阈值，这能够更好地捕获审查机制的异质性。因此，我们的方法可以产生更少保守的lpb，并提供更准确的信息。我们表明，在I型右审查设置中，如果审查机制或生存时间的条件分位数中的任何一个被很好地估计，我们提出的程序实现了几乎精确的边际覆盖，其中在后一种情况下，我们额外具有近似的条件覆盖。通过数值实验验证了该算法的有效性和效率，说明了与其他竞争方法相比，该算法具有优势。最后，将我们的方法应用于实际数据集，生成用户在移动应用程序上活动时间的lpb。

{"title":"Conformalized survival analysis with adaptive cutoffs","authors":"Yu Gui, Rohan Hore, Zhimei Ren, Rina Foygel Barber","doi":"10.1093/biomet/asad076","DOIUrl":"https://doi.org/10.1093/biomet/asad076","url":null,"abstract":"Summary This paper introduces an assumption-lean method that constructs valid and efficient lower predictive bounds (LPBs) for survival times with censored data.We build on recent work by Candès et al. (2021), whose approach first subsets the data to discard any data points with early censoring times, and then uses a reweighting technique (namely, weighted conformal inference (Tibshirani et al., 2019)) to correct for the distribution shift introduced by this subsetting procedure. For our new method, instead of constraining to a fixed threshold for the censoring time when subsetting the data, we allow for a covariate-dependent and data-adaptive subsetting step, which is better able to capture the heterogeneity of the censoring mechanism. As a result, our method can lead to LPBs that are less conservative and give more accurate information. We show that in the Type I right-censoring setting, if either of the censoring mechanism or the conditional quantile of survival time is well estimated, our proposed procedure achieves nearly exact marginal coverage, where in the latter case we additionally have approximate conditional coverage. We evaluate the validity and efficiency of our proposed algorithm in numerical experiments, illustrating its advantage when compared with other competing methods. Finally, our method is applied to a real dataset to generate LPBs for users’ active times on a mobile app.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"5 3","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138508131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Familial inference: Tests for hypotheses on a family of centres 家族推理:对一个中心家族的假设进行检验

IF 2.7 2区数学 Q2 BIOLOGY

Biometrika

Pub Date : 2023-11-28 DOI: 10.1093/biomet/asad074

Ryan Thompson, Catherine S Forbes, Steven N Maceachern, Mario Peruggia

Statistical hypotheses are translations of scientific hypotheses into statements about one or more distributions, often concerning their centre. Tests that assess statistical hypotheses of centre implicitly assume a specific centre, e.g., the mean or median. Yet, scientific hypotheses do not always specify a particular centre. This ambiguity leaves the possibility for a gap between scientific theory and statistical practice that can lead to rejection of a true null. In the face of replicability crises in many scientific disciplines, significant results of this kind are concerning. Rather than testing a single centre, this paper proposes testing a family of plausible centres, such as that induced by the Huber loss function. Each centre in the family generates a testing problem, and the resulting family of hypotheses constitutes a familial hypothesis. A Bayesian nonparametric procedure is devised to test familial hypotheses, enabled by a novel pathwise optimization routine to fit the Huber family. The favourable properties of the new test are demonstrated theoretically and experimentally. Two examples from psychology serve as real-world case studies.

统计假设是将科学假设转化为关于一个或多个分布的陈述，通常与它们的中心有关。评估中心的统计假设的检验隐含地假设一个特定的中心，例如，平均值或中位数。然而，科学假设并不总是指定一个特定的中心。这种模糊性使科学理论和统计实践之间存在差距的可能性，从而导致拒绝真正的零值。面对许多科学学科的可复制性危机，这类重大结果令人担忧。本文提出测试一系列似是而非的中心，例如由Huber损失函数引起的似是而非的中心。家族中的每个中心都会产生一个测试问题，由此产生的假设家族构成一个家族假设。设计了一个贝叶斯非参数过程来测试家族假设，通过一个新的路径优化程序来拟合Huber家族。理论和实验都证明了新方法的良好性能。心理学中的两个例子可以作为现实世界的案例研究。

引用次数: 0

Maximum Likelihood Estimation for Semiparametric Regression Models with Interval-Censored Multistate Data 区间截尾多态数据半参数回归模型的极大似然估计

IF 2.7 2区数学 Q2 BIOLOGY

Biometrika

Pub Date : 2023-11-24 DOI: 10.1093/biomet/asad073

Yu Gu, Donglin Zeng, Gerardo Heiss, D Y Lin

Summary Interval-censored multistate data arise in many studies of chronic diseases, where the health status of a subject can be characterized by a finite number of disease states and the transition between any two states is only known to occur over a broad time interval. We relate potentially time-dependent covariates to multistate processes through semiparametric proportional intensity models with random effects. We study nonparametric maximum likelihood estimation under general interval censoring and develop a stable expectation-maximization algorithm. We show that the resulting parameter estimators are consistent and that the finite-dimensional components are asymptotically normal with a covariance matrix that attains the semiparametric efficiency bound and can be consistently estimated through profile likelihood. In addition, we demonstrate through extensive simulation studies that the proposed numerical and inferential procedures perform well in realistic settings. Finally, we provide an application to a major epidemiologic cohort study.

区间删减的多状态数据出现在许多慢性病研究中，在这些研究中，受试者的健康状况可以用有限数量的疾病状态来表征，并且任何两种状态之间的转换只会在很宽的时间间隔内发生。我们通过具有随机效应的半参数比例强度模型将潜在的时变协变量与多状态过程联系起来。研究了一般区间滤波下的非参数极大似然估计，并给出了一种稳定的期望最大化算法。我们证明了所得到的参数估计量是一致的，有限维分量是渐近正态的，其协方差矩阵达到了半参数效率界，并且可以通过剖面似然一致地估计。此外，我们通过广泛的模拟研究证明，所提出的数值和推理程序在现实环境中表现良好。最后，我们提供了一个主要流行病学队列研究的应用程序。

引用次数: 0

On varimax asymptotics in network models and spectral methods for dimensionality reduction 网络模型的变极大渐近性及降维的谱方法

IF 2.7 2区数学 Q2 BIOLOGY

Biometrika

Pub Date : 2023-11-20 DOI: 10.1093/biomet/asad061

J Cape

Summary Varimax factor rotations, while popular among practitioners in psychology and statistics since being introduced by H.Kaiser, have historically been viewed with skepticism and suspicion by some theoreticians and mathematical statisticians. Now, work by K. Rohe and M. Zeng provides new, fundamental insight: varimax rotations provably perform statistical estimation in certain classes of latent variable models when paired with spectral-based matrix truncations for dimensionality reduction. We build on this new-found understanding of varimax rotations by developing further connections to network analysis and spectral methods rooted in entrywise matrix perturbation analysis. Concretely, this paper establishes the asymptotic multivariate normality of vectors in varimax-transformed Euclidean point clouds that represent low-dimensional node embeddings in certain latent space random graph models. We address related concepts including network sparsity, data denoising, and the role of matrix rank in latent variable parameterizations. Collectively, these findings, at the confluence of classical and contemporary multivariate analysis, reinforce methodology and inference procedures grounded in matrix factorization-based techniques. Numerical examples illustrate our findings and supplement our discussion.

自H.Kaiser引入变异因子旋转以来，虽然在心理学和统计学从业者中很受欢迎，但历史上一直受到一些理论家和数理统计学家的怀疑和怀疑。现在，K. Rohe和M. Zeng的工作提供了新的、基本的见解:当与基于谱的矩阵截断相结合以降低维数时，可变旋转可证明在某些类别的潜在变量模型中执行统计估计。我们通过进一步发展网络分析和基于入口矩阵摄动分析的光谱方法，建立对变差旋转的新理解。具体地说，本文建立了在某些潜在空间随机图模型中表示低维节点嵌入的变大变换欧几里得点云中向量的渐近多元正态性。我们讨论了相关的概念，包括网络稀疏性、数据去噪和矩阵秩在潜在变量参数化中的作用。总的来说，这些发现，在经典和当代多元分析的融合，加强了基于矩阵分解技术的方法和推理程序。数值例子说明了我们的发现并补充了我们的讨论。

{"title":"On varimax asymptotics in network models and spectral methods for dimensionality reduction","authors":"J Cape","doi":"10.1093/biomet/asad061","DOIUrl":"https://doi.org/10.1093/biomet/asad061","url":null,"abstract":"Summary Varimax factor rotations, while popular among practitioners in psychology and statistics since being introduced by H.Kaiser, have historically been viewed with skepticism and suspicion by some theoreticians and mathematical statisticians. Now, work by K. Rohe and M. Zeng provides new, fundamental insight: varimax rotations provably perform statistical estimation in certain classes of latent variable models when paired with spectral-based matrix truncations for dimensionality reduction. We build on this new-found understanding of varimax rotations by developing further connections to network analysis and spectral methods rooted in entrywise matrix perturbation analysis. Concretely, this paper establishes the asymptotic multivariate normality of vectors in varimax-transformed Euclidean point clouds that represent low-dimensional node embeddings in certain latent space random graph models. We address related concepts including network sparsity, data denoising, and the role of matrix rank in latent variable parameterizations. Collectively, these findings, at the confluence of classical and contemporary multivariate analysis, reinforce methodology and inference procedures grounded in matrix factorization-based techniques. Numerical examples illustrate our findings and supplement our discussion.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"19 3","pages":""},"PeriodicalIF":2.7,"publicationDate":"2023-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138508101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Second term improvement to generalised linear mixed model asymptotics 广义线性混合模型渐近性的二阶改进

IF 2.7 2区数学 Q2 BIOLOGY

Biometrika

Pub Date : 2023-11-16 DOI: 10.1093/biomet/asad072

Luca Maestrini, Aishwarya Bhaskaran, Matt P Wand

Summary A recent article on generalised linear mixed model asymptotics, Jiang et al. (2022), derived the rates of convergence for the asymptotic variances of maximum likelihood estimators. If m denotes the number of groups and n is the average within-group sample size then the asymptotic variances have orders m − 1 and (mn)−1, depending on the parameter. We extend this theory to provide explicit forms of the (mn)−1 second terms of the asymptotically harder-to-estimate parameters. Improved accuracy of statistical inference and planning are consequences of our theory.

Jiang等人(2022)最近发表了一篇关于广义线性混合模型渐近性的文章，推导了极大似然估计量渐近方差的收敛率。如果m表示组数，n是组内样本的平均值，则渐近方差的阶为m−1和(mn)−1，取决于参数。我们扩展了这一理论，给出了渐近难以估计参数的(mn)−1次项的显式形式。我们的理论提高了统计推断和规划的准确性。

引用次数: 0

首页上一页

下一页尾页

类型

全部化学•材料生命科学医学物理工程技术环境•农林材料科学地球科学法学管理学化学环境科学与生态学计算机科学教育学经济学农林科学人文科学生物学数学物理与天体物理心理学综合性期刊其他工业工程理学历史学农学文学信息工程

数据库

全部 ACS Publications Elsevier ieeexplore Springer The Royal Society of Chemistry Wiley

期刊

Biometrika

全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.

﹀