Biometrika latest articles

Conformalized survival analysis with adaptive cutoffs
IF 2.7, Zone 2 (Mathematics), Q2 BIOLOGY, Pub Date: 2023-12-01, DOI: 10.1093/biomet/asad076
Yu Gui, Rohan Hore, Zhimei Ren, Rina Foygel Barber
Summary This paper introduces an assumption-lean method that constructs valid and efficient lower predictive bounds (LPBs) for survival times with censored data. We build on recent work by Candès et al. (2021), whose approach first subsets the data to discard any data points with early censoring times, and then uses a reweighting technique (namely, weighted conformal inference (Tibshirani et al., 2019)) to correct for the distribution shift introduced by this subsetting procedure. In our new method, instead of constraining the subsetting step to a fixed threshold on the censoring time, we allow a covariate-dependent and data-adaptive subsetting step, which better captures the heterogeneity of the censoring mechanism. As a result, our method can lead to LPBs that are less conservative and more informative. We show that in the Type I right-censoring setting, if either the censoring mechanism or the conditional quantile of the survival time is well estimated, our proposed procedure achieves nearly exact marginal coverage; in the latter case, we additionally have approximate conditional coverage. We evaluate the validity and efficiency of our proposed algorithm in numerical experiments, illustrating its advantage over competing methods. Finally, our method is applied to a real dataset to generate LPBs for users’ active times on a mobile app.
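The reweighting step described above can be illustrated with a minimal Python sketch of the fixed-threshold, weighted split-conformal bound of Candès et al. (2021) that this paper builds on; the adaptive, covariate-dependent cutoff that is this paper's contribution is not reproduced. The sketch assumes that calibration scores V_i = q̂(X_i) − min(T_i, c0), for calibration points with C_i ≥ c0, and weights w(x) = 1/P̂(C ≥ c0 | X = x) have already been computed; both function names are hypothetical.

```python
import math

def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """Smallest calibration score eta such that the weighted empirical CDF,
    with the test point's weight placed as a point mass at +infinity,
    reaches 1 - alpha. Returns math.inf if no calibration score suffices."""
    total = sum(weights) + test_weight
    target = (1 - alpha) * total  # compare unnormalised mass to limit rounding drift
    cum = 0.0
    for score, w in sorted(zip(scores, weights)):
        cum += w
        if cum >= target:
            return score
    return math.inf

def lower_predictive_bound(q_hat_x, cal_scores, cal_weights, test_weight, alpha, c0):
    """LPB = min(q_hat(x) - eta, c0). Assumes cal_scores are
    V_i = q_hat(X_i) - min(T_i, c0) on calibration points with C_i >= c0 and
    that the weights are w(X_i) = 1 / P_hat(C >= c0 | X = X_i) (hypothetical inputs)."""
    eta = weighted_conformal_quantile(cal_scores, cal_weights, test_weight, alpha)
    return min(q_hat_x - eta, c0)
```

With all weights equal this reduces to the ordinary split-conformal quantile; an infinite eta means the calibration set carries too little weight, and the bound becomes the trivial value of minus infinity.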
Citations: 0
Familial inference: Tests for hypotheses on a family of centres
IF 2.7, Zone 2 (Mathematics), Q2 BIOLOGY, Pub Date: 2023-11-28, DOI: 10.1093/biomet/asad074
Ryan Thompson, Catherine S Forbes, Steven N Maceachern, Mario Peruggia
Statistical hypotheses are translations of scientific hypotheses into statements about one or more distributions, often concerning their centre. Tests that assess statistical hypotheses of centre implicitly assume a specific centre, e.g., the mean or median. Yet, scientific hypotheses do not always specify a particular centre. This ambiguity leaves open the possibility of a gap between scientific theory and statistical practice, which can lead to rejection of a true null. In the face of replicability crises in many scientific disciplines, significant results of this kind are concerning. Rather than testing a single centre, this paper proposes testing a family of plausible centres, such as that induced by the Huber loss function. Each centre in the family generates a testing problem, and the resulting family of hypotheses constitutes a familial hypothesis. A Bayesian nonparametric procedure is devised to test familial hypotheses, enabled by a novel pathwise optimization routine to fit the Huber family. The favourable properties of the new test are demonstrated theoretically and experimentally. Two examples from psychology serve as real-world case studies.
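To make the "family of centres" concrete, the following Python sketch computes the Huber M-estimate of location over a grid of thresholds delta using iteratively reweighted least squares. It is an illustration only, not the authors' pathwise optimization routine or their Bayesian test; the function names and the IRLS solver are assumptions. Each delta yields one centre, and hence one testing problem.

```python
import statistics

def huber_centre(x, delta, tol=1e-10, max_iter=1000):
    """Huber M-estimate of location for threshold delta > 0, computed by
    iteratively reweighted least squares starting from the median."""
    mu = statistics.median(x)
    for _ in range(max_iter):
        # Huber weights: 1 for residuals within [-delta, delta], delta/|r| outside.
        w = [1.0 if abs(xi - mu) <= delta else delta / abs(xi - mu) for xi in x]
        mu_new = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
        if abs(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu

def huber_family(x, deltas):
    """Centres along a grid of thresholds: small delta approaches the median,
    a delta larger than every residual at the mean reproduces the mean."""
    return [(d, huber_centre(x, d)) for d in deltas]
```

For the data [1, 2, 3, 4, 100], delta = 1 gives 3 (the median) while a very large delta gives 22 (the mean), which shows how far apart two plausible centres can be.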
Citations: 0
Maximum Likelihood Estimation for Semiparametric Regression Models with Interval-Censored Multistate Data
IF 2.7, Zone 2 (Mathematics), Q2 BIOLOGY, Pub Date: 2023-11-24, DOI: 10.1093/biomet/asad073
Yu Gu, Donglin Zeng, Gerardo Heiss, D Y Lin
Summary Interval-censored multistate data arise in many studies of chronic diseases, where the health status of a subject can be characterized by a finite number of disease states and the transition between any two states is only known to occur over a broad time interval. We relate potentially time-dependent covariates to multistate processes through semiparametric proportional intensity models with random effects. We study nonparametric maximum likelihood estimation under general interval censoring and develop a stable expectation-maximization algorithm. We show that the resulting parameter estimators are consistent and that the finite-dimensional components are asymptotically normal with a covariance matrix that attains the semiparametric efficiency bound and can be consistently estimated through profile likelihood. In addition, we demonstrate through extensive simulation studies that the proposed numerical and inferential procedures perform well in realistic settings. Finally, we provide an application to a major epidemiologic cohort study.
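The paper's estimator is an expectation-maximization algorithm for a multistate, random-effects NPMLE. As a much simpler relative, the Python sketch below implements the classical Turnbull-type self-consistency (EM) iteration for the NPMLE of a single event-time distribution from half-open intervals (L_i, R_i]. It is not the authors' algorithm, and restricting the candidate mass points to the distinct right endpoints is a simplifying assumption; the function name is hypothetical.

```python
def npmle_interval_censored(intervals, tol=1e-10, max_iter=10_000):
    """Self-consistency (EM) iterations for the NPMLE of an event-time
    distribution from half-open intervals (L_i, R_i] known to contain T_i.
    Returns (support_points, probability_masses)."""
    support = sorted({r for _, r in intervals})  # candidate mass points
    n, m = len(intervals), len(support)
    # alpha[i][j] = 1 if support point j lies in observation i's interval.
    alpha = [[1.0 if l < s <= r else 0.0 for s in support] for l, r in intervals]
    p = [1.0 / m] * m
    for _ in range(max_iter):
        new_p = [0.0] * m
        for row in alpha:
            denom = sum(a * pj for a, pj in zip(row, p))  # P(T_i in its interval)
            for j in range(m):
                # E-step share of observation i at point j, averaged over i (M-step).
                new_p[j] += row[j] * p[j] / (n * denom)
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return support, new_p
        p = new_p
    return support, p
```

For the two observations (0, 1] and (0, 2], the iteration moves essentially all mass to t = 1, since that single point is consistent with both intervals.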
Citations: 0
On varimax asymptotics in network models and spectral methods for dimensionality reduction
IF 2.7, Zone 2 (Mathematics), Q2 BIOLOGY, Pub Date: 2023-11-20, DOI: 10.1093/biomet/asad061
J Cape
Summary Varimax factor rotations, while popular among practitioners in psychology and statistics since being introduced by H. Kaiser, have historically been viewed with skepticism and suspicion by some theoreticians and mathematical statisticians. Now, work by K. Rohe and M. Zeng provides new, fundamental insight: varimax rotations provably perform statistical estimation in certain classes of latent variable models when paired with spectral-based matrix truncations for dimensionality reduction. We build on this newfound understanding of varimax rotations by developing further connections to network analysis and spectral methods rooted in entrywise matrix perturbation analysis. Concretely, this paper establishes the asymptotic multivariate normality of vectors in varimax-transformed Euclidean point clouds that represent low-dimensional node embeddings in certain latent space random graph models. We address related concepts including network sparsity, data denoising, and the role of matrix rank in latent variable parameterizations. Collectively, these findings, at the confluence of classical and contemporary multivariate analysis, reinforce methodology and inference procedures grounded in matrix factorization-based techniques. Numerical examples illustrate our findings and supplement our discussion.
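For readers unfamiliar with the rotation itself, the Python sketch below computes a raw varimax rotation (no Kaiser row normalisation) using Kaiser-style pairwise planar rotations. Each planar step maximises the varimax criterion in closed form for one pair of columns. The sketch only illustrates what a varimax rotation computes. It does not implement the paper's asymptotic analysis or the Rohe and Zeng estimation procedure, and the function name is hypothetical.

```python
import math

def varimax(loadings, tol=1e-10, max_sweeps=500):
    """Raw varimax rotation (no Kaiser row normalisation) by pairwise planar
    rotations. Returns (rotated, R) with rotated = loadings @ R, R orthogonal."""
    L = [list(row) for row in loadings]
    p, k = len(L), len(L[0])
    R = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for _ in range(max_sweeps):
        largest = 0.0
        for a in range(k - 1):
            for b in range(a + 1, k):
                u = [L[i][a] ** 2 - L[i][b] ** 2 for i in range(p)]
                v = [2.0 * L[i][a] * L[i][b] for i in range(p)]
                A, B = sum(u), sum(v)
                C = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                D = 2.0 * sum(ui * vi for ui, vi in zip(u, v))
                # Angle maximising the varimax criterion for columns a and b.
                phi = math.atan2(D - 2.0 * A * B / p, C - (A * A - B * B) / p) / 4.0
                largest = max(largest, abs(phi))
                c, s = math.cos(phi), math.sin(phi)
                # Apply the same planar rotation to the loadings and to R.
                for M in (L, R):
                    for row in M:
                        x, y = row[a], row[b]
                        row[a], row[b] = c * x + s * y, -s * x + c * y
        if largest < tol:
            break
    return L, R
```

Pairwise sweeps are used here instead of the common SVD-based fixed-point iteration because each planar step can only increase the varimax criterion, so the criterion never decreases from one step to the next.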
Citations: 0
Second term improvement to generalised linear mixed model asymptotics
IF 2.7, Zone 2 (Mathematics), Q2 BIOLOGY, Pub Date: 2023-11-16, DOI: 10.1093/biomet/asad072
Luca Maestrini, Aishwarya Bhaskaran, Matt P Wand
Summary A recent article on generalised linear mixed model asymptotics, Jiang et al. (2022), derived the rates of convergence for the asymptotic variances of maximum likelihood estimators. If $m$ denotes the number of groups and $n$ is the average within-group sample size, then the asymptotic variances have orders $m^{-1}$ and $(mn)^{-1}$, depending on the parameter. We extend this theory to provide explicit forms of the $(mn)^{-1}$ second terms of the asymptotically harder-to-estimate parameters. Improved accuracy of statistical inference and planning are consequences of our theory.
Citations: 0
Discussion of 'Statistical inference for streamed longitudinal data'.
IF 2.4, Zone 2 (Mathematics), Q2 BIOLOGY, Pub Date: 2023-11-15, eCollection Date: 2023-12-01, DOI: 10.1093/biomet/asad043
Yang Ning, Jingyi Duan
Biometrika 110(4): 867-869. Free full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10651177/pdf/
Citations: 0
Projective Independence Tests in High Dimensions: the Curses and the Cures
IF 2.7, Zone 2 (Mathematics), Q2 BIOLOGY, Pub Date: 2023-11-15, DOI: 10.1093/biomet/asad070
Yaowu Zhang, Liping Zhu
Summary Testing independence between high dimensional random vectors is fundamentally different from testing independence between univariate random variables. Take the projection correlation as an example. It suffers from at least three issues. First, it has a high computational complexity of $O\{n^3(p+q)\}$, where $n$, $p$ and $q$ are the respective sample size and dimensions of the random vectors. This limits its usefulness substantially when $n$ is extremely large. Second, the asymptotic null distribution of the projection correlation test is rarely tractable. Therefore, random permutations are often suggested to approximate the asymptotic null distribution. This further increases the complexity of implementing independence tests. Last, the power performance of the projection correlation test deteriorates in high dimensions. To address these issues, we improve the projection correlation through a modified weight function, which reduces the complexity to $O\{n^2(p+q)\}$. We estimate the improved projection correlation with U-statistic theory. More importantly, its asymptotic null distribution is standard normal, thanks to the high dimensions of random vectors. This expedites the implementation of independence tests substantially. To enhance power performance in high dimensions, we introduce a cross-validation procedure which incorporates feature screening with the projection correlation test. The implementation efficacy and power enhancement are confirmed through extensive numerical studies.
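The cost argument above (an expensive statistic, with a permutation null on top) can be seen in a small Python sketch. Because the summary does not specify the modified projection correlation, the sketch substitutes a different, well-known dependence measure: the V-statistic for squared distance covariance, calibrated by random permutations. All function names are hypothetical.

```python
import math
import random

def _double_centred_distances(z):
    """Euclidean distance matrix of the rows of z, double-centred."""
    n = len(z)
    d = [[math.dist(z[i], z[j]) for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d]  # equals column means since d is symmetric
    grand = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand for j in range(n)]
            for i in range(n)]

def dcov2(A, B, perm=None):
    """V-statistic for squared distance covariance; perm relabels the y-sample."""
    n = len(A)
    if perm is None:
        perm = range(n)
    return sum(A[i][j] * B[perm[i]][perm[j]]
               for i in range(n) for j in range(n)) / n ** 2

def dcov_permutation_test(x, y, n_perm=199, seed=0):
    """Permutation p-value for independence of paired samples x[i], y[i],
    each given as a sequence of coordinates."""
    A = _double_centred_distances(x)  # O(n^2 p), computed once
    B = _double_centred_distances(y)  # O(n^2 q), computed once
    observed = dcov2(A, B)
    rng = random.Random(seed)
    idx = list(range(len(x)))
    exceed = 0
    for _ in range(n_perm):  # each permutation costs another O(n^2)
        rng.shuffle(idx)
        if dcov2(A, B, idx) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)
```

The permutation loop is exactly the step that a standard normal asymptotic null, as established in the paper for its statistic, makes unnecessary.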
Citations: 2
Discussion of ‘Statistical inference for streamed longitudinal data’
IF 2.7, Zone 2 (Mathematics), Q2 BIOLOGY, Pub Date: 2023-11-15, DOI: 10.1093/biomet/asad035
J. Wang, H. Wang, K. Chen
Citations: 0
Discussion of ‘Statistical inference for streamed longitudinal data’
IF 2.7, Zone 2 (Mathematics), Q2 BIOLOGY, Pub Date: 2023-11-15, DOI: 10.1093/biomet/asad034
Peter X-K Song, Ling Zhou
Citations: 0
Generalized kernel two-sample tests
Zone 2 (Mathematics), Q2 BIOLOGY, Pub Date: 2023-11-14, DOI: 10.1093/biomet/asad068
Hoseung Song, Hao Chen
Summary Kernel two-sample tests have been widely used for multivariate data to test equality of distributions. However, existing tests based on mapping distributions into a reproducing kernel Hilbert space mainly target specific alternatives and do not work well for some scenarios when the dimension of the data is moderate to high due to the curse of dimensionality. We propose a new test statistic that makes use of a common pattern under moderate and high dimensions and achieves substantial power improvements over existing kernel two-sample tests for a wide range of alternatives. We also propose alternative testing procedures that maintain high power with low computational cost, offering easy off-the-shelf tools for large datasets. The new approaches are compared to other state-of-the-art tests under various settings and show good performance. We showcase the new approaches through two applications: the comparison of musks and non-musks using the shape of molecules, and the comparison of taxi trips starting from John F. Kennedy airport in consecutive months. All proposed methods are implemented in an R package kerTests.
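As background for "mapping distributions into a reproducing kernel Hilbert space", the Python sketch below computes the classical unbiased squared maximum mean discrepancy (MMD) with a Gaussian kernel and calibrates it by permutation. This is the standard baseline that kernel two-sample tests start from, not the generalized statistic proposed in the paper, and it does not reproduce the kerTests interface. The fixed bandwidth of 1.0 and the function names are assumptions; a median-distance heuristic is a common alternative for the bandwidth.

```python
import math
import random

def gaussian_gram(points, bandwidth):
    """Gram matrix of the Gaussian kernel exp(-||a - b||^2 / (2 * bandwidth^2))."""
    return [[math.exp(-math.dist(a, b) ** 2 / (2 * bandwidth ** 2)) for b in points]
            for a in points]

def _mmd2_unbiased(K, ix, iy):
    """Unbiased estimate of squared MMD from a pooled Gram matrix K, with
    index lists ix (first sample) and iy (second sample), each of length >= 2."""
    m, n = len(ix), len(iy)
    kxx = sum(K[i][j] for i in ix for j in ix if i != j)
    kyy = sum(K[i][j] for i in iy for j in iy if i != j)
    kxy = sum(K[i][j] for i in ix for j in iy)
    return kxx / (m * (m - 1)) + kyy / (n * (n - 1)) - 2 * kxy / (m * n)

def mmd_permutation_test(x, y, bandwidth=1.0, n_perm=199, seed=0):
    """Two-sample test of equal distributions; returns (MMD^2 estimate, p-value)."""
    pooled = list(x) + list(y)
    K = gaussian_gram(pooled, bandwidth)  # kernel evaluated once on the pooled sample
    m = len(x)
    idx = list(range(len(pooled)))
    observed = _mmd2_unbiased(K, idx[:m], idx[m:])
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):  # relabel the pooled sample at random
        rng.shuffle(idx)
        if _mmd2_unbiased(K, idx[:m], idx[m:]) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)
```

Precomputing the pooled Gram matrix keeps each permutation to index lookups, which is the usual way such tests stay affordable on moderately large samples.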
Citations: 9