Latest articles from Biometrika

Studies in the history of probability and statistics, LI: the first conditional logistic regression
IF 2.7 Zone 2 (Mathematics) Q2 BIOLOGY Pub Date: 2024-08-09 DOI: 10.1093/biomet/asae038
J A Hanley
Statisticians and epidemiologists generally cite the publications by Prentice & Breslow and by Breslow et al. in 1978 as the first description and use of conditional logistic regression, while economists cite the 1973 book chapter by Nobel laureate McFadden. We describe the until-now-unrecognized use of, and way of fitting, this model in 1934 by Lionel Penrose and Ronald Fisher.
Robust Covariate-Balancing Method in Learning Optimal Individualized Treatment Regimes
IF 2.7 Zone 2 (Mathematics) Q2 BIOLOGY Pub Date: 2024-07-17 DOI: 10.1093/biomet/asae036
Canhui Li, Donglin Zeng, Wensheng Zhu
Summary One of the most important problems in precision medicine is to find the optimal individualized treatment rule, which is designed to recommend treatment decisions and maximize overall clinical benefit to patients based on their individual characteristics. Typically, the expected clinical outcome is required to be estimated first, in which an outcome regression model or a propensity score model usually needs to be assumed for most of the existing statistical methods. However, if either model assumption is invalid, the estimated treatment regime is not reliable. In this article, we first define a contrast value function, which is the basis of the study for individualized treatment regimes. Then we construct a hybrid estimator of the contrast value function, by combining two types of estimation methods. We further propose a robust covariate-balancing estimator of the contrast value function by combining the inverse probability weighted method and matching method, which is based on the covariate balancing propensity score proposed by Imai and Ratkovic (2014). Theoretical results show that the proposed estimator is doubly robust, that is, it is consistent if either the propensity score model or the matching is correct. Based on a large number of simulation studies, we demonstrate that the proposed estimator outperforms existing methods. Lastly, the proposed method is illustrated through analysis of the SUPPORT study.
Causal inference with hidden mediators
IF 2.7 Zone 2 (Mathematics) Q2 BIOLOGY Pub Date: 2024-07-13 DOI: 10.1093/biomet/asae037
AmirEmad Ghassami, Alan Yang, Ilya Shpitser, Eric Tchetgen Tchetgen
Summary Proximal causal inference was recently proposed as a framework to identify causal effects from observational data in the presence of hidden confounders for which proxies are available. In this paper, we extend the proximal causal inference approach to settings where identification of causal effects hinges upon a set of mediators which are not observed, yet error prone proxies of the hidden mediators are measured. Specifically, (i) we establish causal hidden mediation analysis, which extends classical causal mediation analysis methods for identifying natural direct and indirect effects under no unmeasured confounding to a setting where the mediator of interest is hidden, but proxies of it are available. (ii) We establish a hidden front-door criterion, which extends the classical front-door criterion to allow for hidden mediators for which proxies are available. (iii) We show that the identification of a certain causal effect called population intervention indirect effect remains possible with hidden mediators in settings where challenges in (i) and (ii) might co-exist. We view (i)-(iii) as important steps towards the practical application of front-door criteria and mediation analysis as mediators are almost always measured with error and thus, the most one can hope for in practice is that the measurements are at best proxies of mediating mechanisms. We propose identification approaches for the parameters of interest in our considered models. For the estimation aspect, we propose an influence function-based estimation method and provide an analysis for the robustness of the estimators.
More Power by Using Fewer Permutations
IF 2.7 Zone 2 (Mathematics) Q2 BIOLOGY Pub Date: 2024-07-10 DOI: 10.1093/biomet/asae031
Nick W Koning
Summary It is conventionally believed that permutation-based testing methods should ideally use all permutations. We challenge this by showing we can sometimes obtain dramatically more power by using a tiny subgroup. As the subgroup is tiny, this also comes at a much lower computational cost. Moreover, the method remains valid for the same hypotheses. We exploit this to improve the popular permutation-based Westfall & Young MaxT multiple testing method. We analyze the relative efficiency in a Gaussian location model, and find the largest gain in high dimensions.
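As a rough illustration of what it means to run a permutation test over a subgroup rather than over all permutations, the sketch below computes a p-value using only the cyclic shifts of the pooled data. The cyclic-shift subgroup, the mean-difference statistic and all function names are assumptions chosen for illustration; they are not the subgroup or the procedure constructed in the paper.

```python
def mean_difference(values, m):
    """Test statistic: mean of the first m values minus mean of the rest."""
    first, rest = values[:m], values[m:]
    return sum(first) / len(first) - sum(rest) / len(rest)


def cyclic_shifts(values):
    """Yield every cyclic shift of values; shift 0 is the identity."""
    for shift in range(len(values)):
        yield values[shift:] + values[:shift]


def subgroup_permutation_p_value(values, m, transformations=cyclic_shifts):
    """Share of transformed data sets whose statistic is at least the observed one.

    Only the n cyclic shifts are used (an illustrative subgroup),
    instead of all n! permutations of the pooled sample.
    """
    observed = mean_difference(values, m)
    stats = [mean_difference(v, m) for v in transformations(values)]
    return sum(s >= observed for s in stats) / len(stats)


if __name__ == "__main__":
    pooled = [5.0, 6.0, 7.0, 1.0, 2.0, 3.0]  # first 3 values: group A, last 3: group B
    print(subgroup_permutation_p_value(pooled, m=3))
```

With six observations the subgroup has 6 elements rather than 720, so the smallest attainable p-value is 1/6; whether such a small subgroup gains or loses power depends on how it is chosen, which is the question the paper studies.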
Testing Independence for Sparse Longitudinal Data
IF 2.7 Zone 2 (Mathematics) Q2 BIOLOGY Pub Date: 2024-07-08 DOI: 10.1093/biomet/asae035
Changbo Zhu, Junwen Yao, Jane-Ling Wang
Summary With the advance of science and technology, more and more data are collected in the form of functions. A fundamental question for a pair of random functions is to test whether they are independent. This problem becomes quite challenging when the random trajectories are sampled irregularly and sparsely for each subject. In other words, each random function is only sampled at a few time-points, and these time-points vary with subjects. Furthermore, the observed data may contain noise. To the best of our knowledge, there exists no consistent test in the literature to test the independence of sparsely observed functional data. We show in this work that testing pointwise independence simultaneously is feasible. The test statistics are constructed by integrating pointwise distance covariances (Székely et al., 2007) and are shown to converge, at a certain rate, to their corresponding population counterparts, which characterize the simultaneous pointwise independence of two random functions. The performance of the proposed methods is further verified by Monte Carlo simulations and analysis of real data.
Semiparametric efficiency gains from parametric restrictions on propensity scores
IF 2.7 Zone 2 (Mathematics) Q2 BIOLOGY Pub Date: 2024-07-06 DOI: 10.1093/biomet/asae034
Haruki Kono
Summary We explore how much knowing a parametric restriction on propensity scores improves semiparametric efficiency bounds in the potential outcome framework. For stratified propensity scores, considered as a parametric model, we derive explicit formulas for the efficiency gain from knowing how the covariate space is split. Based on these, we find that the efficiency gain decreases as the partition of the stratification becomes finer. For general parametric models, where it is hard to obtain explicit representations of efficiency bounds, we propose a novel framework that enables us to see whether knowing a parametric model is valuable in terms of efficiency even when it is high-dimensional. In addition to the intuitive fact that knowing the parametric model does not help much if it is sufficiently flexible, we discover that the efficiency gain can be nearly zero even though the parametric assumption significantly restricts the space of possible propensity scores.
Debiasing Welch’s Method for Spectral Density Estimation
IF 2.7 Zone 2 (Mathematics) Q2 BIOLOGY Pub Date: 2024-07-01 DOI: 10.1093/biomet/asae033
Lachlan C Astfalck, Adam M Sykulski, Edward J Cripps
Summary Welch’s method provides an estimator of the power spectral density that is statistically consistent. This is achieved by averaging over periodograms calculated from overlapping segments of a time series. For a finite-length time series, while the variance of the estimator decreases as the number of segments increases, the magnitude of the estimator’s bias increases: a bias-variance trade-off ensues when setting the segment number. We address this issue by providing a novel method for debiasing Welch’s method which maintains the computational complexity and asymptotic consistency, and leads to improved finite-sample performance. Theoretical results are given for fourth-order stationary processes with finite fourth-order moments and absolutely convergent fourth-order cumulant function. The significant bias reduction is demonstrated with numerical simulation and an application to real-world data. Our estimator also permits irregular spacing over frequency and we demonstrate how this may be employed for signal compression and further variance reduction. Code accompanying this work is available in R and Python.
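The segment-averaging step described in the summary can be sketched as follows. This is the standard, undebiased Welch estimate only; the debiasing step proposed in the paper is not reproduced. The function names, the absence of a taper and the normalization |DFT|^2 / n are assumptions made to keep the example short and dependency-free.

```python
import cmath
import math


def periodogram(segment):
    """Periodogram |DFT|^2 / n of one segment at Fourier frequencies k = 0..n//2."""
    n = len(segment)
    values = []
    for k in range(n // 2 + 1):
        dft = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(segment))
        values.append(abs(dft) ** 2 / n)
    return values


def welch(series, segment_length, overlap):
    """Average the periodograms of overlapping segments (no taper applied)."""
    step = segment_length - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than segment_length")
    starts = range(0, len(series) - segment_length + 1, step)
    spectra = [periodogram(series[s:s + segment_length]) for s in starts]
    if not spectra:
        raise ValueError("series is shorter than segment_length")
    n_freq = len(spectra[0])
    return [sum(sp[k] for sp in spectra) / len(spectra) for k in range(n_freq)]


if __name__ == "__main__":
    # Cosine with 2 cycles per 8 samples; the estimate should peak at index 2.
    series = [math.cos(2 * math.pi * 2 * t / 8) for t in range(32)]
    print(welch(series, segment_length=8, overlap=4))
```

Shorter segments give more periodograms to average, which lowers the variance, but each periodogram then resolves fewer frequencies and is more biased; this is the trade-off the summary refers to.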
Testing serial dependence or cross dependence for time series with underreporting
IF 2.7 Zone 2 (Mathematics) Q2 BIOLOGY Pub Date: 2024-06-22 DOI: 10.1093/biomet/asae027
Keyao Wei, Lengyang Wang, Yingcun Xia
In practice, it is common for collected data to be underreported, which is particularly prevalent in fields such as social sciences, ecology and epidemiology. Drawing inferences from such data using conventional statistical methods can lead to incorrect conclusions. In this paper, we study tests for serial or cross dependence in time series data that are subject to underreporting. We introduce new test statistics, develop corresponding group-of-blocks bootstrap techniques, and establish their consistency. The methods are shown to be efficient by simulation and are used to identify key factors responsible for the spread of dengue fever and the occurrence of cardiovascular disease.
A Rank-Based Sequential Test of Independence
IF 2.7 Zone 2 (Mathematics) Q2 BIOLOGY Pub Date: 2024-05-13 DOI: 10.1093/biomet/asae023
Alexander Henzi, Michael Law
Summary We consider the problem of independence testing for two univariate random variables in a sequential setting. By leveraging recent developments on safe, anytime-valid inference, we propose a test with time-uniform type I error control and derive explicit bounds on the finite sample performance of the test. We demonstrate the empirical performance of the procedure in comparison to existing sequential and non-sequential independence tests. Furthermore, since the proposed test is distribution-free under the null hypothesis, we empirically simulate the gap due to Ville’s inequality, the supermartingale analogue of Markov’s inequality that is commonly applied to control type I error in anytime-valid inference, and apply this to construct a truncated sequential test.
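The role of Ville's inequality in anytime-valid testing can be illustrated with a generic stopping rule: if a nonnegative process starts at 1 and is a supermartingale under the null hypothesis, the probability that it ever reaches 1/alpha is at most alpha, so rejecting at the first crossing controls the type I error uniformly over time. The sketch below assumes the per-step betting factors are supplied by the caller and have conditional expectation at most one under the null; how the paper builds such factors from ranks is not reproduced, and the function name is hypothetical.

```python
def sequential_test(betting_factors, alpha=0.05):
    """Return the first 1-based time the wealth reaches 1/alpha, else None.

    betting_factors: nonnegative per-step multipliers, assumed to have
    conditional expectation at most 1 under the null hypothesis.
    """
    threshold = 1.0 / alpha
    wealth = 1.0  # the test process starts at 1
    for t, factor in enumerate(betting_factors, start=1):
        if factor < 0:
            raise ValueError("betting factors must be nonnegative")
        wealth *= factor
        if wealth >= threshold:
            return t  # reject the null at time t
    return None  # never rejected within the observed stream


if __name__ == "__main__":
    print(sequential_test([2.0] * 10, alpha=0.05))
    print(sequential_test([1.0] * 10, alpha=0.05))
```

Because the bound from Ville's inequality is generally not tight, the realized type I error of such a rule can fall below alpha; the summary describes simulating this gap for a test that is distribution-free under the null.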
A model-free variable screening method for optimal treatment regimes with high-dimensional survival data
IF 2.7 Zone 2 (Mathematics) Q2 BIOLOGY Pub Date: 2024-05-05 DOI: 10.1093/biomet/asae022
Cheng-Han Yang, Yu-Jen Cheng
Summary We propose a model-free variable screening method for the optimal treatment regime with high-dimensional survival data. The proposed screening method provides a unified framework to select the active variables in a prespecified target population, including the treated group as a special case. Based on this framework, the optimal treatment regime is exactly the optimal classifier that minimizes a weighted misclassification error rate, with weights associated with survival outcome variables, the censoring distribution, and a prespecified target population. Our main contribution involves reformulating the weighted classification problem into a classification problem within a hypothetical population, where the observed data can be viewed as a sample obtained from outcome-dependent sampling, with the selection probability inversely proportional to the weights. Consequently, we introduce the weighted Kolmogorov–Smirnov approach for selecting active variables in the optimal treatment regime, extending the conventional Kolmogorov–Smirnov method for binary classification. Additionally, the proposed screening method exhibits two levels of robustness. The first level of robustness is achieved because the proposed method does not require any model assumptions for survival outcome on treatment and covariates, whereas the other is attained as the form of treatment regimes is allowed to be unspecified even without requiring convex surrogate loss, such as logit loss or hinge loss. As a result, the proposed screening method is robust to model misspecifications, and nonparametric learning methods such as random forests and boosting can be applied to those selected variables for further analysis. The theoretical properties of the proposed method are established. The performance of the proposed method is examined through simulation studies and illustrated by a real dataset.
Citations: 0