
Biometrika: Latest Articles

Testing serial dependence or cross dependence for time series with underreporting
IF 2.7 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2024-06-22 | DOI: 10.1093/biomet/asae027
Keyao Wei, Lengyang Wang, Yingcun Xia
In practice, it is common for collected data to be underreported, which is particularly prevalent in fields such as social sciences, ecology and epidemiology. Drawing inferences from such data using conventional statistical methods can lead to incorrect conclusions. In this paper, we study tests for serial or cross dependence in time series data that are subject to underreporting. We introduce new test statistics, develop corresponding group-of-blocks bootstrap techniques, and establish their consistency. The methods are shown to be efficient by simulation and are used to identify key factors responsible for the spread of dengue fever and the occurrence of cardiovascular disease.
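To see why underreporting distorts conventional dependence measures, the sketch below simulates it under an assumed mechanism: binomial thinning, in which each true event is reported independently with a fixed probability. The abstract does not specify the underreporting model. The true process here is also an illustrative choice: an INAR(1) count series whose lag-1 autocorrelation equals its thinning parameter. The sketch does not implement the paper's test statistics or its group-of-blocks bootstrap. It only shows that the lag-1 autocorrelation of the reported series is attenuated relative to that of the true series.

```python
import random


def binomial_thin(count, p, rng):
    """Binomial thinning: each of `count` events survives independently with probability p."""
    return sum(1 for _ in range(count) if rng.random() < p)


def simulate_inar1(n, alpha, rng, n_trials=10, q=0.5):
    """Illustrative true process: INAR(1) counts X_t = alpha o X_{t-1} + eps_t,
    with eps_t ~ Binomial(n_trials, q). Its lag-1 autocorrelation equals alpha."""
    x = [0] * n  # start at 0; the chain forgets this within a few steps
    for t in range(1, n):
        innovation = sum(1 for _ in range(n_trials) if rng.random() < q)
        x[t] = binomial_thin(x[t - 1], alpha, rng) + innovation
    return x


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


rng = random.Random(0)
true_counts = simulate_inar1(5000, alpha=0.6, rng=rng)
# Assumed underreporting: only 30% of events are reported.
reported = [binomial_thin(c, 0.3, rng) for c in true_counts]

r_true = lag1_autocorr(true_counts)
r_reported = lag1_autocorr(reported)
print(f"lag-1 autocorrelation, true counts:     {r_true:.3f}")
print(f"lag-1 autocorrelation, reported counts: {r_reported:.3f}")
```

Under this assumed model, the lag-1 covariance of the reported counts shrinks by a factor p^2. Their variance, meanwhile, gains an extra binomial term p(1 - p)E[X]. As a result, naive autocorrelation estimates computed from the reported data understate the true serial dependence.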
Citations: 0
A Rank-Based Sequential Test of Independence
IF 2.7 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2024-05-13 | DOI: 10.1093/biomet/asae023
Alexander Henzi, Michael Law
We consider the problem of independence testing for two univariate random variables in a sequential setting. By leveraging recent developments on safe, anytime-valid inference, we propose a test with time-uniform type I error control and derive explicit bounds on the finite-sample performance of the test. We demonstrate the empirical performance of the procedure in comparison to existing sequential and non-sequential independence tests. Furthermore, since the proposed test is distribution-free under the null hypothesis, we empirically simulate the gap due to Ville’s inequality (the supermartingale analogue of Markov’s inequality), which is commonly applied to control type I error in anytime-valid inference, and apply this to construct a truncated sequential test.
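Ville's inequality states that a nonnegative supermartingale M_t with M_0 = 1 exceeds 1/alpha at any time with probability at most alpha. The sketch below shows how this yields time-uniform type I error control. It assumes a simple betting martingale for the null hypothesis that a binary observation equals 1 with probability 1/2. The bet size `lam` and the coin-flip null are illustrative assumptions, and this is not the paper's rank-based independence test.

```python
import random


def betting_test(xs, lam=0.5, alpha=0.05):
    """Sequential test of H0: P(X = 1) = 1/2 via the wealth process
    M_t = prod_{i <= t} (1 + lam * (2 * x_i - 1)), with |lam| < 1.
    Under H0 each factor has mean 1, so M_t is a nonnegative martingale with M_0 = 1,
    and Ville's inequality gives P(sup_t M_t >= 1/alpha) <= alpha.
    Returns the first time t at which M_t >= 1/alpha, or None if that never happens."""
    wealth = 1.0
    for t, x in enumerate(xs, start=1):
        wealth *= 1.0 + lam * (2 * x - 1)
        if wealth >= 1.0 / alpha:
            return t
    return None


def rejection_rate(p_one, n_runs, horizon, rng, lam=0.5, alpha=0.05):
    """Fraction of simulated runs (coin with P(X = 1) = p_one) in which the test ever rejects."""
    rejections = 0
    for _ in range(n_runs):
        flips = (1 if rng.random() < p_one else 0 for _ in range(horizon))
        if betting_test(flips, lam, alpha) is not None:
            rejections += 1
    return rejections / n_runs


rng = random.Random(42)
null_rate = rejection_rate(0.5, n_runs=2000, horizon=300, rng=rng)
alt_rate = rejection_rate(0.8, n_runs=500, horizon=300, rng=rng)
print(f"rejection rate under H0 (bound 0.05): {null_rate:.3f}")
print(f"rejection rate when P(X=1) = 0.8:     {alt_rate:.3f}")
```

Ville's inequality bounds the crossing probability over an unlimited horizon. As a result, the simulated null rejection rate within a finite horizon typically falls below alpha. That slack is the kind of gap the abstract describes simulating when the null distribution is known.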
Citations: 0
A model-free variable screening method for optimal treatment regimes with high-dimensional survival data
IF 2.7 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2024-05-05 | DOI: 10.1093/biomet/asae022
Cheng-Han Yang, Yu-Jen Cheng
We propose a model-free variable screening method for the optimal treatment regime with high-dimensional survival data. The proposed screening method provides a unified framework to select the active variables in a prespecified target population, including the treated group as a special case. Based on this framework, the optimal treatment regime is exactly the optimal classifier that minimizes a weighted misclassification error rate, with weights associated with survival outcome variables, the censoring distribution, and a prespecified target population. Our main contribution involves recasting the weighted classification problem as a classification problem within a hypothetical population, where the observed data can be viewed as a sample obtained from outcome-dependent sampling, with the selection probability inversely proportional to the weights. Consequently, we introduce the weighted Kolmogorov–Smirnov approach for selecting active variables in the optimal treatment regime, extending the conventional Kolmogorov–Smirnov method for binary classification. Additionally, the proposed screening method exhibits two levels of robustness. The first arises because the proposed method does not require any model assumptions for the survival outcome given treatment and covariates; the second arises because the form of the treatment regime may be left unspecified, without even requiring a convex surrogate loss such as the logit loss or hinge loss. As a result, the proposed screening method is robust to model misspecification, and nonparametric learning methods such as random forests and boosting can be applied to the selected variables for further analysis. The theoretical properties of the proposed method are established. The performance of the proposed method is examined through simulation studies and illustrated by a real dataset.
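The screening step rests on a weighted Kolmogorov–Smirnov statistic. Below is a minimal sketch of a generic weighted two-sample KS statistic, together with a ranking of covariates by that statistic. The abstract does not give how the paper derives the class labels and weights from the survival outcome, the censoring distribution and the target population. Here, labels and weights are therefore plain inputs, and the function names are hypothetical.

```python
def weighted_ks(values, labels, weights):
    """Weighted two-sample Kolmogorov-Smirnov statistic sup_t |F1(t) - F0(t)|,
    where Fg is the weighted empirical CDF of `values` among units with label g (0 or 1)."""
    total = {0: 0.0, 1: 0.0}
    for g, w in zip(labels, weights):
        total[g] += w
    order = sorted(range(len(values)), key=lambda i: values[i])
    cdf = {0: 0.0, 1: 0.0}
    stat = 0.0
    i = 0
    while i < len(order):
        v = values[order[i]]
        # Absorb every observation tied at v before comparing the two CDFs.
        while i < len(order) and values[order[i]] == v:
            k = order[i]
            cdf[labels[k]] += weights[k] / total[labels[k]]
            i += 1
        stat = max(stat, abs(cdf[1] - cdf[0]))
    return stat


def screen_variables(columns, labels, weights, top_k):
    """Rank covariates (one list of values per covariate) by their weighted KS
    statistic and return the indices of the top_k largest."""
    scores = [weighted_ks(col, labels, weights) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)[:top_k]


# Hypothetical example: two covariates, four units, equal weights.
labels = [0, 0, 1, 1]
weights = [1.0, 1.0, 1.0, 1.0]
columns = [[1, 3, 2, 4], [1, 2, 3, 4]]
print(screen_variables(columns, labels, weights, top_k=1))
```

Sorting once and sweeping through the tied values keeps each statistic at O(n log n). This matters when the statistic is evaluated for every covariate in a high-dimensional screen.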
Citations: 0
Sensitivity analysis for matched observational studies with continuous exposures and binary outcomes
IF 2.7 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2024-04-13 | DOI: 10.1093/biomet/asae021
Jeffrey Zhang, Dylan S Small, Siyu Heng
Matching is one of the most widely used study designs for adjusting for measured confounders in observational studies. However, unmeasured confounding may exist and cannot be removed by matching. Therefore, a sensitivity analysis is typically needed to assess a causal conclusion’s sensitivity to unmeasured confounding. Sensitivity analysis frameworks for binary exposures are well established for various matching designs and are commonly used in practice. However, unlike in the binary exposure case, valid and general sensitivity analysis methods for continuous exposures are still lacking, except in some special cases such as pair matching. To fill this gap in the binary outcome case, we develop a sensitivity analysis framework for general matching designs with continuous exposures and binary outcomes. First, we use probabilistic lattice theory to show that our sensitivity analysis approach is finite-population-exact under Fisher’s sharp null. Second, we prove a novel design sensitivity formula as a powerful tool for asymptotically evaluating the performance of our sensitivity analysis approach. Third, to allow effect heterogeneity with binary outcomes, we introduce a framework for conducting asymptotically exact inference and sensitivity analysis on generalized attributable effects with binary outcomes via mixed-integer programming. Fourth, for the continuous outcomes case, we show that conducting an asymptotically exact sensitivity analysis in matched observational studies when both the exposures and outcomes are continuous is generally NP-hard, except in some special cases such as pair matching. As a real data application, we apply our new methods to study the effect of early-life lead exposure on juvenile delinquency. An implementation of the methods in this work is available in the R package doseSens.
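The abstract notes that sensitivity analysis for binary exposures is well established. As a point of reference only, the sketch below shows the classical pair-matched case with a binary exposure and a binary outcome. Under Rosenbaum's sensitivity model with parameter Gamma, consider the discordant pairs. In each, the probability that the exposed unit is the one with the event is at most Gamma / (1 + Gamma). A worst-case one-sided p-value for McNemar's test is therefore a binomial upper tail. This is not the paper's continuous-exposure method, it does not reproduce the doseSens package, and the counts are hypothetical.

```python
from math import comb


def sensitivity_pvalue(n_discordant, n_exposed_events, gamma):
    """Upper bound on the one-sided McNemar p-value in a pair-matched study with a
    binary exposure, under Rosenbaum's model with sensitivity parameter gamma >= 1.
    The worst-case null distribution of the number of discordant pairs in which the
    exposed unit had the event is Binomial(n_discordant, gamma / (1 + gamma))."""
    if gamma < 1:
        raise ValueError("gamma must be >= 1")
    p_plus = gamma / (1.0 + gamma)
    return sum(
        comb(n_discordant, k) * p_plus**k * (1.0 - p_plus) ** (n_discordant - k)
        for k in range(n_exposed_events, n_discordant + 1)
    )


# Hypothetical data: 50 discordant pairs; the exposed unit had the event in 35 of them.
for gamma in (1.0, 1.5, 2.0, 3.0):
    bound = sensitivity_pvalue(50, 35, gamma)
    print(f"Gamma = {gamma}: worst-case one-sided p-value = {bound:.4f}")
```

At Gamma = 1 the bound reduces to the usual exact sign-test p-value. Raising Gamma until the bound exceeds the significance level then indicates how much unmeasured confounding the conclusion can tolerate.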
Citations: 0
Sharp symbolic nonparametric bounds for measures of benefit in observational and imperfect randomized studies with ordinal outcomes
IF 2.7 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2024-04-11 | DOI: 10.1093/biomet/asae020
Erin E Gabriel, Michael C Sachs, Andreas Kryger Jensen
The probability of benefit is a valuable and meaningful measure of treatment effect, which has advantages over the average treatment effect. Particularly for an ordinal outcome, it has a better interpretation and can reveal different aspects of the treatment impact. Unfortunately, this measure, and variations of it, are not identifiable even in randomized trials with perfect compliance. For this reason, there is an extensive literature on nonparametric bounds for unidentifiable measures of benefit. This literature has primarily focused on perfect randomized trial settings and one or two specific estimands. We extend these bounds to observational settings with unmeasured confounders and to imperfect randomized trials for all three estimands considered in the literature: the probability of benefit, the probability of no harm, and the relative treatment effect.
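Consider the simplest case covered by the earlier literature: a binary outcome in a perfectly randomized trial. There, the marginals p1 = P(Y(1) = 1) and p0 = P(Y(0) = 1) are identified, but the joint distribution of the potential outcomes is not. The probability of benefit P(Y(1) = 1, Y(0) = 0) is therefore only partially identified. Below is a minimal sketch of the resulting Fréchet–Hoeffding bounds. The paper's symbolic bounds for ordinal outcomes, observational studies and imperfect trials are not reproduced, and the trial numbers are hypothetical.

```python
def benefit_bounds(p1, p0):
    """Sharp bounds on the probability of benefit P(Y(1) = 1, Y(0) = 0) for a binary
    outcome, given only the identified marginals p1 = P(Y(1) = 1), p0 = P(Y(0) = 1).
    These are the Frechet-Hoeffding bounds on the joint probability."""
    lower = max(0.0, p1 - p0)
    upper = min(p1, 1.0 - p0)
    return lower, upper


def no_harm_bounds(p1, p0):
    """Sharp bounds on the probability of no harm P(Y(1) >= Y(0)) = 1 - P(Y(1) = 0, Y(0) = 1)."""
    harm_lower = max(0.0, p0 - p1)
    harm_upper = min(1.0 - p1, p0)
    return 1.0 - harm_upper, 1.0 - harm_lower


# Hypothetical trial: 60% respond under treatment, 40% under control.
print(benefit_bounds(0.6, 0.4))
print(no_harm_bounds(0.6, 0.4))
```

Even with perfect randomization, the probability of benefit in this example is only known to lie between 0.2 and 0.6. This width illustrates why the abstract stresses that the measure is not identifiable.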
Citations: 0
Individualized dynamic model for multi-resolutional data
IF 2.7 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2024-04-08 | DOI: 10.1093/biomet/asae015
J Zhang, F Xue, Q Xu, J Lee, A Qu
Mobile health has emerged as a major success for tracking individual health status, due to the popularity and power of smartphones and wearable devices. This has also brought great challenges in handling heterogeneous, multi-resolution data which arise ubiquitously in mobile health due to irregular multivariate measurements collected from individuals. In this paper, we propose an individualized dynamic latent factor model for irregular multi-resolution time series data to interpolate unsampled measurements of time series with low resolution. One major advantage of the proposed method is the capability to integrate multiple irregular time series and multiple subjects by mapping the multi-resolution data to the latent space. In addition, the proposed individualized dynamic latent factor model is applicable to capturing heterogeneous longitudinal information through individualized dynamic latent factors. Our theory provides a bound on the integrated interpolation error and the convergence rate for B-spline approximation methods. Both the simulation studies and the application to smartwatch data demonstrate the superior performance of the proposed method compared to existing methods.
Citations: 0
Flexible control of the median of the false discovery proportion
IF 2.7 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2024-03-23 | DOI: 10.1093/biomet/asae018
Jesse Hemerik, Aldo Solari, Jelle J Goeman
We introduce a multiple testing procedure that controls the median of the proportion of false discoveries in a flexible way. The procedure only requires a vector of p-values as input and is comparable to the Benjamini–Hochberg method, which controls the mean of the proportion of false discoveries. Our method allows free choice of one or several values of alpha after seeing the data, unlike the Benjamini–Hochberg procedure, which can be very anti-conservative when alpha is chosen post hoc. We prove these claims and illustrate them with simulations. Our procedure is inspired by a popular estimator of the total number of true hypotheses. We adapt this estimator to provide simultaneously median-unbiased estimators of the proportion of false discoveries, valid for finite samples. This simultaneity allows for the claimed flexibility. Our approach does not assume independence. The time complexity of our method is linear in the number of hypotheses, after sorting the p-values.
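For comparison, here is a minimal sketch of the Benjamini–Hochberg step-up procedure that the abstract uses as its reference point. It controls the mean of the false discovery proportion at a level alpha fixed in advance. It is not the authors' median-controlling procedure. Apart from sorting the p-values, it also needs only a single linear pass.

```python
def benjamini_hochberg(pvalues, alpha):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at level alpha.
    Sort the p-values, find the largest rank k with p_(k) <= k * alpha / m,
    and reject the hypotheses with the k smallest p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # O(m log m)
    k_max = 0
    for rank, i in enumerate(order, start=1):  # single linear pass after sorting
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    return sorted(order[:k_max])


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, alpha=0.05))
```

The guarantee assumes alpha is fixed before seeing the data. Rerunning this function with a smaller or larger alpha chosen after inspecting the results is exactly the post hoc use that the abstract warns can be very anti-conservative.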
Citations: 0
Optimal regimes for algorithm-assisted human decision-making
IF 2.7 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2024-03-19 | DOI: 10.1093/biomet/asae016
M J Stensrud, J D Laurendeau, A L Sarvet
We consider optimal regimes for algorithm-assisted human decision-making. Such regimes are decision functions of measured pre-treatment variables and, by leveraging natural treatment values, enjoy a superoptimality property whereby they are guaranteed to outperform conventional optimal regimes. When there is unmeasured confounding, the benefit of using superoptimal regimes can be considerable. When there is no unmeasured confounding, superoptimal regimes are identical to conventional optimal regimes. Furthermore, identification of the expected outcome under superoptimal regimes in non-experimental studies requires the same assumptions as identification of value functions under conventional optimal regimes when the treatment is binary. To illustrate the utility of superoptimal regimes, we derive identification and estimation results in a common instrumental variable setting. We use these derivations to analyse examples from the optimal regimes literature, including a case study of the effect of prompt intensive care treatment on survival.
Citations: 0
Inference of partial correlations of a multivariate Gaussian time series
IF 2.7 | Zone 2 (Mathematics) | Q2 BIOLOGY | Pub Date: 2024-02-26 | DOI: 10.1093/biomet/asae012
A S DiLernia, M Fiecas, L Zhang
We derive an asymptotic joint distribution and novel covariance estimator for the partial correlations of a multivariate Gaussian time series given mild regularity conditions. Using our derived asymptotic distribution, we develop a Wald confidence interval and testing procedure for inference of individual partial correlations for time series data. Through simulation we demonstrate that our proposed confidence interval attains higher coverage rates, and our testing procedure attains false positive rates closer to the nominal levels than approaches that assume independent observations when autocorrelation is present.
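The quantities being estimated are the entries of the partial correlation matrix. These can be read off the precision matrix Omega as rho_ij = -omega_ij / sqrt(omega_ii * omega_jj). The sketch below computes the sample version from a simulated autocorrelated Gaussian series (a hypothetical VAR(1)). It also includes a check against the equivalent residual-regression definition. The sketch gives point estimates only. The paper's actual contribution, an asymptotic distribution and covariance estimator that account for autocorrelation, is not implemented here.

```python
import numpy as np


def partial_correlations(data):
    """Sample partial correlation matrix of the columns of `data` (n x p):
    rho_ij = -omega_ij / sqrt(omega_ii * omega_jj), omega = inverse sample covariance."""
    omega = np.linalg.inv(np.cov(data, rowvar=False))
    scale = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def partial_corr_by_regression(data, i, j):
    """Same quantity via regression: correlate the residuals of columns i and j
    after regressing each on an intercept and all remaining columns."""
    others = [k for k in range(data.shape[1]) if k not in (i, j)]
    design = np.column_stack([np.ones(data.shape[0]), data[:, others]])
    resid_i = data[:, i] - design @ np.linalg.lstsq(design, data[:, i], rcond=None)[0]
    resid_j = data[:, j] - design @ np.linalg.lstsq(design, data[:, j], rcond=None)[0]
    return np.corrcoef(resid_i, resid_j)[0, 1]


# Hypothetical 3-dimensional Gaussian VAR(1): x_t = 0.5 x_{t-1} + mix @ e_t.
rng = np.random.default_rng(1)
n, p = 500, 3
mix = np.array([[1.0, 0.0, 0.0], [0.6, 1.0, 0.0], [0.3, 0.5, 1.0]])
series = np.zeros((n, p))
for t in range(1, n):
    series[t] = 0.5 * series[t - 1] + mix @ rng.standard_normal(p)

pcor = partial_correlations(series)
print(np.round(pcor, 3))
```

The point estimate is the same whether or not the observations are autocorrelated. What changes is its sampling variability. This is why, according to the abstract, intervals and tests that assume independent observations miss their nominal levels when autocorrelation is present.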
Citations: 0
Network-adjusted covariates for community detection
IF 2.7 Zone 2 (Mathematics) Q2 BIOLOGY Pub Date: 2024-02-24 DOI: 10.1093/biomet/asae011
Y Hu, W Wang
Community detection is a crucial task in network analysis that can be significantly improved by incorporating subject-level information, i.e., covariates. Existing methods have shown the effectiveness of using covariates on low-degree nodes, but rarely discuss the case where communities have significantly different density levels, i.e., multiscale networks. In this paper, we introduce a novel method that addresses this challenge by constructing network-adjusted covariates, which leverage the network connections and the covariates through a node-specific weight for each node. This weight can be calculated without tuning parameters. We present novel theoretical results on the strong consistency of our method under degree-corrected stochastic blockmodels with covariates, even in the presence of misspecification and multiple sparse communities. Additionally, we establish a general lower bound for the community detection problem when both the network and covariates are present, which shows that our method is optimal in connection intensity up to a constant factor. Our method outperforms existing approaches in simulations and on a LastFM app user network. We then compare our method with others on a statistics publication citation network in which 30% of nodes are isolated, and our method produces reasonable and balanced results.
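The abstract does not spell out the construction, so the following is only a hedged Python sketch of the general idea. It rests on three assumptions:

(i) The adjusted covariate of node i is assumed to be y_i = alpha_i * x_i + sum_j A_ij * x_j, i.e. Y = AX + diag(alpha)X. This mixes the node's own covariates with those aggregated from its neighbors.

(ii) `hypothetical_weight`, defined as alpha_i = mean degree / (d_i + 1), is a made-up placeholder. It is chosen only so that low-degree nodes lean more on their own covariates, and it is not the paper's tuning-free weight.

(iii) Plain k-means on the rows of Y stands in for the paper's clustering step.

```python
def hypothetical_weight(degree, mean_degree):
    # HYPOTHETICAL placeholder, not the paper's formula: nodes with few
    # connections get a larger weight on their own covariates.
    return mean_degree / (degree + 1.0)


def network_adjusted_covariates(adj, x):
    """Assumed form Y = A X + diag(alpha) X for adjacency list-of-lists `adj`."""
    n, d = len(adj), len(x[0])
    degrees = [sum(row) for row in adj]
    mean_degree = sum(degrees) / n
    y = []
    for i in range(n):
        alpha_i = hypothetical_weight(degrees[i], mean_degree)
        # Sum of neighbors' covariates (row i of A X).
        neighbor_sum = [sum(adj[i][j] * x[j][m] for j in range(n)) for m in range(d)]
        y.append([alpha_i * x[i][m] + neighbor_sum[m] for m in range(d)])
    return y


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda pt: min(sq_dist(pt, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(pt, centers[c]))
                      for pt in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Consider a toy graph of two triangles with covariates [1, 0] and [0, 1], plus an isolated node with covariate [1, 0]. On this graph, `kmeans(network_adjusted_covariates(adj, x), 2)` returns `[0, 0, 0, 1, 1, 1, 0]`: the isolated node joins the community whose covariates it shares. Without the self-weighted term, that node's row of Y would be all zeros. This illustrates why isolated nodes, such as those in the citation network above, need the covariate term.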
Citations: 0
Journal: Biometrika