
Biometrika: Latest Articles

Optimal regimes for algorithm-assisted human decision-making
IF 2.7 | Zone 2 (Mathematics) | Q1 Mathematics | Pub Date: 2024-03-19 | DOI: 10.1093/biomet/asae016
M J Stensrud, J D Laurendeau, A L Sarvet
Summary We consider optimal regimes for algorithm-assisted human decision-making. Such regimes are decision functions of measured pre-treatment variables and, by leveraging natural treatment values, enjoy a superoptimality property whereby they are guaranteed to outperform conventional optimal regimes. When there is unmeasured confounding, the benefit of using superoptimal regimes can be considerable. When there is no unmeasured confounding, superoptimal regimes are identical to conventional optimal regimes. Furthermore, identification of the expected outcome under superoptimal regimes in non-experimental studies requires the same assumptions as identification of value functions under conventional optimal regimes when the treatment is binary. To illustrate the utility of superoptimal regimes, we derive identification and estimation results in a common instrumental variable setting. We use these derivations to analyse examples from the optimal regimes literature, including a case study of the effect of prompt intensive care treatment on survival.
Citations: 0
Inference of partial correlations of a multivariate Gaussian time series
IF 2.7 | Zone 2 (Mathematics) | Q1 Mathematics | Pub Date: 2024-02-26 | DOI: 10.1093/biomet/asae012
A S DiLernia, M Fiecas, L Zhang
We derive an asymptotic joint distribution and a novel covariance estimator for the partial correlations of a multivariate Gaussian time series under mild regularity conditions. Using our derived asymptotic distribution, we develop a Wald confidence interval and testing procedure for inference on individual partial correlations of time series data. Through simulation we demonstrate that, when autocorrelation is present, our proposed confidence interval attains higher coverage rates, and our testing procedure attains false positive rates closer to the nominal levels, than approaches that assume independent observations.
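As background for the quantity being estimated, the partial correlation between components i and j of a Gaussian vector can be read off the precision matrix Ω = Σ⁻¹ as ρ_ij = −Ω_ij / sqrt(Ω_ii Ω_jj). The sketch below computes that plug-in matrix from a covariance estimate; it is only an illustration of the standard identity and does not implement the paper's autocorrelation-aware covariance estimator or Wald interval (the function name is hypothetical).

```python
import numpy as np

def partial_correlations(cov: np.ndarray) -> np.ndarray:
    """Partial correlation matrix implied by a covariance matrix.

    Uses rho_ij = -omega_ij / sqrt(omega_ii * omega_jj) with omega = inv(cov);
    the diagonal is set to 1 by convention.
    """
    omega = np.linalg.inv(cov)
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Usage: a chain x0 -> x1 -> x2, so x0 and x2 are (nearly) uncorrelated given x1.
# Note: np.cov here treats rows as independent draws, which is exactly the
# assumption the paper relaxes for autocorrelated series.
rng = np.random.default_rng(0)
x = rng.standard_normal((500, 3))
x[:, 1] += 0.8 * x[:, 0]
x[:, 2] += 0.8 * x[:, 1]
print(np.round(partial_correlations(np.cov(x, rowvar=False)), 2))
```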
Citations: 0
Network-adjusted covariates for community detection
IF 2.7 | Zone 2 (Mathematics) | Q1 Mathematics | Pub Date: 2024-02-24 | DOI: 10.1093/biomet/asae011
Y Hu, W Wang
Community detection is a crucial task in network analysis that can be significantly improved by incorporating subject-level information, i.e., covariates. Existing methods have shown the effectiveness of using covariates on low-degree nodes, but rarely discuss the case where communities have significantly different density levels, i.e., multiscale networks. In this paper, we introduce a novel method that addresses this challenge by constructing network-adjusted covariates, which combine network connections and covariates using a node-specific weight for each node. This weight can be calculated without tuning parameters. We present novel theoretical results on the strong consistency of our method under degree-corrected stochastic blockmodels with covariates, even in the presence of misspecification and multiple sparse communities. Additionally, we establish a general lower bound for the community detection problem when both network and covariates are present, which shows that our method is optimal in connection intensity up to a constant factor. Our method outperforms existing approaches in simulations and on a LastFM app user network. We then compare our method with others on a statistics publication citation network in which 30% of nodes are isolated, and our method produces reasonable and balanced results.
Citations: 0
Selective conformal inference with false coverage-statement rate control
IF 2.7 | Zone 2 (Mathematics) | Q1 Mathematics | Pub Date: 2024-02-19 | DOI: 10.1093/biomet/asae010
Yajie Bao, Yuyang Huo, Haojie Ren, Changliang Zou
Conformal inference is a popular tool for constructing prediction intervals. We consider here the scenario of post-selection, or selective, conformal inference, that is, prediction intervals are reported only for individuals selected from unlabelled test data. To account for multiplicity, we develop a general split conformal framework to construct selective prediction intervals with false coverage-statement rate control. We first investigate the false coverage rate-adjusted method of Benjamini & Yekutieli (2005) in the present setting, and show that it is able to achieve false coverage-statement rate control but yields uniformly inflated prediction intervals. We then propose a novel solution to the problem, called selective conditional conformal prediction. Our method performs selection procedures on both the calibration set and the test set, and then constructs conformal prediction intervals for the selected test candidates with the aid of the conditional empirical distribution obtained from the post-selection calibration set. When the selection rule is exchangeable, we show that our proposed method can exactly control the false coverage-statement rate with a model-free and distribution-free guarantee. For non-exchangeable selection procedures involving the calibration set, we provide non-asymptotic bounds for the false coverage-statement rate under mild distributional assumptions. Numerical results confirm the effectiveness and robustness of our method in false coverage-statement rate control and show that it achieves narrower prediction intervals than existing methods across various settings.
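To make the baseline concrete, the sketch below shows split conformal intervals combined with the Benjamini & Yekutieli style adjustment that the abstract describes as valid but conservative: when |S| of m test points are selected, each selected interval is built at miscoverage α|S|/m. It assumes absolute residuals |Y − μ̂(X)| from a pre-fitted model as the nonconformity score; the function names are hypothetical, and this is not the paper's selective conditional conformal prediction method.

```python
import math
import numpy as np

def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest calibration score."""
    scores = np.asarray(scores, float)
    n = scores.size
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this level
    return float(np.sort(scores)[k - 1])

def selective_intervals(cal_resid, test_pred, selected, alpha):
    """Intervals for the selected test points at the FCR-adjusted level alpha*|S|/m.

    cal_resid: calibration residuals Y - mu_hat(X) (assumed score: absolute residual)
    test_pred: point predictions mu_hat(X) for all m test points
    selected:  indices of the test points chosen by some selection rule
    """
    m = len(test_pred)
    adj_alpha = alpha * len(selected) / m
    q = conformal_quantile(np.abs(cal_resid), adj_alpha)
    return {i: (test_pred[i] - q, test_pred[i] + q) for i in selected}
```

Because the adjusted level α|S|/m shrinks as fewer points are selected, the threshold q grows, which is the uniform inflation the abstract refers to.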
Citations: 0
The Promises of Parallel Outcomes
IF 2.7 | Zone 2 (Mathematics) | Q1 Mathematics | Pub Date: 2024-02-17 | DOI: 10.1093/biomet/asae008
Ying Zhou, Dingke Tang, Dehan Kong, Linbo Wang
A key challenge in causal inference from observational studies is the identification and estimation of causal effects in the presence of unmeasured confounding. In this paper, we introduce a novel approach for causal inference that leverages information in multiple outcomes to deal with unmeasured confounding. An important assumption in our approach is conditional independence among multiple outcomes. In contrast to existing proposals in the literature, the roles of multiple outcomes in the conditional independence assumption are symmetric, hence the name parallel outcomes. We show nonparametric identifiability with at least three parallel outcomes and provide parametric estimation tools under a set of linear structural equation models. Our proposal is evaluated through a set of synthetic and real data analyses.
Citations: 0
Doubly robust estimation under covariate-induced dependent left truncation
IF 2.7 | Zone 2 (Mathematics) | Q1 Mathematics | Pub Date: 2024-02-11 | DOI: 10.1093/biomet/asae005
Yuyao Wang, Andrew Ying, Ronghui Xu
Summary In prevalent cohort studies with follow-up, the time-to-event outcome is subject to left truncation, leading to selection bias. For estimation of the distribution of time-to-event, conventional methods adjusting for left truncation tend to rely on the quasi-independence assumption that the truncation time and the event time are independent on the observed region. This assumption is violated when there is dependence between the truncation time and the event time, possibly induced by measured covariates. Inverse probability of truncation weighting can be used in this case, but it is sensitive to misspecification of the truncation model. In this work, we apply semiparametric theory to find the efficient influence curve of the expectation of an arbitrarily transformed survival time in the presence of covariate-induced dependent left truncation. We then use it to construct estimators that are shown to enjoy double-robustness properties. Our work represents the first attempt to construct doubly robust estimators in the presence of left truncation, which does not fall under the established framework of coarsened data within which doubly robust approaches were developed. We provide technical conditions for the asymptotic properties that appear not to have been carefully examined in the literature on time-to-event data, and study the estimators via extensive simulation. We apply the estimators to two real-world datasets with different right-censoring patterns.
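For intuition about the inverse probability of truncation weighting that the abstract contrasts with its doubly robust estimators: a subject with event time T and truncation time Q is observed only if Q < T, so if Q and T are independent given covariates Z, the observation probability is F_Q(T | Z). A normalized (Hajek-type) weighted mean of g(T) can then be sketched as below. This assumes the observation probabilities are already supplied (in practice they come from a fitted truncation model, whose misspecification is exactly what biases this estimator), ignores right censoring, and uses a hypothetical function name.

```python
import numpy as np

def ipw_truncation_mean(g_values, obs_probs):
    """Hajek-type inverse-probability-of-truncation-weighted mean of g(T).

    g_values[i]: the transformed event time g(T_i) for observed subject i
    obs_probs[i]: assumed P(Q < T_i | Z_i, T_i), the probability subject i
                  survives truncation and enters the sample
    """
    g = np.asarray(g_values, float)
    w = 1.0 / np.asarray(obs_probs, float)  # up-weight subjects unlikely to be observed
    return float(np.sum(w * g) / np.sum(w))
```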
Citations: 0
Regression analysis of group-tested current status data
IF 2.7 | Zone 2 (Mathematics) | Q1 Mathematics | Pub Date: 2024-02-08 | DOI: 10.1093/biomet/asae006
Shuwei Li, Tao Hu, Lianming Wang, Christopher S McMahan, Joshua M Tebbs
Summary Group testing is an effective way to reduce the time and cost associated with conducting large-scale screening for infectious diseases. Benefits are realized through testing pools formed by combining specimens, such as blood or urine, from different individuals. In some studies, individuals are assessed only once and a time-to-event endpoint is recorded, for example, the time until infection. Combining group testing with this type of endpoint results in group-tested current status data. To analyse these complex data, we propose methods that estimate a proportional hazards regression model based on the outcomes of testing the pools. A sieve maximum likelihood estimation approach is developed that approximates the cumulative baseline hazard function with a piecewise constant function. To obtain the sieve estimator, a computationally efficient expectation-maximization algorithm is derived by using data augmentation. Asymptotic properties of both the parametric and nonparametric components of the sieve estimator are then established by applying modern empirical process theory. Numerical results from simulation studies show that our proposed method performs nominally and has advantages over the corresponding estimation method based on individual testing results. We illustrate our work by analysing a chlamydia dataset collected by the State Hygienic Laboratory at the University of Iowa.
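The building block of such a likelihood can be sketched as follows: under a proportional hazards model, individual i with covariates x_i examined at time c_i is still uninfected with probability exp(−Λ₀(c_i) exp(x_iᵀβ)), and a pool is truly negative only if all members are. The sketch uses a step-function Λ₀ (jumps at assumed knots) to mirror the piecewise constant approximation in the abstract, and a simple assay model with sensitivity Se and specificity Sp that ignores dilution effects. Names and the assay model are illustrative assumptions, not the authors' exact formulation or their EM algorithm.

```python
import numpy as np

def step_cum_hazard(t, knots, jumps):
    """Piecewise-constant cumulative baseline hazard: sum of the jumps at knots <= t."""
    knots = np.asarray(knots, float)
    jumps = np.asarray(jumps, float)  # nonnegative jump sizes, so Lambda0 is nondecreasing
    return float(jumps[knots <= t].sum())

def pool_positive_prob(exam_times, X, beta, knots, jumps, se=1.0, sp=1.0):
    """P(pool tests positive) under proportional hazards and an assumed Se/Sp assay model."""
    X = np.asarray(X, float)
    beta = np.asarray(beta, float)
    # Survival (still uninfected) probability of each pool member at its exam time
    surv = [np.exp(-step_cum_hazard(c, knots, jumps) * np.exp(x @ beta))
            for c, x in zip(exam_times, X)]
    p_all_negative = float(np.prod(surv))
    # True positive pool detected with prob se; true negative pool misread with prob 1 - sp
    return se * (1.0 - p_all_negative) + (1.0 - sp) * p_all_negative
```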
Citations: 0
Explicit solutions for the asymptotically-optimal bandwidth in cross-validation
IF 2.7 | Zone 2 (Mathematics) | Q1 Mathematics | Pub Date: 2024-02-08 | DOI: 10.1093/biomet/asae007
Karim M Abadir, Michel Lubrano
Summary We show that least squares cross-validation methods share a common structure which has an explicit asymptotic solution when the chosen kernel is asymptotically separable in bandwidth and data. For density estimation with a multivariate Student t(ν) kernel, the cross-validation criterion becomes asymptotically equivalent to a polynomial of only three terms. Our bandwidth formulae are simple and noniterative, leading to very fast computations; their integrated squared error dominates that of traditional cross-validation implementations; they alleviate the notorious sample variability of cross-validation; and they overcome its breakdown in the case of repeated observations. We illustrate our method with univariate and bivariate applications of density estimation and nonparametric regression to a large dataset of Michigan State University academic wages and experience.
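For comparison, the traditional approach that explicit formulae aim to replace evaluates the least squares cross-validation criterion LSCV(h) = ∫ f̂_h² − (2/n) Σ_i f̂_{h,−i}(X_i) over a grid of bandwidths. The sketch below does this for a univariate Gaussian kernel, where ∫ f̂_h² has a closed form because two N(0, h²) kernels convolve to N(0, 2h²). The Gaussian kernel and the grid are assumptions for illustration; the paper works with Student t(ν) kernels and does not use this grid search.

```python
import numpy as np

def _norm_pdf(u, sd):
    return np.exp(-0.5 * (u / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def lscv_gaussian(x, h):
    """LSCV(h) = int fhat_h^2 - (2/n) * sum_i fhat_{h,-i}(x_i) for a Gaussian-kernel KDE."""
    x = np.asarray(x, float)
    n = x.size
    d = x[:, None] - x[None, :]  # all pairwise differences x_i - x_j
    int_f2 = _norm_pdf(d, np.sqrt(2.0) * h).sum() / n**2
    off_diag = ~np.eye(n, dtype=bool)  # leave-one-out: drop the i == j terms
    loo = _norm_pdf(d[off_diag], h).sum() / (n * (n - 1))
    return int_f2 - 2.0 * loo

# Usage: brute-force grid search, the iterative step an explicit formula avoids
rng = np.random.default_rng(1)
sample = rng.standard_normal(200)
grid = np.linspace(0.05, 2.0, 200)
h_cv = grid[np.argmin([lscv_gaussian(sample, h) for h in grid])]
print(h_cv)
```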
Citations: 0
On the failure of the bootstrap for Chatterjee's rank correlation
IF 2.7 | Zone 2 (Mathematics) | Q1 Mathematics | Pub Date: 2024-02-04 | DOI: 10.1093/biomet/asae004
Zhexiao Lin, Fang Han
Summary While researchers commonly use the bootstrap to quantify the uncertainty of an estimator, it has been noticed that the standard bootstrap, in general, does not work for Chatterjee's rank correlation. In this paper, we provide a proof of this issue under an additional independence assumption, and complement our theory with simulation evidence for general settings. Chatterjee's rank correlation thus falls into a category of statistics that are asymptotically normal but bootstrap inconsistent. Valid inferential methods in this case are Chatterjee's original proposal for testing independence and, for more general purposes, the analytic asymptotic variance estimator of Lin & Han (2022).
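For reference, Chatterjee's rank correlation without ties sorts the pairs by x, takes the ranks r_i of the corresponding y values, and computes ξ_n = 1 − 3 Σ_{i=1}^{n−1} |r_{i+1} − r_i| / (n² − 1). The sketch below implements only this no-ties form (the function name is hypothetical). Note that a naive bootstrap resample drawn with replacement contains repeated x values, which this simplified version does not handle.

```python
def chatterjee_xi(x, y):
    """Chatterjee's rank correlation xi_n, assuming no ties in x or in y."""
    n = len(x)
    # Rearrange the y values so that the paired x values are increasing
    order = sorted(range(n), key=lambda i: x[i])
    y_sorted = [y[i] for i in order]
    # r_i = rank of y_sorted[i] among all y values (1-based; valid only without ties)
    rank = {v: r for r, v in enumerate(sorted(y_sorted), start=1)}
    r = [rank[v] for v in y_sorted]
    return 1 - 3 * sum(abs(r[i + 1] - r[i]) for i in range(n - 1)) / (n**2 - 1)
```

Unlike Pearson or Spearman correlation, ξ_n treats increasing and decreasing functional relationships alike: for y = x and y = −x with n = 10 it gives the same value, 1 − 3/11.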
Citations: 0
Asymptotically constant risk estimator of the time-average variance constant
IF 2.7 | Zone 2 (Mathematics) | Q1 Mathematics | Pub Date: 2024-02-03 | DOI: 10.1093/biomet/asae003
K W Chan, C Y Yau
Summary Estimation of the time-average variance constant is important for statistical analyses involving dependent data. This problem is difficult as it relies on a bandwidth parameter. Specifically, the optimal choices of the bandwidths of all existing estimators depend on the estimand itself and another unknown parameter which is very difficult to estimate. Thus, optimal variance estimation is unachievable. In this paper, we introduce a concept of converging flat-top kernels for constructing variance estimators whose optimal bandwidths are free of unknown parameters asymptotically and hence can be computed easily. We prove that the new estimator has an asymptotically constant risk and is locally asymptotically minimax.
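To show where the bandwidth enters, the time-average variance constant σ² = Σ_k γ(k) of a stationary series is commonly estimated by a lag-window sum Σ_{|k|<n} λ(k/b) γ̂(k), where γ̂ are sample autocovariances and b is the bandwidth. The sketch below uses a fixed trapezoidal flat-top kernel (equal to 1 for |u| ≤ 1/2 and decreasing linearly to 0 at |u| = 1) as an assumed example. It is not the converging flat-top kernel proposed in the paper, and with flat-top kernels the estimate may be negative in small samples.

```python
import numpy as np

def trapezoid_flat_top(u):
    """Assumed trapezoidal flat-top kernel: 1 on |u| <= 1/2, linear down to 0 at |u| = 1."""
    a = abs(u)
    if a <= 0.5:
        return 1.0
    if a <= 1.0:
        return 2.0 * (1.0 - a)
    return 0.0

def tavc_flat_top(x, b):
    """Lag-window estimate sum_k lambda(k/b) * gamma_hat(k) of the time-average variance constant."""
    x = np.asarray(x, float)
    n = x.size
    xc = x - x.mean()
    est = 0.0
    for k in range(-(n - 1), n):
        w = trapezoid_flat_top(k / b)
        if w == 0.0:
            continue
        lag = abs(k)
        gamma_k = np.dot(xc[: n - lag], xc[lag:]) / n  # sample autocovariance at lag k
        est += w * gamma_k
    return est
```

With b = 1 only lag 0 receives weight, so the estimate reduces to the sample variance with divisor n; larger b admits more autocovariance lags.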
Citations: 0
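To make the bandwidth problem concrete, the sketch below implements a classical lag-window estimator of the time-average variance constant sigma^2 = sum over all lags k of gamma(k), using the Politis–Romano trapezoidal flat-top window with a fixed flat part c = 0.5 and a user-chosen bandwidth. This is not the converging flat-top kernel estimator of Chan & Yau, whose kernel form and bandwidth rule are not given in the abstract; it only shows the kind of bandwidth-dependent estimator the paper improves on. The names `autocov`, `trapezoid_flat_top` and `tavc_lag_window` are hypothetical.

```python
import random


def autocov(x, k):
    """Sample autocovariance of x at lag k, using divisor n."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n


def trapezoid_flat_top(t, c=0.5):
    """Trapezoidal flat-top lag window: 1 for |t| <= c, linear decay to 0 at |t| = 1."""
    a = abs(t)
    if a <= c:
        return 1.0
    if a < 1.0:
        return (1.0 - a) / (1.0 - c)
    return 0.0


def tavc_lag_window(x, bandwidth, window=trapezoid_flat_top):
    """Lag-window estimate of sigma^2 = sum over all k of gamma(k), for a given bandwidth."""
    n = len(x)
    if n < 2 or bandwidth <= 0:
        raise ValueError("need at least 2 observations and a positive bandwidth")
    est = autocov(x, 0)
    for k in range(1, n):
        w = window(k / bandwidth)
        if w != 0.0:  # only lags inside the window's support contribute
            est += 2.0 * w * autocov(x, k)
    return est


if __name__ == "__main__":
    # AR(1) with phi = 0.5 and unit-variance noise: true sigma^2 = 1 / (1 - 0.5)^2 = 4.
    random.seed(1)
    phi, x, prev = 0.5, [], 0.0
    for _ in range(2000):
        prev = phi * prev + random.gauss(0.0, 1.0)
        x.append(prev)
    for b in (2, 5, 10, 20, 50):
        print(b, round(tavc_lag_window(x, b), 3))
```

The printed estimates typically shift with the bandwidth `b`, and picking `b` well requires knowledge of the dependence structure, which is the difficulty the abstract describes. Flat-top windows also do not guarantee a non-negative estimate in small samples.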
Journal: Biometrika