Journal of the Royal Statistical Society Series A-Statistics in Society: Latest Articles

Spatio-temporal quasi-experimental methods for rare disease outcomes: the impact of reformulated gasoline on childhood haematologic cancer.
IF 1.6 Zone 3 (Mathematics) Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2024-11-18 eCollection Date: 2025-10-01 DOI: 10.1093/jrsssa/qnae109
Sofia L Vega, Rachel C Nethery

Although some pollutants emitted in vehicle exhaust, such as benzene, are known to cause leukaemia in adults with high exposure levels, less is known about the relationship between traffic-related air pollution (TRAP) and childhood haematologic cancer. In the 1990s, the US EPA enacted the reformulated gasoline program in select areas of the U.S., which drastically reduced ambient TRAP in affected areas. This created an ideal quasi-experiment to study the effects of TRAP on childhood haematologic cancers. However, existing methods for quasi-experimental analyses can perform poorly when outcomes are rare and unstable, as with childhood cancer incidence. We develop Bayesian spatio-temporal matrix completion methods to conduct causal inference in quasi-experimental settings with rare outcomes. Selective information sharing across space and time enables stable estimation, and the Bayesian approach facilitates uncertainty quantification. We evaluate the methods through simulations and apply them to estimate the causal effects of TRAP on childhood leukaemia and lymphoma.

Journal of the Royal Statistical Society Series A-Statistics in Society, vol. 188, no. 4, pp. 1184-1202. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12503115/pdf/
Citations: 0
Studying Chinese immigrants' spatial distribution in the Raleigh-Durham area by linking survey and commercial data using romanized names.
IF 1.6 Zone 3 (Mathematics) Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2024-10-23 eCollection Date: 2025-01-01 DOI: 10.1093/jrsssa/qnae107
Eric A Bai, Botao Ju, Madeleine Beckner, Jerome P Reiter, M Giovanna Merli, Ted Mouw

Many population surveys do not provide information on respondents' residential addresses, instead offering coarse geographies like zip code or higher aggregations. However, fine resolution geography can be beneficial for characterizing neighbourhoods, especially for relatively rare populations such as immigrants. One way to obtain such information is to link survey records to records in auxiliary databases that include residential addresses by matching on variables common to both files. We present an approach based on probabilistic record linkage that enables matching survey participants in the Chinese Immigrants in Raleigh-Durham Study to records from InfoUSA, an information provider of residential records. The two files use different Chinese name romanization practices, which we address through a novel and generalizable strategy for constructing records' pairwise comparison vectors for romanized names. Using a fully Bayesian record linkage model, we characterize the geospatial distribution of Chinese immigrants in the Raleigh-Durham area of North Carolina.
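
The matching step described above can be illustrated with the classical Fellegi-Sunter weight, which scores a record pair from its field-by-field comparison vector. This is a minimal sketch, not the fully Bayesian model or the romanized-name comparison strategy of the paper: the field names and the `m` and `u` probabilities are hypothetical, and exact string equality stands in for whatever name comparison the authors actually use.

```python
import math

def comparison_vector(rec_a, rec_b, fields):
    """Return 1 for each field on which the two records agree exactly, else 0."""
    return tuple(int(rec_a[f] == rec_b[f]) for f in fields)

def match_weight(gamma, m, u):
    """Fellegi-Sunter log-likelihood-ratio weight for one record pair.

    m[k] is P(field k agrees | records are a true match);
    u[k] is P(field k agrees | records are not a match).
    Both are illustrative inputs here, not estimated quantities.
    """
    return sum(
        math.log(mk / uk) if g else math.log((1 - mk) / (1 - uk))
        for g, mk, uk in zip(gamma, m, u)
    )

# Hypothetical records: the surname agrees, the zip code does not.
fields = ["surname", "zip"]
survey_record = {"surname": "zhang", "zip": "27701"}
commercial_record = {"surname": "zhang", "zip": "27705"}

gamma = comparison_vector(survey_record, commercial_record, fields)
weight = match_weight(gamma, m=[0.9, 0.8], u=[0.1, 0.2])
```

A larger weight means the pair looks more like a true match. Exact equality would treat two romanizations of the same Chinese name as a disagreement, which is the gap the abstract says the authors address.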

Journal of the Royal Statistical Society Series A-Statistics in Society, vol. 188, no. 1, pp. 84-97. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11728054/pdf/
Citations: 0
A comparison of some existing and novel methods for integrating historical models to improve estimation of coefficients in logistic regression.
IF 1.6 Zone 3 (Mathematics) Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2024-09-24 eCollection Date: 2025-01-01 DOI: 10.1093/jrsssa/qnae093
Philip S Boonstra, Pedro Orozco Del Pino

Model integration refers to the process of incorporating a fitted historical model into the estimation of a current study to increase statistical efficiency. Integration can be challenging when the current model includes new covariates, leading to potential model misspecification. We present and evaluate seven existing and novel model integration techniques, which employ both likelihood constraints and Bayesian informative priors. Using a simulation study of logistic regression, we quantify how efficiency, assessed by bias and variance, changes with the sample sizes of both historical and current studies and in response to violations of transportability assumptions. We also apply these methods to a case study in which the goal is to use novel predictors to update a risk prediction model for in-hospital mortality among pediatric extracorporeal membrane oxygenation patients. Our simulation study and case study suggest that (i) when historical sample size is small, accounting for this statistical uncertainty is more efficient; (ii) all methods lose efficiency when there exist differences between the historical and current data-generating mechanisms; (iii) additional shrinkage to zero can improve efficiency in higher-dimensional settings but at the cost of bias in estimation.

Journal of the Royal Statistical Society Series A-Statistics in Society, vol. 188, no. 1, pp. 46-67. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11728056/pdf/
Citations: 0
Doubly robust machine learning-based estimation methods for instrumental variables with an application to surgical care for cholecystitis.
IF 1.6 Zone 3 (Mathematics) Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2024-09-24 DOI: 10.1093/jrsssa/qnae089
Kenta Takatsu, Alexander W Levis, Edward Kennedy, Rachel Kelz, Luke Keele

Comparative effectiveness research frequently employs the instrumental variable design since randomized trials can be infeasible for many reasons. In this study, we investigate treatments for emergency cholecystitis (inflammation of the gallbladder). A standard treatment for cholecystitis is surgical removal of the gallbladder, while alternative non-surgical treatments include managed care and pharmaceutical options. As randomized trials are judged to violate the principle of equipoise, we consider an instrument for operative care: the surgeon's tendency to operate. Standard instrumental variable estimation methods, however, often rely on parametric models that are prone to bias from model misspecification. Thus, we outline instrumental variable methods based on the doubly robust machine learning framework. These methods enable us to employ various machine learning techniques, delivering consistent estimates, and permitting valid inference on various estimands. We use these methods to estimate the primary target estimand in an instrumental variable design. Additionally, we expand these methods to develop new estimators for heterogeneous causal effects, profiling principal strata, and sensitivity analyses for a key instrumental variable assumption. We conduct a simulation study to demonstrate scenarios where more flexible estimation methods outperform standard methods. Our findings indicate that operative care is generally more effective for cholecystitis patients, although the benefits of surgery can be less pronounced for key patient subgroups.
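
For orientation, the simplest instrumental variable estimator is the Wald ratio, which divides the instrument's effect on the outcome by its effect on treatment uptake. The sketch below is that textbook baseline only, not the doubly robust machine learning estimators of the paper. It assumes a binary instrument (for example, a surgeon's tendency to operate split into high and low, which is an assumption made here for illustration), and the function and variable names are hypothetical.

```python
def wald_estimate(z, a, y):
    """Wald ratio: (E[Y|Z=1] - E[Y|Z=0]) / (E[A|Z=1] - E[A|Z=0]).

    z: binary instrument, a: binary treatment received, y: outcome.
    """
    def mean(values):
        return sum(values) / len(values)

    y1 = mean([yi for zi, yi in zip(z, y) if zi == 1])
    y0 = mean([yi for zi, yi in zip(z, y) if zi == 0])
    a1 = mean([ai for zi, ai in zip(z, a) if zi == 1])
    a0 = mean([ai for zi, ai in zip(z, a) if zi == 0])
    if a1 == a0:
        raise ValueError("instrument does not change treatment uptake")
    return (y1 - y0) / (a1 - a0)

# Toy data in which the outcome equals 1 + 2 * treatment.
z = [1, 1, 1, 1, 0, 0, 0, 0]
a = [1, 1, 1, 0, 1, 0, 0, 0]
y = [3, 3, 3, 1, 3, 1, 1, 1]
effect = wald_estimate(z, a, y)
```

On this toy data the ratio recovers the treatment coefficient of 2. The ratio has no covariate adjustment, which is one reason more flexible estimators such as those outlined in the abstract are of interest.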

Journal of the Royal Statistical Society Series A-Statistics in Society. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12223449/pdf/
Citations: 0
Optimal risk-assessment scheduling for primary prevention of cardiovascular disease.
IF 1.5 Zone 3 (Mathematics) Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2024-09-17 eCollection Date: 2025-07-01 DOI: 10.1093/jrsssa/qnae086
Francesca Gasperoni, Christopher H Jackson, Angela M Wood, Michael J Sweeting, Paul J Newcombe, David Stevens, Jessica K Barrett

In this work, we introduce a personalized and age-specific net benefit function, composed of benefits and costs, to recommend optimal timing of risk assessments for cardiovascular disease (CVD) prevention. We extend the 2-stage landmarking model to estimate patient-specific CVD risk profiles, adjusting for time-varying covariates. We apply our model to data from the Clinical Practice Research Datalink, comprising primary care electronic health records from the UK. We find that people at lower risk could be recommended an optimal risk-assessment interval of 5 years or more. Time-varying risk factors are required to discriminate between more frequent schedules for high-risk people.

Journal of the Royal Statistical Society Series A-Statistics in Society, vol. 188, no. 3, pp. 920-934. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12256122/pdf/
Citations: 0
Integrating testing volume into bandit algorithms for infectious disease surveillance.
IF 1.6 Zone 3 (Mathematics) Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2024-09-16 eCollection Date: 2025-10-01 DOI: 10.1093/jrsssa/qnae090
Joshua L Warren, Ottavia Prunas, A David Paltiel, Thomas Thornhill, Gregg S Gonsalves

Mobile testing services provide opportunities for active surveillance of infectious diseases for hard-to-reach and/or high-risk individuals who do not know their disease status. Identifying as many infected individuals as possible is important for mitigating disease transmission. Recently, multi-armed bandit sampling approaches have been adapted and applied in this setting to maximize the cumulative number of positive tests collected over time. However, these algorithms have not considered the possibility of variability in the number of tests administered across testing sites. What impact this variability has on the ability of these approaches to maximize yield is currently unknown. Therefore, we investigate this question by extending existing sampling frameworks to directly account for variability in testing volume while also maintaining the computational tractability of the previous methods. Through a simulation study based on human immunodeficiency virus infection characteristics in the Republic of the Congo (Congo-Brazzaville) as well as an application to COVID-19 testing data in Connecticut, we find improved long- and short-term performances of the new methods compared to several existing approaches. Based on these findings and the ease of computation, we recommend use of the newly developed methods for active surveillance of infectious diseases when variability in testing volume may be present.
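
A generic Beta-Binomial Thompson sampler gives a feel for the setting: each day one site is visited, a varying number of tests is administered, and every test result updates that site's posterior. This is a sketch under stated assumptions, not the authors' algorithm. The abstract does not say how the new methods use testing volume when choosing a site, so the selection step below ranks sites by sampled positivity rate alone. The site rates, volume ranges, and function names are all hypothetical.

```python
import random

def choose_site(posteriors, rng):
    """Thompson step: draw a positivity rate from each Beta posterior, pick the largest."""
    draws = [rng.betavariate(a, b) for a, b in posteriors]
    return max(range(len(draws)), key=draws.__getitem__)

def update(posteriors, site, positives, tests):
    """Beta-Binomial update using every test administered during the visit."""
    a, b = posteriors[site]
    posteriors[site] = (a + positives, b + tests - positives)

def simulate(true_rates, volume_ranges, days, seed=0):
    """Visit one site per day; the number of tests varies from visit to visit."""
    rng = random.Random(seed)
    posteriors = [(1.0, 1.0) for _ in true_rates]  # uniform Beta(1, 1) priors
    total_positives = 0
    for _ in range(days):
        site = choose_site(posteriors, rng)
        low, high = volume_ranges[site]
        tests = rng.randint(low, high)  # hypothetical volume model
        positives = sum(rng.random() < true_rates[site] for _ in range(tests))
        update(posteriors, site, positives, tests)
        total_positives += positives
    return total_positives, posteriors

total, final_posteriors = simulate(
    true_rates=[0.02, 0.10, 0.05],
    volume_ranges=[(5, 40), (5, 40), (5, 40)],
    days=200,
    seed=1,
)
```

Because a visit with 40 tests moves the posterior far more than a visit with 5, variability in volume changes how quickly each site's rate is learned, which is the issue the abstract raises.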

Journal of the Royal Statistical Society Series A-Statistics in Society, vol. 188, no. 4, pp. 1029-1043. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12503114/pdf/
Citations: 0
Synthesis estimators for transportability with positivity violations by a continuous covariate.
IF 1.6 Zone 3 (Mathematics) Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2024-09-02 eCollection Date: 2025-01-01 DOI: 10.1093/jrsssa/qnae084
Paul N Zivich, Jessie K Edwards, Bonnie E Shook-Sa, Eric T Lofgren, Justin Lessler, Stephen R Cole

Studies intended to estimate the effect of a treatment, like randomized trials, may not be sampled from the desired target population. To correct for this discrepancy, estimates can be transported to the target population. Methods for transporting between populations are often premised on a positivity assumption, such that all relevant covariate patterns in one population are also present in the other. However, eligibility criteria, particularly in the case of trials, can result in violations of positivity when transporting to external populations. To address nonpositivity, a synthesis of statistical and mathematical models can be considered. This approach integrates multiple data sources (e.g. trials, observational, pharmacokinetic studies) to estimate treatment effects, leveraging mathematical models to handle positivity violations. This approach was previously demonstrated for positivity violations by a single binary covariate. Here, we extend the synthesis approach for positivity violations with a continuous covariate. For estimation, two novel augmented inverse probability weighting estimators are proposed. Both estimators are contrasted with other common approaches for addressing nonpositivity. Empirical performance is compared via Monte Carlo simulation. Finally, the competing approaches are illustrated with an example in the context of two-drug vs. one-drug antiretroviral therapy on CD4 T cell counts among women with HIV.
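
The estimators named above build on augmented inverse probability weighting (AIPW). The sketch below shows the generic textbook form for the mean outcome under treatment, not the two synthesis estimators proposed in the paper, which additionally use a mathematical model where positivity fails. The propensity scores `pi` and outcome-model predictions `m1` are assumed to be supplied by the caller, and all names are hypothetical.

```python
def aipw_mean_treated(y, a, pi, m1):
    """AIPW estimate of E[Y under treatment].

    y:  observed outcomes
    a:  treatment indicators (1 = treated, 0 = untreated)
    pi: estimated P(A = 1 | X) for each unit, strictly positive
    m1: outcome-model prediction of E[Y | A = 1, X] for each unit
    """
    n = len(y)
    total = 0.0
    for yi, ai, pi_i, m_i in zip(y, a, pi, m1):
        # Model prediction plus an inverse-probability-weighted residual correction.
        total += m_i + ai * (yi - m_i) / pi_i
    return total / n

y = [1.0, 0.0, 1.0, 1.0]
a = [1, 0, 1, 0]
pi = [0.5, 0.5, 0.5, 0.5]

# With a null outcome model the estimator reduces to plain inverse probability weighting.
ipw_only = aipw_mean_treated(y, a, pi, m1=[0.0, 0.0, 0.0, 0.0])
```

The division by `pi` is where positivity matters: when some covariate region has propensity zero, the weight is undefined, which motivates the synthesis approach described in the abstract.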

Journal of the Royal Statistical Society Series A-Statistics in Society, vol. 188, no. 1, pp. 158-180. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11728055/pdf/
Citations: 0
A Bayesian spatial-temporal varying coefficients model for estimating excess deaths associated with respiratory infections.
IF 1.5 Zone 3 (Mathematics) Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2024-08-19 eCollection Date: 2025-07-01 DOI: 10.1093/jrsssa/qnae079
Yuzi Zhang, Howard H Chang, Angela D Iuliano, Carrie Reed

Disease surveillance data are used for monitoring and understanding disease burden, which provides valuable information in allocating health programme resources. Statistical methods play an important role in estimating disease burden since disease surveillance systems are prone to undercounting. This paper is motivated by the challenge of estimating mortality associated with respiratory infections (e.g. influenza and COVID-19) that are not ascertained from death certificates. We propose a Bayesian spatial-temporal model incorporating measures of infection activity to estimate excess deaths. In particular, the inclusion of time-varying coefficients allows us to better characterize associations between infection activity and mortality counts time series. Software to implement this method is available in the R package NBRegAD. Applying our modelling framework to weekly state-wide COVID-19 data in the US from 8 March 2020 to 3 July 2022, we identified temporal and spatial differences in excess deaths between different age groups. We estimated the total number of COVID-19 deaths in the US to be 1,168,481 (95% CI: 1,148,953 to 1,187,187) compared to the 1,022,147 from using only death certificate information. The analysis also suggests that the most severe undercounting was in the 18-49 years age group with an estimated underascertainment rate of 0.21 (95% CI: 0.16, 0.25).
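
The national totals quoted above can be turned into an overall undercount with a few lines of arithmetic. The abstract does not define the underascertainment rate, so this sketch assumes it is the share of estimated deaths missing from death certificates. Plugging in the interval endpoints gives only a rough range, not a proper credible interval.

```python
ESTIMATED_DEATHS = 1_168_481      # model-based estimate from the abstract
CERTIFICATE_DEATHS = 1_022_147    # death certificate count from the abstract
INTERVAL = (1_148_953, 1_187_187)  # 95% interval for the estimate

def underascertainment_rate(estimated, certified):
    """Assumed definition: fraction of estimated deaths not captured on certificates."""
    return (estimated - certified) / estimated

uncounted = ESTIMATED_DEATHS - CERTIFICATE_DEATHS
overall_rate = underascertainment_rate(ESTIMATED_DEATHS, CERTIFICATE_DEATHS)
rough_range = tuple(
    underascertainment_rate(bound, CERTIFICATE_DEATHS) for bound in INTERVAL
)
```

Under this assumed definition, 146,334 deaths are uncounted nationally, an overall rate of roughly 0.125, which is well below the 0.21 reported for the 18-49 years age group.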

Citation: Yuzi Zhang, Howard H Chang, Angela D Iuliano, Carrie Reed. A Bayesian spatial-temporal varying coefficients model for estimating excess deaths associated with respiratory infections. Journal of the Royal Statistical Society Series A-Statistics in Society, 188(3), 843-858. DOI: 10.1093/jrsssa/qnae079. Published 2024-08-19. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12256124/pdf/
Citations: 0
Data-integration with pseudoweights and survey-calibration: application to developing US-representative lung cancer risk models for use in screening.
IF 1.5 Zone 3 (Mathematics) Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2024-07-12 eCollection Date: 2025-01-01 DOI: 10.1093/jrsssa/qnae059
Lingxiao Wang, Yan Li, Barry I Graubard, Hormuzd A Katki

Accurate cancer risk estimation is crucial to clinical decision-making, such as identifying high-risk people for screening. However, most existing cancer risk models incorporate data from epidemiologic studies, which usually cannot represent the target population. While population-based health surveys are ideal for making inference to the target population, they typically do not collect time-to-cancer incidence data. Instead, time-to-cancer specific mortality is often readily available on surveys via linkage to vital statistics. We develop calibrated pseudoweighting methods that integrate individual-level data from a cohort and a survey, and summary statistics of cancer incidence from national cancer registries. By leveraging individual-level cancer mortality data in the survey, the proposed methods impute time-to-cancer incidence for survey sample individuals and use survey calibration with auxiliary variables of influence functions generated from Cox regression to improve robustness and efficiency of the inverse-propensity pseudoweighting method in estimating pure risks. We develop a lung cancer incidence pure risk model from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial using our proposed methods by integrating data from the National Health Interview Survey and cancer registries.

Citation: Lingxiao Wang, Yan Li, Barry I Graubard, Hormuzd A Katki. Data-integration with pseudoweights and survey-calibration: application to developing US-representative lung cancer risk models for use in screening. Journal of the Royal Statistical Society Series A-Statistics in Society, 188(1), 119-139. DOI: 10.1093/jrsssa/qnae059. Published 2024-07-12. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11728053/pdf/
Citations: 0
A framework for understanding selection bias in real-world healthcare data.
IF 1.5 Zone 3 (Mathematics) Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS Pub Date: 2024-05-02 eCollection Date: 2024-08-01 DOI: 10.1093/jrsssa/qnae039
Ritoban Kundu, Xu Shi, Jean Morrison, Jessica Barrett, Bhramar Mukherjee

Using administrative patient-care data such as Electronic Health Records (EHR) and medical/pharmaceutical claims for population-based scientific research has become increasingly common. With vast sample sizes leading to very small standard errors, researchers need to pay more attention to potential biases in the estimates of association parameters of interest, specifically to biases that do not diminish with increasing sample size. Of the multiple sources of bias, this paper focuses on selection bias. We present an analytic framework using directed acyclic graphs to guide applied researchers in dissecting how different sources of selection bias may affect estimates of the association between a binary outcome and an exposure (continuous or categorical) of interest. We consider four easy-to-implement weighting approaches to reduce selection bias, with accompanying variance formulae. Through a simulation study, we demonstrate when these approaches can rescue the analysis of real-world data in practice. We compare these methods using a data example where our goal is to estimate the well-known association of cancer and biological sex, using EHR from a longitudinal biorepository at the University of Michigan Healthcare system. We provide annotated R code to implement these weighted methods with associated inference.
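
The idea that weighting can reduce selection bias in an outcome-exposure association can be illustrated with a small simulation. The sketch below is not one of the paper's four approaches and is written in Python rather than the authors' R. It is a generic inverse-probability-of-selection weighting example in which the selection probabilities are assumed known, and every name and parameter value (`sel_prob`, `true_log_or`, the sample size) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
true_log_or = 0.5  # hypothetical true log odds ratio of outcome vs exposure

# Target population: binary exposure x and binary outcome d.
x = rng.integers(0, 2, size=n)
p_outcome = 1.0 / (1.0 + np.exp(-(-2.0 + true_log_or * x)))
d = rng.binomial(1, p_outcome)

# Selection into the analytic sample depends jointly on outcome and
# exposure (indexed [d, x]); exposed cases are strongly over-sampled.
sel_prob = np.array([[0.1, 0.1],
                     [0.2, 0.6]])
p_selected = sel_prob[d, x]
selected = rng.random(n) < p_selected


def log_odds_ratio(d, x, w):
    """Log odds ratio from a (possibly weighted) 2x2 table."""
    def cell(d_val, x_val):
        return w[(d == d_val) & (x == x_val)].sum()
    return np.log(cell(1, 1) * cell(0, 0) / (cell(1, 0) * cell(0, 1)))


d_s, x_s = d[selected], x[selected]

# Naive analysis ignores selection; the weighted analysis gives each
# selected unit the weight 1 / P(selected | d, x).
naive = log_odds_ratio(d_s, x_s, np.ones(d_s.size))
weighted = log_odds_ratio(d_s, x_s, 1.0 / p_selected[selected])

print(f"true log OR:     {true_log_or:.2f}")
print(f"naive log OR:    {naive:.2f}")
print(f"weighted log OR: {weighted:.2f}")
```

In this setup the naive estimate is inflated by roughly log(3), because the selection odds ratio (0.6 × 0.1) / (0.2 × 0.1) equals 3, and that bias does not shrink as the sample grows. In real EHR data the selection probabilities are unknown and have to be estimated, which is where different weighting approaches diverge.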

Citation: Ritoban Kundu, Xu Shi, Jean Morrison, Jessica Barrett, Bhramar Mukherjee. A framework for understanding selection bias in real-world healthcare data. Journal of the Royal Statistical Society Series A-Statistics in Society, 187(3), 606-635. DOI: 10.1093/jrsssa/qnae039. Published 2024-05-02. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11393555/pdf/
Citations: 0