Pub Date: 2025-07-14; Epub Date: 2024-08-20; DOI: 10.1093/jrsssa/qnae080
Gabrielle Virgili-Gervais, Alexandra M Schmidt, Honor Bixby, Alicia Cavanaugh, George Owusu, Samuel Agyei-Mensah, Brian Robinson, Jill Baumgartner
We propose a Bayesian hierarchical model to estimate a socio-economic status (SES) index based on mixed dichotomous and continuous variables. In particular, we extend the factor analysis models for mixed dichotomous and continuous variables of Quinn (2004. Bayesian factor analysis for mixed ordinal and continuous responses. Political Analysis, 12(4), 338-353. https://doi.org/10.1093/pan/mph022) and Schliep and Hoeting (2013. Multilevel latent Gaussian process model for mixed discrete and continuous multivariate response data. Journal of Agricultural, Biological, and Environmental Statistics, 18(4), 492-513. https://doi.org/10.1007/s13253-013-0136-z) by allowing a spatial hierarchical structure for key parameters of the model. Unlike most SES assessment models proposed in the literature, the hierarchical nature of this model enables the use of census observations at the household level without needing to aggregate any information a priori. Therefore, it better accommodates the variability of the SES between census tracts and the number of households per area. The proposed model is used to estimate a socio-economic index using 10% of the 2010 Ghana census in the Greater Accra Metropolitan Area. Of the 20 observed variables, the number of people per room, access to water piping and flushable toilets best differentiated high and low SES areas.
{"title":"Mapping socio-economic status using mixed data: a hierarchical Bayesian approach.","authors":"Gabrielle Virgili-Gervais, Alexandra M Schmidt, Honor Bixby, Alicia Cavanaugh, George Owusu, Samuel Agyei-Mensah, Brian Robinson, Jill Baumgartner","doi":"10.1093/jrsssa/qnae080","DOIUrl":"10.1093/jrsssa/qnae080","url":null,"PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":" ","pages":"859-874"},"PeriodicalIF":1.5,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7617442/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143544329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-07-01; Epub Date: 2024-09-02; DOI: 10.1093/jrsssa/qnae082
Erin Hartman, Chad Hazlett, Ciara Sterbenz
With the precipitous decline in response rates, researchers and pollsters have been left with highly nonrepresentative samples, relying on constructed weights to make these samples representative of the desired target population. Though practitioners employ valuable expert knowledge to choose what variables X must be adjusted for, they rarely defend particular functional forms relating these variables to the response process or the outcome. Unfortunately, commonly used calibration weights (which make the weighted mean of X in the sample equal that of the population) only ensure correct adjustment when the portion of the outcome and the response process left unexplained by linear functions of X are independent. To alleviate this functional form dependency, we describe kernel balancing for population weighting (kpop). This approach replaces the design matrix X with a kernel matrix K, encoding high-order information about X. Weights are then found to make the weighted average row of K among sampled units approximately equal to that of the target population. This produces good calibration on a wide range of smooth functions of X, without relying on the user to decide which X or what functions of them to include. We describe the method and illustrate it by application to polling data from the 2016 US presidential election.
{"title":"kpop: a kernel balancing approach for reducing specification assumptions in survey weighting.","authors":"Erin Hartman, Chad Hazlett, Ciara Sterbenz","doi":"10.1093/jrsssa/qnae082","DOIUrl":"10.1093/jrsssa/qnae082","url":null,"PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"188 3","pages":"875-895"},"PeriodicalIF":1.6,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12352454/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144876439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
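The balance condition described in the kpop abstract (the weighted average row of a kernel matrix among sampled units should match the target population's average row) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Gaussian kernel, the bandwidth value, the choice of sampled units as kernel centres, and the function names `kernel_matrix`, `mean_row` and `kernel_imbalance` are all assumptions made here for illustration. The sketch only measures imbalance for a given set of weights; it does not solve for the weights.

```python
import math

def gaussian_kernel(u, v, bandwidth):
    # Assumed kernel form: exp(-||u - v||^2 / bandwidth).
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / bandwidth)

def kernel_matrix(rows, centres, bandwidth):
    # One row per unit, one column per kernel centre.
    return [[gaussian_kernel(r, c, bandwidth) for c in centres] for r in rows]

def mean_row(K, weights=None):
    # Weighted average row of K (uniform weights if none are given).
    if weights is None:
        weights = [1.0] * len(K)
    total = sum(weights)
    n_cols = len(K[0])
    return [sum(w * row[j] for w, row in zip(weights, K)) / total
            for j in range(n_cols)]

def kernel_imbalance(sample, population, weights=None, bandwidth=1.0):
    # Euclidean distance between the weighted sample average row and
    # the population average row, using sampled units as centres.
    k_sample = kernel_matrix(sample, sample, bandwidth)
    k_population = kernel_matrix(population, sample, bandwidth)
    diffs = [a - b for a, b in zip(mean_row(k_sample, weights),
                                   mean_row(k_population))]
    return math.sqrt(sum(d * d for d in diffs))

# Toy data: the sample over-represents units with X = 1.
population = [[0.0], [0.0], [1.0], [1.0]]
sample = [[0.0], [1.0], [1.0], [1.0]]
unweighted = kernel_imbalance(sample, population)
reweighted = kernel_imbalance(sample, population, weights=[3.0, 1.0, 1.0, 1.0])
```

In this toy case, upweighting the single X = 0 respondent by a factor of three restores the population's 50/50 split, so the kernel imbalance drops to zero up to floating-point error.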
Mark Louie Ramos, Barry Graubard, Joseph Gastwirth
Different methods for describing health disparities in the distributions of continuous measured health-related variables among groups provide more insight into the nature and impact of the disparities than comparing measures of central tendency. Transformations of the Lorenz curve and analogues of the Gini index used in the analysis of income inequality are adapted to provide graphical and analytical measures of health disparities. Akin to the classical Peters-Belson regression method for partitioning a disparity into a component explained by group differences in a set of covariates and an unexplained component, a new modified Lorenz curve is proposed. The estimation of these curves/measures is adapted for data obtained from surveys with complex sample weighted designs. The statistical properties of sample weighted estimators of the proposed measures and their bootstrap variances are explored through simulation studies. Applications are demonstrated using BMI and blood lead levels among race/ethnic groups of adult females and children, respectively, from the 2013-2018 and 1988-1994 US National Health and Nutrition Examination Surveys. Another application examines disparities in distance to nearest acute care hospital among census blocks in the US state of New York grouped by their level of urbanicity using US census data and the American Hospital Association survey.
{"title":"Graphical displays and related statistical measures of health disparities between groups in complex sample surveys.","authors":"Mark Louie Ramos, Barry Graubard, Joseph Gastwirth","doi":"10.1093/jrsssa/qnaf044","DOIUrl":"10.1093/jrsssa/qnaf044","url":null,"PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":" ","pages":""},"PeriodicalIF":1.6,"publicationDate":"2025-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12341090/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144849517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
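For readers unfamiliar with the building blocks named in the abstract above, the classical Lorenz curve and Gini index with survey weights can be sketched as below. This shows only the standard income-inequality versions, not the transformed curves or the Peters-Belson-style modification the paper proposes; the function names and the trapezoid-rule area calculation are choices made here for illustration, and non-negative values with a positive weighted total are assumed.

```python
def lorenz_curve(values, weights):
    # Sort units by value, then accumulate weighted population share (x)
    # against weighted share of the total (y).
    pairs = sorted(zip(values, weights))
    total_weight = sum(w for _, w in pairs)
    total_value = sum(v * w for v, w in pairs)
    points = [(0.0, 0.0)]
    cum_weight = 0.0
    cum_value = 0.0
    for v, w in pairs:
        cum_weight += w
        cum_value += v * w
        points.append((cum_weight / total_weight, cum_value / total_value))
    return points

def gini_index(values, weights):
    # Gini = 1 - 2 * (area under the Lorenz curve), trapezoid rule.
    points = lorenz_curve(values, weights)
    area = sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return 1.0 - 2.0 * area

equal_gini = gini_index([1.0, 1.0, 1.0], [1.0, 2.0, 3.0])
unequal_gini = gini_index([0.0, 1.0], [1.0, 1.0])
```

When every unit has the same value the Lorenz curve lies on the diagonal and the index is zero, whatever the sample weights; when half the weighted population holds everything, this discrete version gives 0.5.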
Chun-Che Wen, Rajib Paul, Kelly J Hunt, A James O'Malley, Hong Li, Elizabeth Hill, Angela M Malek, Brian Neelon
Cardiometabolic risk factors (CRFs) during pregnancy are early indicators of maternal diseases, such as stroke and type 2 diabetes. The total number of CRFs typically takes the form of binomial counts that exhibit overdispersion and zero inflation due to correlations among the underlying CRFs. Motivated by an examination of spatiotemporal trends in five CRFs among pregnant women in the U.S. state of South Carolina during the COVID-19 pandemic, we developed a zero-inflated beta-binomial model within a spatiotemporal framework. This model combines a point mass at zero to account for zero inflation and a beta-binomial distribution to model the remaining CRF counts. Given the notable racial disparities in CRFs during pregnancy that vary across the state over time, we incorporate a spatially varying coefficient model to explore the complex relationships between CRFs and geographic and temporal disparities among non-Hispanic White and non-Hispanic Black women. For posterior inference, we developed an efficient hybrid Markov Chain Monte Carlo algorithm that relies on easily sampled Gibbs and Metropolis-Hastings steps. Our analysis of CRFs in South Carolina reveals that certain counties, such as Chesterfield and Clarendon, exhibit gaps in racial health disparities, making them prime candidates for community-level interventions aimed at reducing these disparities.
{"title":"A Bayesian zero-inflated spatially varying coefficients model for overdispersed binomial data.","authors":"Chun-Che Wen, Rajib Paul, Kelly J Hunt, A James O'Malley, Hong Li, Elizabeth Hill, Angela M Malek, Brian Neelon","doi":"10.1093/jrsssa/qnaf056","DOIUrl":"10.1093/jrsssa/qnaf056","url":null,"PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":" ","pages":""},"PeriodicalIF":1.6,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12625296/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145558170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
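The zero-inflated beta-binomial distribution described in the abstract above, a point mass at zero mixed with a beta-binomial for the remaining counts, can be written out as a probability mass function. The sketch below assumes the standard zero-inflated mixture form and a shape-parameter (a, b) parameterisation of the beta-binomial; the paper may use a different parameterisation (for example mean and dispersion) and adds spatiotemporal structure that is not shown here. The names `pi_zero`, `a` and `b` are illustrative.

```python
import math

def log_beta(a, b):
    # Logarithm of the beta function B(a, b).
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(y, n, a, b):
    # P(Y = y) for a beta-binomial with n trials and shape parameters a, b.
    log_choose = (math.lgamma(n + 1) - math.lgamma(y + 1)
                  - math.lgamma(n - y + 1))
    return math.exp(log_choose + log_beta(y + a, n - y + b) - log_beta(a, b))

def zibb_pmf(y, n, pi_zero, a, b):
    # Mixture: with probability pi_zero the count is a structural zero,
    # otherwise it is drawn from the beta-binomial.
    point_mass = pi_zero if y == 0 else 0.0
    return point_mass + (1.0 - pi_zero) * beta_binomial_pmf(y, n, a, b)

# Five risk factors per woman, as in the abstract; parameter values are made up.
probabilities = [zibb_pmf(y, 5, 0.3, 2.0, 3.0) for y in range(6)]
```

Because the beta-binomial component can itself produce zeros, the probability of a zero count under the mixture is larger than under either component's share alone, which is what lets the model absorb excess zeros.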
Pub Date: 2025-05-21; eCollection Date: 2025-10-01; DOI: 10.1093/jrsssa/qnaf057
Michael Cork, Daniel Mork, Francesca Dominici
{"title":"Authors' reply to the Discussion of 'Methods for estimating the exposure-response curve to inform the new safety standards for fine particulate matter'.","authors":"Michael Cork, Daniel Mork, Francesca Dominici","doi":"10.1093/jrsssa/qnaf057","DOIUrl":"10.1093/jrsssa/qnaf057","url":null,"abstract":"","PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"188 4","pages":"995-1002"},"PeriodicalIF":1.6,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12503113/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145253446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Healthcare quality metrics refer to a variety of measures used to characterize what should have been done or not done for a patient or the health consequences of what was or was not done. When estimating healthcare quality, many metrics are measured and combined to provide an overall estimate either at the patient level or at higher levels, such as the provider organization or insurer. Racial and ethnic disparities are defined as the mean difference in quality between minorities and Whites not justified by underlying health conditions or patient preferences. Several statistical features of healthcare quality data have been ignored: quality is a theoretical construct not directly observable; quality metrics are measured on different scales or, if measured on the same scale, have different baseline rates; the construct may be multidimensional; and metrics are correlated within-individuals. Balancing health differences across race and ethnicity groups is challenging due to confounding. We provide an approach addressing these features, utilizing exploratory multidimensional item response theory (IRT) models and latent class IRT models to estimate quality, and optimization-based matching to adjust for confounding among the race and ethnicity groups. Quality metrics measured on 93,000 adults with schizophrenia residing in five US states illustrate approaches.
{"title":"Estimating racial and ethnic healthcare quality disparities using exploratory item response theory and latent class item response theory models.","authors":"Sharon-Lise Normand, Katya Zelevinsky, Marcela Horvitz-Lennon","doi":"10.1093/jrsssa/qnaf033","DOIUrl":"https://doi.org/10.1093/jrsssa/qnaf033","url":null,"PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":" ","pages":""},"PeriodicalIF":1.6,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12377680/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144976850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exposure to fine particulate matter (PM2.5) poses significant health risks and accurately determining the shape of the relationship between PM2.5 and health outcomes has crucial policy implications. Although various statistical methods exist to estimate this exposure-response curve (ERC), few studies have compared their performance under plausible data-generating scenarios. This study compares seven commonly used ERC estimators across 72 exposure-response and confounding scenarios via simulation. Additionally, we apply these methods to estimate the ERC between long-term PM2.5 exposure and all-cause mortality using data from over 68 million Medicare beneficiaries in the United States. Our simulation indicates that regression methods not placed within a causal inference framework are unsuitable when anticipating heterogeneous exposure effects. Under the setting of a large sample size and unknown ERC functional form, we recommend utilizing causal inference methods that allow for nonlinear ERCs. In our data application, we observe a nonlinear relationship between annual average PM2.5 and all-cause mortality in the Medicare population, with a sharp increase in relative mortality at low PM2.5 concentrations. Our findings suggest that stricter limits on PM2.5 could avert numerous premature deaths. To facilitate the utilization of our results, we provide publicly available, reproducible code on GitHub for every step of the analysis.
{"title":"Methods for Estimating the Exposure-Response Curve to Inform the New Safety Standards for Fine Particulate Matter.","authors":"Michael Cork, Daniel Mork, Francesca Dominici","doi":"10.1093/jrsssa/qnaf004","DOIUrl":"10.1093/jrsssa/qnaf004","url":null,"PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":" ","pages":""},"PeriodicalIF":1.6,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12433667/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145071090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aaron Gerding, Nicholas G Reich, Benjamin Rogers, Evan L Ray
Recent years have seen increasing efforts to forecast infectious disease burdens, with a primary goal being to help public health workers make informed policy decisions. However, there has been only limited discussion of how predominant forecast evaluation metrics might indicate the success of policies based in part on those forecasts. We explore one possible tether between forecasts and policy: the allocation of limited medical resources so as to minimize unmet need. We use probabilistic forecasts of disease burden in each of several regions to determine optimal resource allocations, and then we score forecasts according to how much unmet need their associated allocations would have allowed. We illustrate with forecasts of COVID-19 hospitalizations in the U.S., and we find that the forecast skill ranking given by this allocation scoring rule can vary substantially from the ranking given by the weighted interval score. We see this as evidence that the allocation scoring rule detects forecast value that is missed by traditional accuracy measures and that the general strategy of designing scoring rules that are directly linked to policy performance is a promising direction for epidemic forecast evaluation.
{"title":"Evaluating infectious disease forecasts with allocation scoring rules.","authors":"Aaron Gerding, Nicholas G Reich, Benjamin Rogers, Evan L Ray","doi":"10.1093/jrsssa/qnae136","DOIUrl":"10.1093/jrsssa/qnae136","url":null,"PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":" ","pages":""},"PeriodicalIF":1.6,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12371526/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144976855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
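The idea in the abstract above, turning probabilistic forecasts into a resource allocation and then scoring the forecast by the unmet need that allocation would have allowed, can be sketched as follows. Several things here are assumptions made for illustration rather than the paper's definitions: forecasts are represented as lists of sampled outcomes per region, resources come in whole units, the allocation is found greedily (which is optimal in this setting because expected unmet need is convex in the amount allocated), and the score is reported as excess unmet need relative to an oracle that knows the outcomes. All function names are hypothetical.

```python
def expected_unmet_need(samples, allocated):
    # Monte Carlo estimate of E[max(Y - allocated, 0)] from forecast samples.
    return sum(max(y - allocated, 0) for y in samples) / len(samples)

def allocate(forecasts, budget):
    # Give out one unit at a time to the region where it reduces
    # expected unmet need the most.
    allocation = [0] * len(forecasts)
    for _ in range(budget):
        gains = [expected_unmet_need(s, a) - expected_unmet_need(s, a + 1)
                 for s, a in zip(forecasts, allocation)]
        best = max(range(len(gains)), key=gains.__getitem__)
        allocation[best] += 1
    return allocation

def unmet_need(allocation, observed):
    # Realised shortfall summed over regions.
    return sum(max(y - x, 0) for x, y in zip(allocation, observed))

def allocation_score(forecasts, observed, budget):
    # Excess unmet need of the forecast-based allocation over an oracle
    # allocation computed from the observed outcomes (lower is better).
    forecast_allocation = allocate(forecasts, budget)
    oracle_allocation = allocate([[y] for y in observed], budget)
    return (unmet_need(forecast_allocation, observed)
            - unmet_need(oracle_allocation, observed))

observed = [2, 8]
good_score = allocation_score([[2, 2, 2], [8, 8, 8]], observed, budget=10)
bad_score = allocation_score([[8, 8], [2, 2]], observed, budget=10)
```

A forecast that places the burden in the wrong region sends resources to the wrong place and is penalised by the resulting shortfall, even if a conventional accuracy metric would treat the two regions' errors symmetrically.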
Pub Date: 2024-11-18; eCollection Date: 2025-10-01; DOI: 10.1093/jrsssa/qnae109
Sofia L Vega, Rachel C Nethery
Although some pollutants emitted in vehicle exhaust, such as benzene, are known to cause leukaemia in adults with high exposure levels, less is known about the relationship between traffic-related air pollution (TRAP) and childhood haematologic cancer. In the 1990s, the US EPA enacted the reformulated gasoline program in select areas of the U.S., which drastically reduced ambient TRAP in affected areas. This created an ideal quasi-experiment to study the effects of TRAP on childhood haematologic cancers. However, existing methods for quasi-experimental analyses can perform poorly when outcomes are rare and unstable, as with childhood cancer incidence. We develop Bayesian spatio-temporal matrix completion methods to conduct causal inference in quasi-experimental settings with rare outcomes. Selective information sharing across space and time enables stable estimation, and the Bayesian approach facilitates uncertainty quantification. We evaluate the methods through simulations and apply them to estimate the causal effects of TRAP on childhood leukaemia and lymphoma.
{"title":"Spatio-temporal quasi-experimental methods for rare disease outcomes: the impact of reformulated gasoline on childhood haematologic cancer.","authors":"Sofia L Vega, Rachel C Nethery","doi":"10.1093/jrsssa/qnae109","DOIUrl":"10.1093/jrsssa/qnae109","url":null,"PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"188 4","pages":"1184-1202"},"PeriodicalIF":1.6,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12503115/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145253449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-23; eCollection Date: 2025-01-01; DOI: 10.1093/jrsssa/qnae107
Eric A Bai, Botao Ju, Madeleine Beckner, Jerome P Reiter, M Giovanna Merli, Ted Mouw
Many population surveys do not provide information on respondents' residential addresses, instead offering coarse geographies like zip code or higher aggregations. However, fine resolution geography can be beneficial for characterizing neighbourhoods, especially for relatively rare populations such as immigrants. One way to obtain such information is to link survey records to records in auxiliary databases that include residential addresses by matching on variables common to both files. We present an approach based on probabilistic record linkage that enables matching survey participants in the Chinese Immigrants in Raleigh-Durham Study to records from InfoUSA, an information provider of residential records. The two files use different Chinese name romanization practices, which we address through a novel and generalizable strategy for constructing records' pairwise comparison vectors for romanized names. Using a fully Bayesian record linkage model, we characterize the geospatial distribution of Chinese immigrants in the Raleigh-Durham area of North Carolina.
{"title":"Studying Chinese immigrants' spatial distribution in the Raleigh-Durham area by linking survey and commercial data using romanized names.","authors":"Eric A Bai, Botao Ju, Madeleine Beckner, Jerome P Reiter, M Giovanna Merli, Ted Mouw","doi":"10.1093/jrsssa/qnae107","DOIUrl":"10.1093/jrsssa/qnae107","url":null,"PeriodicalId":49983,"journal":{"name":"Journal of the Royal Statistical Society Series A-Statistics in Society","volume":"188 1","pages":"84-97"},"PeriodicalIF":1.6,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11728054/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142985303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}