Journal of the Royal Statistical Society Series A-Statistics in Society: Latest Articles

Multivariate mixed models accounting for don't know options in ordinal data.

IF 1.6; Zone 3 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS

Journal of the Royal Statistical Society Series A-Statistics in Society

Pub Date : 2025-12-26 DOI: 10.1093/jrsssa/qnaf197

Ralitza Gueorguieva, Maria Iannario

Multivariate ordinal data characterised by between-subject heterogeneity or different response styles are prevalent in surveys and other observational studies. It is especially common in surveys designed to assess individual perceptions or knowledge to include a 'don't know' option on some or all survey questions. The latter makes the scales partially ordinal, which precludes the use of well-established models for ordinal data, whereas models for nominal data are inefficient and difficult to interpret. Ignoring the 'don't know' options may introduce bias as the subset of individuals who choose to provide ratings may not be representative of the population of interest. The suggested solution in this manuscript involves jointly modeling the selection of 'don't know' options and the ordinal ratings on multiple variables. The proposed multivariate mixed models are flexible: they allow for heterogeneity in responses and response styles, and for assessment of the effects of covariates on the ordinal ratings and on choosing the 'don't know' options. Likelihood-based inference and model comparisons are performed. Two case studies, one on financial risk perceptions and one on knowledge about the addictiveness of tobacco products, are used for motivation and illustration. A simulation demonstrates that the proposed approach yields unbiased and efficient estimates. The results are straightforward to interpret, effectively capturing the complexity inherent in the data. This makes the proposed models particularly well-suited for analyzing partially ordinal ratings in social and behavioral surveys, providing a robust and reliable framework for such contexts.

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12778354/pdf/

Citations: 0

Improving Survey Inference in Two-phase Designs Using Bayesian Machine Learning.

IF 1.6; Zone 3 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS

Journal of the Royal Statistical Society Series A-Statistics in Society

Pub Date : 2025-09-08 DOI: 10.1093/jrsssa/qnaf136

Xinru Wang, Anyu Zhu, Lauren Kennedy, Abigail Greenleaf, Qixuan Chen

The two-phase sampling design is a cost-effective strategy widely used in public health research. Analyzing the Phase II sample often involves creating subsample-specific weights. However, these weights can be highly variable, leading to unstable weighted analyses. Alternatively, the rich data collected during the first phase can be leveraged to improve survey inference for the Phase II sample. In this paper, we propose a Bayesian tree-based multiple imputation (MI) approach for estimating population means using the Phase II sample, where the parent survey was conducted using a complex survey design. The design features of the parent survey, such as strata and clusters, are incorporated into the tree-based imputation models. Through simulations, we demonstrate that the tree-based MI method outperforms traditional weighted estimators, yielding smaller bias, lower root mean squared error, and narrower 95% confidence intervals, with coverage rates closer to the nominal level. Furthermore, we show that Rubin's variance estimation method provides valid statistical inference for population mean estimation in our setting. We illustrate the application of the proposed tree-based MI method using data from a cellphone survey on COVID-19 vaccination in Uganda, which represents a subcohort sample drawn from the 2020 Uganda Population-based HIV Impact Assessment Survey.
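The abstract relies on Rubin's variance estimation method for multiple imputation. As a rough illustration only, the standard combining rules can be sketched as follows; the function name and the input numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean, variance


def rubin_combine(estimates, within_variances):
    """Combine m completed-data estimates with Rubin's rules.

    estimates        -- point estimate from each imputed data set
    within_variances -- variance estimate from each imputed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    w_bar = mean(within_variances)   # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    total = w_bar + (1.0 + 1.0 / m) * b
    return q_bar, total


# Hypothetical numbers, for illustration only.
pooled, total_var = rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total_var)
```

The total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m), which is why highly variable imputations widen the resulting intervals.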

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12905726/pdf/

Citations: 0

Mapping socio-economic status using mixed data: a hierarchical Bayesian approach.

IF 1.5; Zone 3 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS

Journal of the Royal Statistical Society Series A-Statistics in Society

Pub Date : 2025-07-14 Epub Date: 2024-08-20 DOI: 10.1093/jrsssa/qnae080

Gabrielle Virgili-Gervais, Alexandra M Schmidt, Honor Bixby, Alicia Cavanaugh, George Owusu, Samuel Agyei-Mensah, Brian Robinson, Jill Baumgartner

We propose a Bayesian hierarchical model to estimate a socio-economic status (SES) index based on mixed dichotomous and continuous variables. In particular, we extend the factor analysis models for mixed dichotomous and continuous variables of Quinn (2004; Bayesian factor analysis for mixed ordinal and continuous responses. Political Analysis, 12(4), 338-353. https://doi.org/10.1093/pan/mph022) and Schliep and Hoeting (2013; Multilevel latent Gaussian process model for mixed discrete and continuous multivariate response data. Journal of Agricultural, Biological, and Environmental Statistics, 18(4), 492-513. https://doi.org/10.1007/s13253-013-0136-z) by allowing a spatial hierarchical structure for key parameters of the model. Unlike most SES assessment models proposed in the literature, the hierarchical nature of this model enables the use of census observations at the household level without needing to aggregate any information a priori. Therefore, it better accommodates the variability of the SES between census tracts and the number of households per area. The proposed model is used in the estimation of a socio-economic index using 10% of the 2010 Ghana census in the Greater Accra Metropolitan area. Of the 20 observed variables, the number of people per room, access to water piping, and flushable toilets best differentiated high and low SES areas.

Pages 859-874. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7617442/pdf/

Citations: 0

kpop: a kernel balancing approach for reducing specification assumptions in survey weighting.

IF 1.6; Zone 3 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS

Journal of the Royal Statistical Society Series A-Statistics in Society

Pub Date : 2025-07-01 Epub Date: 2024-09-02 DOI: 10.1093/jrsssa/qnae082

Erin Hartman, Chad Hazlett, Ciara Sterbenz

With the precipitous decline in response rates, researchers and pollsters have been left with highly nonrepresentative samples, relying on constructed weights to make these samples representative of the desired target population. Though practitioners employ valuable expert knowledge to choose what variables $X$ must be adjusted for, they rarely defend particular functional forms relating these variables to the response process or the outcome. Unfortunately, commonly used calibration weights, which make the weighted mean of $X$ in the sample equal that of the population, only ensure correct adjustment when the portions of the outcome and the response process left unexplained by linear functions of $X$ are independent. To alleviate this functional form dependency, we describe kernel balancing for population weighting (kpop). This approach replaces the design matrix $X$ with a kernel matrix $K$ encoding high-order information about $X$. Weights are then found to make the weighted average row of $K$ among sampled units approximately equal to that of the target population. This produces good calibration on a wide range of smooth functions of $X$, without relying on the user to decide which $X$ or what functions of them to include. We describe the method and illustrate it by application to polling data from the 2016 US presidential election.
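The balance condition described in this abstract can be checked numerically. The sketch below is an illustration only and is not the kpop software: it assumes a Gaussian kernel with a hypothetical bandwidth, builds kernel rows against the sampled units, and reports the largest gap between the weighted average row among sampled units and the average row in the target population. All function and parameter names are invented for this example, and the step that actually solves for the weights is omitted.

```python
import math


def gaussian_kernel_row(x, centers, bandwidth):
    """Kernel similarities between one unit x and every sampled unit (assumed Gaussian kernel)."""
    return [
        math.exp(-sum((a - c) ** 2 for a, c in zip(x, center)) / bandwidth)
        for center in centers
    ]


def kernel_imbalance(sample, population, weights, bandwidth=1.0):
    """Largest absolute gap between the weighted mean kernel row in the
    sample and the mean kernel row in the target population."""
    total = sum(weights)
    w = [wi / total for wi in weights]  # normalise weights to sum to one
    k_sample = [gaussian_kernel_row(x, sample, bandwidth) for x in sample]
    k_pop = [gaussian_kernel_row(x, sample, bandwidth) for x in population]
    n_cols = len(sample)
    weighted_row = [
        sum(w[i] * k_sample[i][j] for i in range(len(sample))) for j in range(n_cols)
    ]
    pop_row = [sum(row[j] for row in k_pop) / len(k_pop) for j in range(n_cols)]
    return max(abs(a - b) for a, b in zip(weighted_row, pop_row))


# Toy data: the population has twice as many units at 0 as at 1.
sample = [[0.0], [1.0]]
population = [[0.0], [0.0], [1.0]]
print(kernel_imbalance(sample, population, [1.0, 1.0]))  # unweighted sample is imbalanced
print(kernel_imbalance(sample, population, [2.0, 1.0]))  # reweighting restores balance
```

In this toy case the weights that reproduce the population shares also balance every kernel column, which is the property the weights are chosen to approximate in general.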

Volume 188, issue 3, pages 875-895. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12352454/pdf/

Citations: 0

Professor Ian Hall's contribution to the Discussion of 'Some statistical aspects of the COVID-19 response' by Wood et al.

IF 1.6; Zone 3 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS

Journal of the Royal Statistical Society Series A-Statistics in Society

Pub Date : 2025-06-27 eCollection Date: 2026-01-01 DOI: 10.1093/jrsssa/qnaf076

Ian Hall

Citations: 0

Graphical displays and related statistical measures of health disparities between groups in complex sample surveys.

IF 1.6; Zone 3 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS

Journal of the Royal Statistical Society Series A-Statistics in Society

Pub Date : 2025-05-22 DOI: 10.1093/jrsssa/qnaf044

Mark Louie Ramos, Barry Graubard, Joseph Gastwirth

Different methods for describing health disparities in the distributions of continuous measured health-related variables among groups provide more insight into the nature and impact of the disparities than comparing measures of central tendency. Transformations of the Lorenz curve and analogues of the Gini index used in the analysis of income inequality are adapted to provide graphical and analytical measures of health disparities. Akin to the classical Peters-Belson regression method for partitioning a disparity into a component explained by group differences in a set of covariates and an unexplained component, a new modified Lorenz curve is proposed. The estimation of these curves/measures is adapted for data obtained from surveys with complex sample weighted designs. The statistical properties of sample weighted estimators of the proposed measures and their bootstrap variances are explored through simulation studies. Applications are demonstrated using BMI and blood lead levels among race/ethnic groups of adult females and children, respectively, from the 2013-2018 and 1988-1994 US National Health and Nutrition Examination Surveys. Another application examines disparities in distance to nearest acute care hospital among census blocks in the US state of New York grouped by their level of urbanicity using US census data and the American Hospital Association survey.
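For readers unfamiliar with the underlying tools, the sketch below computes the classical Lorenz curve and Gini index with survey weights. It is a minimal illustration under stated assumptions: it implements the standard income-inequality versions, not the transformed or modified curves proposed in the paper, and the function names and toy numbers are hypothetical.

```python
def weighted_lorenz(values, weights):
    """Return Lorenz curve points (cumulative weight share, cumulative value share).

    values  -- non-negative measurements (e.g. a health-related variable)
    weights -- sampling weights for the corresponding units
    """
    pairs = sorted(zip(values, weights))  # order units from lowest to highest value
    total_w = sum(w for _, w in pairs)
    total_v = sum(v * w for v, w in pairs)
    if total_w <= 0 or total_v <= 0:
        raise ValueError("weights and weighted total must be positive")
    points = [(0.0, 0.0)]
    cum_w = 0.0
    cum_v = 0.0
    for v, w in pairs:
        cum_w += w
        cum_v += v * w
        points.append((cum_w / total_w, cum_v / total_v))
    return points


def gini_from_lorenz(points):
    """Gini index as one minus twice the area under the Lorenz curve (trapezoid rule)."""
    area_times_two = sum(
        (p1 - p0) * (l1 + l0) for (p0, l0), (p1, l1) in zip(points, points[1:])
    )
    return 1.0 - area_times_two


# Toy data: one unit holds the entire total.
print(gini_from_lorenz(weighted_lorenz([0.0, 0.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0])))
```

A unit with weight 3 contributes exactly as three identical units with weight 1 would, which is how the sampling weights enter the estimate.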

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12341090/pdf/

Citations: 0

A Bayesian zero-inflated spatially varying coefficients model for overdispersed binomial data.

IF 1.6; Zone 3 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS

Journal of the Royal Statistical Society Series A-Statistics in Society

Pub Date : 2025-05-21 DOI: 10.1093/jrsssa/qnaf056

Chun-Che Wen, Rajib Paul, Kelly J Hunt, A James O'Malley, Hong Li, Elizabeth Hill, Angela M Malek, Brian Neelon

Cardiometabolic risk factors (CRFs) during pregnancy are early indicators of maternal diseases, such as stroke and type 2 diabetes. The total number of CRFs typically takes the form of binomial counts that exhibit overdispersion and zero inflation due to correlations among the underlying CRFs. Motivated by an examination of spatiotemporal trends in five CRFs among pregnant women in the U.S. state of South Carolina during the COVID-19 pandemic, we developed a zero-inflated beta-binomial model within a spatiotemporal framework. This model combines a point mass at zero to account for zero inflation and a beta-binomial distribution to model the remaining CRF counts. Given the notable racial disparities in CRFs during pregnancy that vary across the state over time, we incorporate a spatially varying coefficient model to explore the complex relationships between CRFs and geographic and temporal disparities among non-Hispanic White and non-Hispanic Black women. For posterior inference, we developed an efficient hybrid Markov Chain Monte Carlo algorithm that relies on easily sampled Gibbs and Metropolis-Hastings steps. Our analysis of CRFs in South Carolina reveals that certain counties, such as Chesterfield and Clarendon, exhibit gaps in racial health disparities, making them prime candidates for community-level interventions aimed at reducing these disparities.
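The mixture of a point mass at zero and a beta-binomial distribution can be written down directly. The sketch below is a minimal illustration under assumptions that the abstract does not confirm: it uses the shape parameters a and b for the beta-binomial part (the paper may use a mean and dispersion parametrisation), it lets the beta-binomial component also produce zeros, and it omits the spatiotemporal regression structure entirely. The function names are hypothetical.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(y, n, a, b):
    """P(Y = y) for a beta-binomial count out of n trials with shapes a, b."""
    return math.comb(n, y) * math.exp(log_beta(y + a, n - y + b) - log_beta(a, b))


def zibb_pmf(y, n, pi_zero, a, b):
    """Zero-inflated beta-binomial: extra mass pi_zero at zero, and the
    beta-binomial distribution with probability 1 - pi_zero."""
    base = (1.0 - pi_zero) * beta_binomial_pmf(y, n, a, b)
    return pi_zero + base if y == 0 else base


# Five risk factors per woman, hypothetical parameter values.
probs = [zibb_pmf(y, 5, 0.3, 2.0, 3.0) for y in range(6)]
print(probs, sum(probs))
```

With a = b = 1 the beta-binomial part is uniform on 0, ..., n, which gives a quick sanity check on the implementation.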

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12625296/pdf/

Citations: 0

Authors' reply to the Discussion of 'Methods for estimating the exposure-response curve to inform the new safety standards for fine particulate matter'.

IF 1.6; Zone 3 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS

Journal of the Royal Statistical Society Series A-Statistics in Society

Pub Date : 2025-05-21 eCollection Date: 2025-10-01 DOI: 10.1093/jrsssa/qnaf057

Michael Cork, Daniel Mork, Francesca Dominici

Citations: 0

Estimating racial and ethnic healthcare quality disparities using exploratory item response theory and latent class item response theory models.

IF 1.6; Zone 3 (Mathematics); Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS

Journal of the Royal Statistical Society Series A-Statistics in Society

Pub Date : 2025-04-01 DOI: 10.1093/jrsssa/qnaf033

Sharon-Lise Normand, Katya Zelevinsky, Marcela Horvitz-Lennon

Healthcare quality metrics refer to a variety of measures used to characterize what should have been done or not done for a patient or the health consequences of what was or was not done. When estimating healthcare quality, many metrics are measured and combined to provide an overall estimate either at the patient level or at higher levels, such as the provider organization or insurer. Racial and ethnic disparities are defined as the mean difference in quality between minorities and Whites not justified by underlying health conditions or patient preferences. Several statistical features of healthcare quality data have been ignored: quality is a theoretical construct not directly observable; quality metrics are measured on different scales or, if measured on the same scale, have different baseline rates; the construct may be multidimensional; and metrics are correlated within-individuals. Balancing health differences across race and ethnicity groups is challenging due to confounding. We provide an approach addressing these features, utilizing exploratory multidimensional item response theory (IRT) models and latent class IRT models to estimate quality, and optimization-based matching to adjust for confounding among the race and ethnicity groups. Quality metrics measured on 93,000 adults with schizophrenia residing in five US states illustrate approaches.

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12377680/pdf/

Citations: 0

Methods for Estimating the Exposure-Response Curve to Inform the New Safety Standards for Fine Particulate Matter.

IF 1.6 Zone 3 Mathematics Q2 SOCIAL SCIENCES, MATHEMATICAL METHODS

Journal of the Royal Statistical Society Series A-Statistics in Society

Pub Date : 2025-01-16 DOI: 10.1093/jrsssa/qnaf004

Michael Cork, Daniel Mork, Francesca Dominici

Exposure to fine particulate matter (PM2.5) poses significant health risks, and accurately determining the shape of the relationship between PM2.5 and health outcomes has crucial policy implications. Although various statistical methods exist to estimate this exposure-response curve (ERC), few studies have compared their performance under plausible data-generating scenarios. This study compares seven commonly used ERC estimators across 72 exposure-response and confounding scenarios via simulation. Additionally, we apply these methods to estimate the ERC between long-term PM2.5 exposure and all-cause mortality using data from over 68 million Medicare beneficiaries in the United States. Our simulation indicates that regression methods not placed within a causal inference framework are unsuitable when anticipating heterogeneous exposure effects. Under the setting of a large sample size and unknown ERC functional form, we recommend utilizing causal inference methods that allow for nonlinear ERCs. In our data application, we observe a nonlinear relationship between annual average PM2.5 and all-cause mortality in the Medicare population, with a sharp increase in relative mortality at low PM2.5 concentrations. Our findings suggest that stricter limits on PM2.5 could avert numerous premature deaths. To facilitate the utilization of our results, we provide publicly available, reproducible code on GitHub for every step of the analysis.
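
The point about an unknown ERC functional form can be illustrated with a toy simulation. The sketch below is not one of the seven estimators compared in the study and ignores confounding entirely; it only shows, on simulated data, how a linear fit misses a curve that is steep at low exposure while a more flexible fit tracks it. The function names, the logarithmic "true" curve, the exposure range, and the noise level are all hypothetical choices.

```python
import numpy as np

def true_erc(exposure):
    """Hypothetical concave exposure-response curve, steepest at low exposure."""
    return np.log(exposure)

def fit_polynomial_erc(exposure, outcome, degree):
    """Least-squares polynomial fit of outcome on exposure; returns the fitted curve."""
    coefs = np.polynomial.polynomial.polyfit(exposure, outcome, degree)
    return lambda x: np.polynomial.polynomial.polyval(x, coefs)

rng = np.random.default_rng(0)
exposure = rng.uniform(2.0, 16.0, size=5000)  # hypothetical exposure range
outcome = true_erc(exposure) + rng.normal(0.0, 0.3, size=exposure.size)

grid = np.linspace(2.0, 16.0, 50)
linear_curve = fit_polynomial_erc(exposure, outcome, degree=1)
cubic_curve = fit_polynomial_erc(exposure, outcome, degree=3)

# Error of each fitted curve against the known simulated truth.
rmse_linear = np.sqrt(np.mean((linear_curve(grid) - true_erc(grid)) ** 2))
rmse_cubic = np.sqrt(np.mean((cubic_curve(grid) - true_erc(grid)) ** 2))
```

In this setup the cubic fit recovers the simulated curve more closely than the linear fit, which is the kind of gap that motivates estimators allowing nonlinear ERCs; a real analysis would also have to adjust for confounders.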

Citations: 0