Pub Date: 2024-05-31 | eCollection Date: 2024-08-01 | DOI: 10.1093/jrsssc/qlae027
Title: Inverse set estimation and inversion of simultaneous confidence intervals
Journal: Journal of the Royal Statistical Society Series C: Applied Statistics
Authors: Junting Ren, Fabian J E Telschow, Armin Schwartzman
Motivated by questions of risk assessment in climatology (temperature change in North America) and medicine (the impact of statin usage and coronavirus disease 2019 on hospitalized patients), we address the problem of estimating the set in the domain of a function whose image equals a predefined subset of the real line. Existing methods require strict assumptions. We generalize the estimation of such sets to dense and nondense domains, with protection against inflated Type I error in exploratory data analysis. This is achieved by proving that confidence sets for multiple upper, lower, or interval sets can be constructed simultaneously, with the desired confidence holding nonasymptotically, by inverting simultaneous confidence intervals. A nonparametric bootstrap algorithm and code are provided.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11321826/pdf/
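To make the inversion idea concrete, here is a minimal standard-library Python sketch under assumptions that are ours, not the paper's: the function is the mean of n curves observed on a shared finite grid, the simultaneous band is a bootstrap max-|t| band built by resampling whole curves, and the target is the upper excursion set {s : mu(s) >= c}. The helper names `simultaneous_quantile` and `invert_scb` are hypothetical. The inner set keeps grid points whose lower band reaches c and the outer set keeps those whose upper band does.

```python
import math
import random
import statistics


def simultaneous_quantile(curves, alpha=0.05, n_boot=500, seed=0):
    """Bootstrap (1 - alpha) quantile of max_s |mean*(s) - mean(s)| / se(s).

    `curves` is a list of n equally long lists, one curve per subject on a shared grid.
    """
    rng = random.Random(seed)
    n, m = len(curves), len(curves[0])
    mean = [statistics.fmean(c[j] for c in curves) for j in range(m)]
    se = [statistics.stdev([c[j] for c in curves]) / math.sqrt(n) for j in range(m)]
    maxima = []
    for _ in range(n_boot):
        sample = [curves[rng.randrange(n)] for _ in range(n)]  # resample whole subjects
        boot_mean = [statistics.fmean(c[j] for c in sample) for j in range(m)]
        maxima.append(max(abs(boot_mean[j] - mean[j]) / se[j] for j in range(m)))
    maxima.sort()
    q = maxima[min(n_boot, math.ceil((1 - alpha) * n_boot)) - 1]
    return mean, se, q


def invert_scb(mean, se, q, c):
    """Inner and outer confidence sets (grid indices) for the excursion set {s : mu(s) >= c}."""
    lower = [mu - q * s for mu, s in zip(mean, se)]
    upper = [mu + q * s for mu, s in zip(mean, se)]
    inner = {j for j, lo in enumerate(lower) if lo >= c}
    outer = {j for j, hi in enumerate(upper) if hi >= c}
    return inner, outer
```

Because every threshold is compared against the same simultaneous band, `invert_scb` can be called for several values of c without recomputing `q`, which mirrors the "multiple sets at once" point in the abstract; the choice of band and of subject-level resampling is an assumption of this sketch.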
Pub Date: 2024-03-14 | eCollection Date: 2024-08-01 | DOI: 10.1093/jrsssc/qlae015
Title: Population-level task-evoked functional connectivity via Fourier analysis
Journal: Journal of the Royal Statistical Society Series C: Applied Statistics
Authors: Kun Meng, Ani Eloyan
Functional magnetic resonance imaging (fMRI) is a noninvasive, in vivo imaging technique essential for measuring brain activity. Functional connectivity is used to study associations between brain regions, either while study subjects perform tasks or during periods of rest. In this paper, we propose a rigorous definition of task-evoked functional connectivity at the population level (ptFC). Importantly, our proposed ptFC is interpretable in the context of task-fMRI studies. An algorithm for estimating the ptFC is provided. Using simulations, we compare the performance of the proposed algorithm with that of existing functional connectivity frameworks. Lastly, we apply the proposed algorithm to estimate the ptFC in a motor-task study from the Human Connectome Project.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11321825/pdf/
Pub Date: 2024-02-29 | eCollection Date: 2024-06-01 | DOI: 10.1093/jrsssc/qlae010
Title: Testing unit root non-stationarity in the presence of missing data in univariate time series of mobile health studies
Journal: Journal of the Royal Statistical Society Series C: Applied Statistics
Authors: Charlotte Fowler, Xiaoxuan Cai, Justin T Baker, Jukka-Pekka Onnela, Linda Valeri
The use of digital devices to collect data in mobile health studies introduces a novel application of time-series methods, complicated by data that may be missing at random or missing not at random (MNAR). In time-series analysis, testing for stationarity is an important preliminary step that informs the choice of subsequent analyses. The Dickey-Fuller test evaluates the null hypothesis of unit root non-stationarity assuming no missing data. Beyond recommending complete case analysis or last observation carried forward imputation when data are missing completely at random, researchers have not extended unit root non-stationarity testing to more complex missing data mechanisms. Multiple imputation with chained equations, Kalman smoothing imputation, and linear interpolation have also been used for time-series data; however, such methods impose constraints on the autocorrelation structure and affect unit root testing. We propose maximum likelihood estimation and multiple imputation using state space model approaches to adapt the augmented Dickey-Fuller test to a context with missing data. We further develop sensitivity analyses to examine the impact of MNAR data. We evaluate the performance of existing and proposed methods across missing data mechanisms in extensive simulations and in an application to a multi-year smartphone study of patients with bipolar disorder.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11175825/pdf/
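For reference, the standard complete-data augmented Dickey-Fuller regression that the paper adapts is Δy_t = α + γ y_{t-1} + Σ_{i=1..p} β_i Δy_{t-i} + ε_t, with the unit-root null γ = 0 tested through the t-ratio of γ̂ against Dickey-Fuller (not Student t) critical values, about -2.86 at the 5% level when only a constant is included. The pure-Python sketch below computes that statistic for a fully observed series only; it does not implement the authors' state space likelihood or multiple imputation, and the helper names (`invert`, `ols`, `adf_stat`) are hypothetical.

```python
import math
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols(X, y):
    """Ordinary least squares: coefficients and their conventional standard errors."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (len(y) - k)
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    return beta, se


def adf_stat(y, lags=1):
    """t-ratio of gamma in the ADF regression with a constant; assumes no missing values."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    X, target = [], []
    for t in range(lags, len(dy)):
        # regress dy[t] = y[t+1] - y[t] on 1, the lagged level y[t], and lagged differences
        X.append([1.0, y[t]] + [dy[t - i] for i in range(1, lags + 1)])
        target.append(dy[t])
    beta, se = ols(X, target)
    return beta[1] / se[1]


if __name__ == "__main__":
    rng = random.Random(0)
    series = [0.0]
    for _ in range(300):
        series.append(0.3 * series[-1] + rng.gauss(0.0, 1.0))  # stationary AR(1)
    print(round(adf_stat(series), 2))  # compare with roughly -2.86 (5%, constant only)
```

Any gap in `y` would break the lag structure of this regression, which is the setting the paper's state space approaches are designed to handle.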
Pub Date: 2024-02-13 | eCollection Date: 2024-06-01 | DOI: 10.1093/jrsssc/qlae008
Title: Revisiting the effects of maternal education on adolescents' academic performance: Doubly robust estimation in a network-based observational study
Journal: Journal of the Royal Statistical Society Series C: Applied Statistics
Authors: Vanessa McNealis, Erica E M Moodie, Nema Dean
In many contexts, particularly when study subjects are adolescents, peer effects can invalidate standard statistical assumptions about the data. For instance, it is plausible that a student's academic performance is influenced both by their own mother's educational level and by that of their peers. Because the underlying social network is measured, the Add Health study provides a unique opportunity to examine both the direct and the indirect impact of maternal college education on adolescent school performance. However, causal inference on populations embedded in social networks poses technical challenges, since the typical no-interference assumption no longer holds. While inverse probability-of-treatment weighted (IPW) estimators have been developed for this setting, they are often highly unstable. Motivated by the question of maternal education, we propose doubly robust (DR) estimators that combine models for treatment and outcome and are consistent and asymptotically normal if either model is correctly specified. We present empirical results that illustrate the DR property and the efficiency gain of DR over IPW estimators, even when the treatment model is misspecified. Contrary to previous studies, our robust analysis does not provide evidence of an indirect effect of maternal education on academic performance within adolescents' social circles in Add Health.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11175826/pdf/
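The "either model" property can be illustrated with the classical augmented IPW (AIPW) estimator of an average treatment effect. This sketch assumes no interference, which is precisely the assumption the paper relaxes for network data, so it is a simplified building block rather than the authors' network estimator; the fitted propensity scores `e` and outcome predictions `m1`, `m0` are taken as given, and the function names are hypothetical.

```python
def aipw_ate(y, a, e, m1, m0):
    """Augmented IPW (doubly robust) estimate of E[Y(1)] - E[Y(0)], assuming no interference.

    y: outcomes, a: 0/1 treatment, e: propensity scores P(A=1 | X),
    m1 / m0: predicted outcomes under treatment / control.
    """
    terms = []
    for yi, ai, ei, m1i, m0i in zip(y, a, e, m1, m0):
        if not 0.0 < ei < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        treated_correction = ai * (yi - m1i) / ei
        control_correction = (1 - ai) * (yi - m0i) / (1.0 - ei)
        terms.append(m1i - m0i + treated_correction - control_correction)
    return sum(terms) / len(terms)


def ipw_ate(y, a, e):
    """Plain IPW estimate for comparison: AIPW with the outcome model set to zero."""
    zeros = [0.0] * len(y)
    return aipw_ate(y, a, e, zeros, zeros)
```

If the outcome predictions are exact, the correction terms vanish and the propensity values no longer matter; if the propensities are correct, the correction terms remove the bias of a wrong outcome model in expectation.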
Pub Date: 2024-02-07 | eCollection Date: 2024-06-01 | DOI: 10.1093/jrsssc/qlae006
Title: Unsupervised Bayesian classification for models with scalar and functional covariates
Journal: Journal of the Royal Statistical Society Series C: Applied Statistics
Authors: Nancy L Garcia, Mariana Rodrigues-Motta, Helio S Migon, Eva Petkova, Thaddeus Tarpey, R Todd Ogden, Julio O Giordano, Martin M Perez
We consider unsupervised classification by means of a latent multinomial variable that assigns a scalar response to one of the L components of a mixture model incorporating scalar and functional covariates. This process can be thought of as a hierarchical model: the first level models the scalar response as a mixture of parametric distributions, and the second level models the mixture probabilities through a generalized linear model with functional and scalar covariates. The traditional approach of treating functional covariates as vectors not only suffers from the curse of dimensionality (functional covariates can be measured at very small intervals, leading to a highly parametrized model) but also ignores the functional nature of the data. We use basis expansions to reduce the dimensionality and a Bayesian approach to estimate the parameters while providing predictions of the latent classification vector. The method is motivated by two data examples that are not easily handled by existing methods. The first concerns identifying placebo responders in a clinical trial (normal mixture model), and the second concerns predicting illness in milking cows (zero-inflated mixture of Poisson models).
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11271982/pdf/
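A minimal sketch of the two levels for a single unit, assuming a Fourier basis on [0, 1], trapezoid-rule integration of the functional covariate against each basis function, a multinomial logit with the first component as reference, and a normal first level. The basis, all function names, and all parameter values are hypothetical; in the paper's Bayesian approach the parameters would be sampled (for example, evaluated at each posterior draw) rather than fixed.

```python
import math


def fourier_basis(t, k):
    """First k orthonormal Fourier basis functions on [0, 1] evaluated at t (assumed basis)."""
    out, j = [1.0], 1
    while len(out) < k:
        out.append(math.sqrt(2) * math.sin(2 * math.pi * j * t))
        if len(out) < k:
            out.append(math.sqrt(2) * math.cos(2 * math.pi * j * t))
        j += 1
    return out


def basis_scores(grid, x, k):
    """Trapezoid-rule approximations of the integrals of X(t) * phi_j(t) over the grid."""
    scores = [0.0] * k
    for i in range(len(grid) - 1):
        h = grid[i + 1] - grid[i]
        left, right = fourier_basis(grid[i], k), fourier_basis(grid[i + 1], k)
        for j in range(k):
            scores[j] += 0.5 * h * (x[i] * left[j] + x[i + 1] * right[j])
    return scores


def mixing_probs(z, scores, alphas, gammas, coefs):
    """Multinomial-logit mixture weights from a scalar covariate z and basis scores."""
    etas = [0.0]  # component 0 is the reference category
    for alpha, gamma, b in zip(alphas, gammas, coefs):
        etas.append(alpha + gamma * z + sum(bj * sj for bj, sj in zip(b, scores)))
    top = max(etas)  # subtract the max for numerical stability
    w = [math.exp(eta - top) for eta in etas]
    total = sum(w)
    return [wi / total for wi in w]


def class_posterior(y, probs, means, sds):
    """P(latent class = l | y) for a normal mixture with the given weights."""
    dens = [p * math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
            for p, m, s in zip(probs, means, sds)]
    total = sum(dens)
    return [d / total for d in dens]
```

The basis expansion replaces a covariate measured at many grid points with k scores, so the second level needs only k coefficients per non-reference component instead of one per grid point.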
Pub Date: 2024-02-01 | eCollection Date: 2024-06-01 | DOI: 10.1093/jrsssc/qlae003
Title: Bayesian semi-parametric inference for clustered recurrent events with zero inflation and a terminal event
Journal: Journal of the Royal Statistical Society Series C: Applied Statistics
Authors: Xinyuan Tian, Maria Ciarleglio, Jiachen Cai, Erich J Greene, Denise Esserman, Fan Li, Yize Zhao
Recurrent events are common in clinical studies and are often subject to terminal events. In pragmatic trials, participants are often nested in clinics and may be either susceptible or structurally unsusceptible to the recurrent events. We develop a Bayesian shared random effects model to accommodate this complex data structure. To achieve robustness, we use Dirichlet processes to model both the residual of the accelerated failure time model for the survival process and the cluster-specific shared frailty distribution, together with an efficient sampling algorithm for posterior inference. Our method is applied to a recent cluster randomized trial on fall injury prevention.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11271983/pdf/
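One common way to work with a Dirichlet process in practice is truncated stick-breaking. The sketch below draws accelerated failure time residuals ε (as in log T = x'β + ε) from a truncated DP mixture of normals. The truncation level, the base measure (normal component means, inverse-gamma component variances) and the function names are assumptions of this sketch, and the shared frailty, zero inflation and terminal-event parts of the authors' model are not shown.

```python
import math
import random


def stick_breaking(alpha, truncation, rng):
    """Truncated stick-breaking weights for a Dirichlet process with concentration alpha."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick to break off
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # the final stick absorbs the leftover mass
    return weights


def draw_dp_residuals(n, alpha=1.0, truncation=20, seed=0):
    """Draw n residuals from a truncated DP mixture of normals.

    Hypothetical base measure: component means ~ N(0, 1), variances ~ inverse-gamma(2, 1).
    """
    rng = random.Random(seed)
    weights = stick_breaking(alpha, truncation, rng)
    atoms = [(rng.gauss(0.0, 1.0), 1.0 / rng.gammavariate(2.0, 1.0)) for _ in weights]
    residuals = []
    for _ in range(n):
        mean, var = rng.choices(atoms, weights=weights)[0]  # pick a mixture component
        residuals.append(rng.gauss(mean, math.sqrt(var)))
    return weights, residuals
```

Log event times would then be formed as log T_i = x_i'β + ε_i for some hypothetical coefficient vector β; a flexible residual distribution of this kind is what relaxes the usual parametric error assumption of the AFT model.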
Pub Date: 2023-11-13 | DOI: 10.1093/jrsssc/qlad100
Title: A Bayesian latent class model for integrating multi-source longitudinal data: application to the CHILD cohort study
Journal: Journal of the Royal Statistical Society Series C: Applied Statistics
Authors: Zihang Lu, Padmaja Subbarao, Wendy Lou
Multi-source longitudinal data have become increasingly common. This type of data refers to longitudinal datasets collected from multiple sources describing the same set of individuals. Each data source represents distinct features of the individuals and may consist of multiple longitudinal markers of different types and measurement frequencies. Motivated by the CHILD cohort study, we develop a model for jointly clustering multi-source longitudinal data. The proposed model allows each data source to follow its own source-specific clustering, and these source-specific clusterings are aggregated to yield a global clustering. The proposed model is demonstrated through a real-data analysis and a simulation study.
Pub Date: 2023-11-13 | DOI: 10.1093/jrsssc/qlad098
Title: CRP-Tree: a phylogenetic association test for binary traits
Journal: Journal of the Royal Statistical Society Series C: Applied Statistics
Authors: Julie Zhang, Gabriel A Preising, Molly Schumer, Julia A Palacios
An important problem in evolutionary genomics is to investigate whether a trait measured on each sample is associated with the samples' phylogenetic tree. The phylogenetic tree represents the shared evolutionary history of the samples and is usually estimated from molecular sequence data at a locus or from other types of genetic data. We propose a model for trait evolution inspired by the Chinese Restaurant Process that includes a parameter controlling the degree of preferential attachment, that is, the tendency of nodes in the tree to subtend from nodes of the same type. Without preferential attachment, the model is equivalent to a structured coalescent model with simultaneous migration and coalescence events, and it serves as a null model. We derive a test for phylogenetic binary trait association with linear computational complexity and empirically demonstrate that it is more powerful than some other methods. We apply our test to study the phylogenetic association of some traits in swordtail fish, breast cancer, yellow fever virus, and influenza A H1N1 virus. An R package implementing our methods is available at https://github.com/jyzhang27/CRPTree.
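For readers unfamiliar with the Chinese Restaurant Process, the sketch below shows its standard seating rule with concentration α > 0: after n customers, the next one joins existing table k with probability n_k/(n + α) or opens a new table with probability α/(n + α). This is only the process that inspired the model; the paper's tree-based construction and its preferential-attachment parameter are not reproduced, and the function names are hypothetical.

```python
import random


def crp_probabilities(table_sizes, alpha):
    """Seating probabilities for the next customer: each existing table, then a new table."""
    denom = sum(table_sizes) + alpha
    return [size / denom for size in table_sizes] + [alpha / denom]


def simulate_crp(n_customers, alpha, seed=0):
    """Seat customers one at a time; returns table sizes and each customer's table index."""
    rng = random.Random(seed)
    tables, assignments = [], []
    for _ in range(n_customers):
        probs = crp_probabilities(tables, alpha)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(tables):
            tables.append(1)  # open a new table
        else:
            tables[k] += 1  # join an existing table: larger tables attract more customers
        assignments.append(k)
    return tables, assignments
```

The "larger tables attract more customers" behaviour is the rich-get-richer mechanism that a preferential-attachment parameter can strengthen or weaken.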
Pub Date: 2023-11-03 | DOI: 10.1093/jrsssc/qlad094
Title: Bayesian kernel machine regression for count data: modelling the association between social vulnerability and COVID-19 deaths in South Carolina
Journal: Journal of the Royal Statistical Society Series C: Applied Statistics
Authors: Fedelis Mutiso, Hong Li, John L Pearce, Sara E Benjamin-Neelon, Noel T Mueller, Brian Neelon
The COVID-19 pandemic created an unprecedented global health crisis. Recent studies suggest that socially vulnerable communities were disproportionately impacted, although findings are mixed. To quantify social vulnerability in the US, many studies rely on the Social Vulnerability Index (SVI), a county-level measure comprising 15 census variables. Typically, the SVI is modelled in an additive manner, which may obscure non-linear or interactive associations, further contributing to inconsistent findings. As a more robust alternative, we propose a negative binomial Bayesian kernel machine regression (BKMR) model to investigate dynamic associations between social vulnerability and COVID-19 death rates, thus extending BKMR to the count data setting. The model produces a 'vulnerability effect' that quantifies the impact of vulnerability on COVID-19 death rates in each county. The method can also identify the relative importance of various SVI variables and make future predictions as county vulnerability profiles evolve. To capture spatio-temporal heterogeneity, the model incorporates spatial effects, county-level covariates, and smooth temporal functions. For Bayesian computation, we propose a tractable data-augmented Gibbs sampler. We conduct a simulation study to highlight the approach and apply the method to a study of COVID-19 deaths in the US state of South Carolina during the 2021 calendar year.
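BKMR is commonly built on a Gaussian kernel with variable-specific weights, K(z, z') = exp(-Σ_m r_m (z_m - z'_m)^2); setting r_m to zero removes variable m from the kernel, which is the mechanism BKMR-style variable selection typically builds on. The sketch below computes that kernel for covariate vectors (for example, SVI variables per county) and evaluates a negative binomial log-likelihood term. The mean/dispersion parametrization and the function names are our assumptions, and the paper's spatial effects, temporal functions and data-augmented Gibbs sampler are not shown.

```python
import math


def bkmr_kernel(Z, r):
    """Gaussian kernel matrix K[i][j] = exp(-sum_m r_m * (z_im - z_jm)**2), weights r_m >= 0."""
    n = len(Z)
    return [[math.exp(-sum(rm * (a - b) ** 2 for rm, a, b in zip(r, Z[i], Z[j])))
             for j in range(n)] for i in range(n)]


def nb_logpmf(y, mean, dispersion):
    """Negative binomial log-pmf with the given mean and variance mean + mean**2 / dispersion."""
    xi = dispersion
    return (math.lgamma(y + xi) - math.lgamma(xi) - math.lgamma(y + 1)
            + xi * math.log(xi / (xi + mean)) + y * math.log(mean / (xi + mean)))
```

In a log-link model the county mean could take a form such as μ_i = exp(offset_i + x_i'β + h_i), with h drawn from a Gaussian process whose covariance is proportional to K; that structure is stated here as an assumption, not as the authors' exact specification.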
Pub Date: 2023-11-01 | Epub Date: 2023-07-15 | DOI: 10.1093/jrsssc/qlad063
Title: A novel agreement statistic using data on uncertainty in ratings
Journal: Journal of the Royal Statistical Society Series C: Applied Statistics
Authors: Jarcy Zee, Laura Mariani, Laura Barisoni, Parag Mahajan, Brenda Gillespie
Many existing methods for estimating agreement correct for chance agreement by adjusting the observed proportion of agreement by a probability of chance agreement derived under various assumptions. These assumptions may not always be appropriate, as demonstrated by pathologists' ratings of kidney biopsy descriptors. We propose a novel agreement statistic that accounts for the empirical probability of chance agreement, estimated by collecting additional data on rater uncertainty for each rating. A standard error estimator for the proposed statistic is derived. Simulation studies show that, in most cases, the proposed statistic is unbiased for the probability of agreement after chance agreement is removed.
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10881211/pdf/
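Chance-corrected agreement statistics usually share the form (p_o - p_e)/(1 - p_e), where p_o is the observed proportion of agreement and p_e the probability of chance agreement. The sketch computes p_o and the marginal-based p_e used by Cohen's kappa for two raters; the paper's contribution, estimating p_e empirically from rater-uncertainty data, is not reproduced, so any alternative p_e passed to `chance_corrected` is a hypothetical input, and all function names are hypothetical.

```python
from collections import Counter


def observed_agreement(r1, r2):
    """Proportion of items on which the two raters give the same category."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def marginal_chance_agreement(r1, r2):
    """Cohen-style chance agreement: sum over categories of the product of marginal rates."""
    n = len(r1)
    c1, c2 = Counter(r1), Counter(r2)
    return sum(c1[k] * c2[k] for k in c1) / n ** 2


def chance_corrected(p_o, p_e):
    """Generic chance-corrected agreement (p_o - p_e) / (1 - p_e)."""
    if p_e >= 1.0:
        raise ValueError("chance agreement must be below 1")
    return (p_o - p_e) / (1.0 - p_e)
```

Swapping the marginal-based p_e for a different estimate changes only the second argument of `chance_corrected`, which is where an empirically estimated chance-agreement probability would enter.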