Statistics in Medicine: Latest Articles

A Bayesian Approach to Modeling Variance of Intensive Longitudinal Biomarker Data as a Predictor of Health Outcomes.
IF 1.8, Zone 4 (Medicine), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2024-11-14. DOI: 10.1002/sim.10281
Mingyan Yu, Zhenke Wu, Margaret Hicken, Michael R Elliott

Intensive longitudinal biomarker data are increasingly common in scientific studies that seek temporally granular understanding of the role of behavioral and physiological factors in relation to outcomes of interest. Such data, for example those obtained from wearable devices, are often collected at high frequency, typically yielding several hundred to several thousand observations per individual over minutes, hours, or days. Often in longitudinal studies, the primary focus is on relating the means of biomarker trajectories to an outcome, and the variances are treated as nuisance parameters, although they may also be informative for the outcomes. In this paper, we propose a Bayesian hierarchical model to jointly model a cross-sectional outcome and the intensive longitudinal biomarkers. To model the variability of biomarkers and deal with the high intensity of data, we develop subject-level cubic B-splines and allow the sharing of information across individuals for both the residual variability and the random effects variability. Then different levels of variability are extracted and incorporated into an outcome submodel for inferential and predictive purposes. We demonstrate the utility of the proposed model via an application involving bio-monitoring of hertz-level heart rate information from a study on social stress.
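
As a rough illustration of the cubic B-spline building block mentioned in the abstract, the sketch below evaluates a clamped cubic B-spline basis with the standard Cox-de Boor recursion. It does not reproduce the paper's hierarchical model; the function name `bspline_basis`, the knot vector, and the evaluation point are hypothetical choices made only for this example.

```python
def bspline_basis(i, degree, knots, t):
    """Cox-de Boor recursion: value of the i-th B-spline basis function at t."""
    if degree == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = 0.0
    denom = knots[i + degree] - knots[i]
    if denom > 0:  # skip terms with repeated knots (0/0 is treated as 0)
        left = (t - knots[i]) / denom * bspline_basis(i, degree - 1, knots, t)
    right = 0.0
    denom = knots[i + degree + 1] - knots[i + 1]
    if denom > 0:
        right = (knots[i + degree + 1] - t) / denom * bspline_basis(i + 1, degree - 1, knots, t)
    return left + right


# Hypothetical clamped knot vector on [0, 1) with three interior knots.
knots = [0.0, 0.0, 0.0, 0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 1.0, 1.0]
n_basis = len(knots) - 3 - 1  # number of cubic basis functions (7 here)


def design_row(t):
    """One row of a subject-level spline design matrix at time t."""
    return [bspline_basis(i, 3, knots, t) for i in range(n_basis)]


row = design_row(0.3)
```

Each row of such a design matrix sums to one on the interior of the knot range, so the spline coefficients act as local weights on the trajectory; in a model of the kind described, residuals around the fitted curve would then carry the within-subject variability.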

Citations: 0
Transformed ROC Curve for Biomarker Evaluation.
IF 1.8, Zone 4 (Medicine), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2024-11-12. DOI: 10.1002/sim.10268
Jianping Yang, Pei-Fen Kuan, Xiangyu Li, Jialiang Li, Xiao-Hua Zhou

To complement the conventional area under the ROC curve (AUC), which cannot fully describe the diagnostic accuracy of some non-standard biomarkers, we introduce a transformed ROC curve and its associated transformed AUC (TAUC) in this article, and show that TAUC can relate the original improper biomarker to a proper biomarker after a non-monotone transformation. We then provide nonparametric estimation of the non-monotone transformation and TAUC, and establish their consistency and asymptotic normality. We conduct extensive simulation studies to assess the performance of the proposed TAUC method and compare it with traditional methods. Case studies on real biomedical data are provided to illustrate the proposed TAUC method. With it, we are able to identify additional important biomarkers that tend to escape the traditional screening method.
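
The motivation can be seen on a toy example: when cases take either unusually low or unusually high values, the ordinary AUC sits near 0.5 even though the biomarker separates the groups well. The sketch below uses a fixed, hand-picked transformation (distance from the control median), which is an assumption for illustration only; the paper estimates the transformation nonparametrically, and that estimator is not reproduced here. The data and the name `empirical_auc` are hypothetical.

```python
def empirical_auc(controls, cases):
    """Empirical AUC: P(control < case) + 0.5 * P(control == case)."""
    wins = 0.0
    for x in controls:
        for y in cases:
            if x < y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(controls) * len(cases))


# Hypothetical toy data: cases are extreme in both directions.
controls = [4.0, 4.5, 5.0, 5.5, 6.0]
cases = [1.0, 2.0, 8.0, 9.0]

raw_auc = empirical_auc(controls, cases)

# Assumed non-monotone transformation: absolute distance from the control median.
center = sorted(controls)[len(controls) // 2]
transformed_auc = empirical_auc(
    [abs(x - center) for x in controls],
    [abs(y - center) for y in cases],
)
```

Here the raw AUC is 0.5 while the AUC of the transformed marker is 1.0, which is the kind of gap a transformed AUC is meant to expose.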

Citations: 0
Shape Mediation Analysis in Alzheimer's Disease Studies.
IF 1.8, Zone 4 (Medicine), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2024-11-12. DOI: 10.1002/sim.10265
Xingcai Zhou, Miyeon Yeon, Jiangyan Wang, Shengxian Ding, Kaizhou Lei, Yanyong Zhao, Rongjie Liu, Chao Huang

As a crucial tool in neuroscience, mediation analysis has been developed and widely adopted to elucidate the role of intermediary variables derived from neuroimaging data. Typically, structural equation models (SEMs) are employed to investigate the influences of exposures on outcomes, with model coefficients being interpreted as causal effects. While existing SEMs have proven to be effective tools for mediation analysis involving various neuroimaging-related mediators, limited research has explored scenarios where these mediators are derived from the shape space. In addition, the linear relationship assumption adopted in existing SEMs may lead to substantial efficiency losses and decreased predictive accuracy in real-world applications. To address these challenges, we introduce a novel framework for shape mediation analysis, designed to explore the causal relationships between genetic exposures and clinical outcomes, whether mediated or unmediated by shape-related factors while accounting for potential confounding variables. Within our framework, we apply the square-root velocity function to extract elastic shape representations, which reside within the linear Hilbert space of square-integrable functions. Subsequently, we introduce a two-layer shape regression model to characterize the relationships among neurocognitive outcomes, elastic shape mediators, genetic exposures, and clinical confounders. Both estimation and inference procedures are established for unknown parameters along with the corresponding causal estimands. The asymptotic properties of estimated quantities are investigated as well. Both simulated studies and real-data analyses demonstrate the superior performance of our proposed method in terms of estimation accuracy and robustness when compared to existing approaches for estimating causal estimands.
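
For readers unfamiliar with the square-root velocity function (SRVF), a minimal sketch for a scalar function sampled on a uniform grid is given below, using the common definition q(t) = f'(t) / sqrt(|f'(t)|) with forward differences. The shapes in the paper are presumably richer objects (curves or surfaces), so this is only a one-dimensional illustration; the function name `srvf` and the test signal are hypothetical.

```python
import math


def srvf(values, dt):
    """Square-root velocity of a sampled scalar function (forward differences)."""
    q = []
    for a, b in zip(values, values[1:]):
        slope = (b - a) / dt
        # sign(slope) * sqrt(|slope|) equals slope / sqrt(|slope|), and is 0 at slope 0
        q.append(math.copysign(math.sqrt(abs(slope)), slope))
    return q


# Hypothetical signal f(t) = t^2 on [0, 1].
dt = 0.01
values = [(i * dt) ** 2 for i in range(101)]
q = srvf(values, dt)

# The squared L2 norm of q equals the total variation of f (here f(1) - f(0) = 1).
energy = sum(v * v for v in q) * dt
```

Because the squared norm of q is finite whenever f has finite total variation, the representation lives in a space of square-integrable functions, which is what makes linear regression-style modeling on shapes feasible.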

Citations: 0
$\ell_1$-Penalized Multinomial Regression: Estimation, Inference, and Prediction, With an Application to Risk Factor Identification for Different Dementia Subtypes.
IF 1.8, Zone 4 (Medicine), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2024-11-12. DOI: 10.1002/sim.10263
Ye Tian, Henry Rusinek, Arjun V Masurkar, Yang Feng

High-dimensional multinomial regression models are very useful in practice but have received less research attention than logistic regression models, especially from the perspective of statistical inference. In this work, we analyze the estimation and prediction error of the contrast-based $\ell_1$-penalized multinomial regression model and extend the debiasing method to the multinomial case, providing a valid confidence interval for each coefficient and $p$-value of the individual hypothesis test. We also examine cases of model misspecification and non-identically distributed data to demonstrate the robustness of our method when some assumptions are violated. We apply the debiasing method to identify important predictors in the progression into dementia of different subtypes. Results from extensive simulations show the superiority of the debiasing method compared to other inference methods.
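
As background on why an $\ell_1$ penalty produces sparse coefficients (and hence biased estimates that a debiasing step later corrects), the sketch below shows the soft-thresholding operator, which is the proximal map of the $\ell_1$ penalty used inside many lasso-type solvers. It is a generic illustration, not the paper's estimator or its debiasing procedure; the coefficient values and penalty level are hypothetical.

```python
def soft_threshold(value, penalty):
    """Proximal operator of penalty * |value|: shrink toward zero, clip at zero."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


# Hypothetical unpenalized coefficient updates and penalty level.
raw_coefficients = [3.0, -2.5, 0.4]
shrunk = [soft_threshold(b, 1.0) for b in raw_coefficients]
```

Small coefficients are set exactly to zero while large ones are shrunk by the penalty amount; that shrinkage is the bias that debiasing methods aim to remove before forming confidence intervals.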

Citations: 0
A General Framework for the Multiple Nonparametric Behrens-Fisher Problem With Dependent Replicates.
IF 1.8, Zone 4 (Medicine), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2024-11-11. DOI: 10.1002/sim.10262
Erin Sprünken, Robert Mertens, Frank Konietschke

In many trials and experiments, subjects are not only observed once but multiple times, resulting in a cluster of possibly correlated observations (e.g., brain regions per patient). Observations often do not fulfill model assumptions of mixed models and require the use of nonparametric methods. In this article, we develop and present a purely nonparametric rank-based procedure that flexibly allows the unbiased and consistent estimation of the Wilcoxon-Mann-Whitney effect $P(X<Y)+\frac{1}{2}P(X=Y)$ in clustered data designs. Compared with existing methods, we allow flexible weights to be used in effect estimation. Additionally, we develop global and multiple contrast test procedures to test null hypotheses formulated regarding the generalized Mann-Whitney effects and for the computation of range-preserving simultaneous confidence intervals in a unified way. Extensive simulation studies show that these methods control the type-I error rate well and have reasonable power to detect alternatives in various situations.
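
To make the role of weights concrete, the sketch below computes a plug-in estimate of the Wilcoxon-Mann-Whitney effect for two groups of subjects with replicated measurements, once giving every subject equal weight and once weighting subjects by their number of replicates. This is a simplified illustration under assumed weighting schemes, not the paper's estimator or its test procedures; the data and function names are hypothetical.

```python
def pair_score(x, y):
    """Contribution of one pair to P(X < Y) + 0.5 * P(X = Y)."""
    return 1.0 if x < y else 0.5 if x == y else 0.0


def wmw_effect(group_x, group_y, weight_by_size=False):
    """Plug-in effect estimate; each group is a list of per-subject replicate lists."""
    def weights(group):
        if weight_by_size:
            total = sum(len(cluster) for cluster in group)
            return [len(cluster) / total for cluster in group]
        return [1.0 / len(group)] * len(group)

    effect = 0.0
    for w_i, xs in zip(weights(group_x), group_x):
        for v_j, ys in zip(weights(group_y), group_y):
            mean_score = sum(pair_score(x, y) for x in xs for y in ys) / (len(xs) * len(ys))
            effect += w_i * v_j * mean_score
    return effect


# Hypothetical clustered data: two subjects per group, unequal numbers of replicates.
group_x = [[1, 2], [5]]
group_y = [[3], [4, 4, 6]]

unweighted = wmw_effect(group_x, group_y)                    # every subject counts equally
weighted = wmw_effect(group_x, group_y, weight_by_size=True)  # every observation counts equally
```

With these data the subject-level estimate is 7/12 and the observation-level estimate is 0.75, so the choice of weights changes the estimand whenever cluster sizes are unbalanced.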

Citations: 0
A Hierarchical Bayesian Model for Estimating Age-Specific COVID-19 Infection Fatality Rates in Developing Countries.
IF 1.8, Zone 4 (Medicine), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2024-11-11. DOI: 10.1002/sim.10259
Sierra Pugh, Andrew T Levin, Gideon Meyerowitz-Katz, Satej Soman, Nana Owusu-Boaitey, Anthony B Zwi, Anup Malani, Ander Wilson, Bailey K Fosdick

The COVID-19 infection fatality rate (IFR) is the proportion of individuals infected with SARS-CoV-2 who subsequently die. As COVID-19 disproportionately affects older individuals, age-specific IFR estimates are imperative to facilitate comparisons of the impact of COVID-19 between locations and prioritize distribution of scarce resources. However, a coherent method is lacking for synthesizing available data into estimates of IFR and seroprevalence that vary continuously with age and adequately reflect uncertainties inherent in the underlying data. In this article, we introduce a novel Bayesian hierarchical model to estimate IFR as a continuous function of age that acknowledges heterogeneity in population age structure across locations and accounts for uncertainty in the estimates due to seroprevalence sampling variability and imperfect serology test assays. Our approach simultaneously models test assay characteristics, serology, and death data, where the serology and death data are often available only for binned age groups. Information is shared across locations through hierarchical modeling to improve estimation of the parameters with limited data. Modeling data from 26 developing country locations during the first year of the COVID-19 pandemic, we found seroprevalence did not change dramatically with age, and the IFR at age 60 was above the high-income country estimate for most locations.
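
The quantities involved can be illustrated with a simple point estimate: correct the observed seropositive rate for imperfect sensitivity and specificity (the Rogan-Gladen adjustment), then divide deaths by the implied number of infections. This is a non-Bayesian, single-age-bin sketch that ignores the uncertainty the paper's hierarchical model is designed to propagate; all numbers and function names are hypothetical.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Adjust an observed positive rate for imperfect test accuracy."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    adjusted = (apparent_prevalence + specificity - 1.0) / denom
    return min(1.0, max(0.0, adjusted))  # keep the estimate inside [0, 1]


def infection_fatality_rate(deaths, population, seroprevalence):
    """Deaths divided by the estimated number of infections."""
    infections = population * seroprevalence
    if infections <= 0:
        raise ValueError("estimated number of infections must be positive")
    return deaths / infections


# Hypothetical inputs for one age bin in one location.
prevalence = rogan_gladen(apparent_prevalence=0.22, sensitivity=0.90, specificity=0.95)
ifr = infection_fatality_rate(deaths=40, population=50_000, seroprevalence=prevalence)
```

With these inputs the adjusted seroprevalence is about 0.20 and the IFR about 0.004; a full model would additionally treat the test characteristics, the survey counts, and the binned ages as uncertain.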

Citations: 0
Evaluating analytic models for individually randomized group treatment trials with complex clustering in nested and crossed designs.
IF 1.8, Zone 4 (Medicine), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2024-11-10. Epub Date: 2024-09-03. DOI: 10.1002/sim.10206
Jonathan C Moyer, Fan Li, Andrea J Cook, Patrick J Heagerty, Sherri L Pals, Elizabeth L Turner, Rui Wang, Yunji Zhou, Qilu Yu, Xueqi Wang, David M Murray

Many individually randomized group treatment (IRGT) trials randomly assign individuals to study arms but deliver treatments via shared agents, such as therapists, surgeons, or trainers. Post-randomization interactions induce correlations in outcome measures between participants sharing the same agent. Agents can be nested in or crossed with trial arm, and participants may interact with a single agent or with multiple agents. These complications have led to ambiguity in choice of models but there have been no systematic efforts to identify appropriate analytic models for these study designs. To address this gap, we undertook a simulation study to examine the performance of candidate analytic models in the presence of complex clustering arising from multiple membership, single membership, and single agent settings, in both nested and crossed designs and for a continuous outcome. With nested designs, substantial type I error rate inflation was observed when analytic models did not account for multiple membership and when analytic model weights characterizing the association with multiple agents did not match the data generating mechanism. Conversely, analytic models for crossed designs generally maintained nominal type I error rates unless there was notable imbalance in the number of participants that interact with each agent.
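
One way to see why the membership weights matter is to write down the covariance they imply. The sketch below assumes a generic multiple-membership random-effects model, y_i = sum_k w_ik * u_k + e_i with independent agent effects u_k, and computes the implied covariance between participants. This model form, the weights, and the variance values are assumptions for illustration; the paper's simulation design is not reproduced.

```python
def implied_covariance(weights, agent_var, residual_var):
    """Covariance of outcomes under y_i = sum_k w_ik * u_k + e_i.

    weights[i][k] is the share of participant i's exposure attributed to agent k.
    """
    n = len(weights)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = sum(w_i * w_j for w_i, w_j in zip(weights[i], weights[j]))
            cov[i][j] = agent_var * shared + (residual_var if i == j else 0.0)
    return cov


# Hypothetical weights: participants 0 and 2 each see one agent, participant 1 splits time.
weights = [
    [1.0, 0.0],
    [0.5, 0.5],
    [0.0, 1.0],
]
cov = implied_covariance(weights, agent_var=4.0, residual_var=1.0)
```

Participant 1 is correlated with both of the others, while participants 0 and 2 are uncorrelated with each other; an analysis that assigns participant 1 to a single agent, or uses different weights, would assume a different covariance structure than the one generating the data.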

Pages: 4796-4818
Citations: 0
High-Dimensional Overdispersed Generalized Factor Model With Application to Single-Cell Sequencing Data Analysis.
IF 1.8, Zone 4 (Medicine), Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY. Pub Date: 2024-11-10. Epub Date: 2024-09-05. DOI: 10.1002/sim.10213
Jinyu Nie, Zhilong Qin, Wei Liu

The current high-dimensional linear factor models fail to account for the different types of variables, while high-dimensional nonlinear factor models often overlook the overdispersion present in mixed-type data. However, overdispersion is prevalent in practical applications, particularly in fields like biomedical and genomics studies. To address this practical demand, we propose an overdispersed generalized factor model (OverGFM) for performing high-dimensional nonlinear factor analysis on overdispersed mixed-type data. Our approach incorporates an additional error term to capture the overdispersion that cannot be accounted for by factors alone. However, this introduces significant computational challenges due to the involvement of two high-dimensional latent random matrices in the nonlinear model. To overcome these challenges, we propose a novel variational EM algorithm that integrates Laplace and Taylor approximations. This algorithm provides iterative explicit solutions for the complex variational parameters and is proven to possess excellent convergence properties. We also develop a criterion based on the singular value ratio to determine the optimal number of factors. Numerical results demonstrate the effectiveness of this criterion. Through comprehensive simulation studies, we show that OverGFM outperforms state-of-the-art methods in terms of estimation accuracy and computational efficiency. Furthermore, we demonstrate the practical merit of our method through its application to two datasets from genomics. To facilitate its usage, we have integrated the implementation of OverGFM into the R package GFM.
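
The effect of an additional error term on a count variable can be worked out in closed form in a simple special case. Assuming, purely for illustration, a Poisson count whose log-rate is mu + eps with eps ~ N(0, sigma^2) (a Poisson-lognormal model), the mean is exp(mu + sigma^2 / 2) and the variance exceeds the mean whenever sigma > 0. Whether OverGFM uses exactly this form for each variable type is not stated in the abstract; the function name and parameter values are hypothetical.

```python
import math


def poisson_lognormal_moments(mu, sigma):
    """Mean and variance of Y ~ Poisson(exp(mu + eps)), eps ~ N(0, sigma^2)."""
    mean = math.exp(mu + 0.5 * sigma ** 2)
    # Var(Y) = E[rate] + Var(rate); the second term is the overdispersion.
    variance = mean + mean ** 2 * (math.exp(sigma ** 2) - 1.0)
    return mean, variance


mean_plain, var_plain = poisson_lognormal_moments(mu=1.0, sigma=0.0)  # ordinary Poisson
mean_over, var_over = poisson_lognormal_moments(mu=1.0, sigma=0.5)    # with extra error term
dispersion_ratio = var_over / mean_over
```

With sigma = 0 the variance equals the mean, as for a plain Poisson; with sigma = 0.5 the variance-to-mean ratio rises above one, which is the pattern a factor-only model cannot absorb.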

目前的高维线性因子模型无法解释不同类型的变量,而高维非线性因子模型往往忽略了混合型数据中存在的过度分散性。然而,在实际应用中,尤其是在生物医学和基因组学研究等领域,超分散现象十分普遍。针对这一实际需求,我们提出了一种超分散广义因子模型(OverGFM),用于对超分散混合型数据进行高维非线性因子分析。我们的方法包含一个额外的误差项,以捕捉仅靠因子无法解释的超分散性。然而,由于非线性模型中涉及两个高维潜在随机矩阵,这给计算带来了巨大挑战。为了克服这些挑战,我们提出了一种整合拉普拉斯和泰勒近似的新型变分电磁算法。该算法为复杂的变分参数提供了迭代显式解,并被证明具有出色的收敛特性。我们还开发了一种基于奇异值比率的标准,以确定最佳因子数。数值结果证明了这一标准的有效性。通过全面的模拟研究,我们表明 OverGFM 在估计精度和计算效率方面都优于最先进的方法。此外,我们还通过将该方法应用于两个基因组学数据集,证明了它的实用性。为了方便使用,我们将 OverGFM 的实现集成到了 R 软件包 GFM 中。
Citations: 0
Multilevel Longitudinal Functional Principal Component Model.
IF 1.8 Zone 4 (Medicine) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-11-10 Epub Date: 2024-09-03 DOI: 10.1002/sim.10207
Wenyi Lin, Jingjing Zou, Chongzhi Di, Cheryl L Rock, Loki Natarajan

Sensor devices, such as accelerometers, are widely used for measuring physical activity (PA). These devices provide outputs at fine granularity (e.g., 10-100 Hz or minute-level) and yield PA records that are measured continuously across multiple days and visits. Such multilevel, densely sampled data offer rich information on activity patterns but also pose computational challenges. In contrast, a scalar health outcome (e.g., BMI) is usually observed only at the individual or visit level. This leads to a discrepancy in the number of nested levels between the predictors (PA) and the outcomes, raising analytic challenges. To address this issue, we propose a multilevel longitudinal functional principal component analysis (mLFPCA) model to directly model multilevel functional PA inputs in a longitudinal study, and then implement a longitudinal functional principal component regression (FPCR) to explore the association between PA and obesity-related health outcomes. Additionally, we conduct a comprehensive simulation study to examine the impact of imbalanced multilevel data on both mLFPCA and FPCR performance and offer guidelines for selecting optimal methods.
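
The mismatch in nested levels can be illustrated with a small sketch. This is not the mLFPCA model; it shows only a naive baseline under assumed data, in which each subject has visits, each visit has one or more days of activity curves on a common time grid, and BMI is recorded once per visit. All identifiers and values are hypothetical.

```python
from statistics import mean

# Hypothetical records: (subject, visit, day) -> activity curve on a common time grid.
activity = {
    ("s1", 1, 1): [0.0, 2.0, 4.0],
    ("s1", 1, 2): [2.0, 2.0, 0.0],
    ("s1", 2, 1): [1.0, 1.0, 1.0],
    ("s2", 1, 1): [3.0, 0.0, 3.0],
}
# Hypothetical scalar outcome observed once per (subject, visit).
bmi = {("s1", 1): 27.1, ("s1", 2): 26.4, ("s2", 1): 31.0}


def visit_level_curves(curves):
    """Average day-level curves pointwise within each (subject, visit)."""
    grouped = {}
    for (subject, visit, _day), curve in curves.items():
        grouped.setdefault((subject, visit), []).append(curve)
    return {
        key: [mean(points) for points in zip(*days)]
        for key, days in grouped.items()
    }


visit_curves = visit_level_curves(activity)
# Pair each visit-level curve with the outcome observed at the same level.
paired = [(key, visit_curves[key], bmi[key]) for key in sorted(visit_curves) if key in bmi]
```

Averaging aligns predictor and outcome at the visit level, but it discards the day-to-day variation within a visit, which is exactly the information a multilevel model is meant to retain.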

Citations: 0
Assessing the Performance of Machine Learning Methods Trained on Public Health Observational Data: A Case Study From COVID-19.
IF 1.8 Zone 4 (Medicine) Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-11-10 Epub Date: 2024-09-05 DOI: 10.1002/sim.10211
Davide Pigoli, Kieran Baker, Jobie Budd, Lorraine Butler, Harry Coppock, Sabrina Egglestone, Steven G Gilmour, Chris Holmes, David Hurley, Radka Jersakova, Ivan Kiskin, Vasiliki Koutra, Jonathon Mellor, George Nicholson, Joe Packham, Selina Patel, Richard Payne, Stephen J Roberts, Björn W Schuller, Ana Tendero-Cañadas, Tracey Thornley, Alexander Titcomb

From early in the coronavirus disease 2019 (COVID-19) pandemic, there was interest in using machine learning methods to predict COVID-19 infection status based on vocal audio signals, for example, cough recordings. However, early studies had limitations in terms of data collection and of how the performances of the proposed predictive models were assessed. This article describes how these limitations have been overcome in a study carried out by the Turing-RSS Health Data Laboratory and the UK Health Security Agency. As part of the study, the UK Health Security Agency collected a dataset of acoustic recordings, SARS-CoV-2 infection status and extensive study participant meta-data. This allowed us to rigorously assess state-of-the-art machine learning techniques to predict SARS-CoV-2 infection status based on vocal audio signals. The lessons learned from this project should inform future studies on statistical evaluation methods to assess the performance of machine learning techniques for public health tasks.

Citations: 0