Journal of the Royal Statistical Society Series C-Applied Statistics: Latest Articles

Association Plots: visualizing cluster-specific associations in high-dimensional correspondence analysis biplots
IF 1.6 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2023-06-08 DOI: 10.1093/jrsssc/qlad039
E. Gralinska, Martin Vingron
In molecular biology, just as in many other fields of science, data often come in the form of matrices or contingency tables with many observations (rows) for a set of variables (columns). While projection methods like principal component analysis or correspondence analysis (CA) can be applied for obtaining an overview of such data, in cases where the matrix is very large the associated loss of information upon projection into two or three dimensions may be dramatic. However, when the set of variables can be grouped into clusters, this opens up a new angle on the data. We focus on the question of which observations are associated to a cluster and distinguish it from other clusters. CA employs a geometry geared towards answering this question. We exploit this feature in order to introduce Association Plots for visualizing cluster-specific observations in complex data. Regardless of the data matrix dimensionality Association Plots are two-dimensional and depict the observations associated to a cluster of variables. We demonstrate our method on two small data sets and then use it to study a challenging genomic data set comprising >10,000 samples. We show that Association Plots can clearly highlight those observations which characterise a cluster of variables.
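As background for the abstract above, the following is a minimal sketch of standard correspondence analysis computed from the singular value decomposition of the standardized residuals of a contingency table. It is not the authors' Association Plot construction, which the abstract does not spell out. The function name `correspondence_analysis` and the small example table are illustrative assumptions.

```python
import numpy as np


def correspondence_analysis(table):
    """Textbook CA of a non-negative contingency table (illustrative helper)."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    # Standardized residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates of rows (observations) and columns (variables)
    row_principal = (U * sv) / np.sqrt(r)[:, None]
    col_principal = (Vt.T * sv) / np.sqrt(c)[:, None]
    return row_principal, col_principal, sv ** 2   # sv**2: principal inertias


# Made-up 4 x 3 table, used only to exercise the function.
table = np.array([[20, 5, 2],
                  [3, 18, 4],
                  [1, 6, 25],
                  [10, 10, 10]])
rows, cols, inertia = correspondence_analysis(table)
```

The principal inertias sum to the chi-squared statistic of the table divided by its grand total, which gives a quick sanity check on any CA implementation.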
Citations: 1
Longitudinal Canonical Correlation Analysis.
IF 1.6 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2023-06-01 Volume 72, Issue 3, pp. 587-607 DOI: 10.1093/jrsssc/qlad022
Seonjoo Lee, Jongwoo Choi, Zhiqian Fang, F DuBois Bowman

This paper considers canonical correlation analysis for two longitudinal variables that are possibly sampled at different time resolutions with irregular grids. We modeled trajectories of the multivariate variables using random effects and found the most correlated sets of linear combinations in the latent space. Our numerical simulations showed that the longitudinal canonical correlation analysis (LCCA) effectively recovers underlying correlation patterns between two high-dimensional longitudinal data sets. We applied the proposed LCCA to data from the Alzheimer's Disease Neuroimaging Initiative and identified the longitudinal profiles of morphological brain changes and amyloid cumulation.

Citations: 0
Outcome trajectory estimation for optimal dynamic treatment regimes with repeated measures.
IF 1.6 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2023-05-22 eCollection Date: 2023-08-01 Volume 72, Issue 4, pp. 976-991 DOI: 10.1093/jrsssc/qlad037
Yuan Zhang, David M Vock, Megan E Patrick, Lizbeth H Finestack, Thomas A Murray

In recent sequential multiple assignment randomized trials, outcomes were assessed multiple times to evaluate longer-term impacts of the dynamic treatment regimes (DTRs). Q-learning requires a scalar response to identify the optimal DTR. Inverse probability weighting may be used to estimate the optimal outcome trajectory, but it is inefficient, susceptible to model mis-specification, and unable to characterize how treatment effects manifest over time. We propose modified Q-learning with generalized estimating equations to address these limitations and apply it to the M-bridge trial, which evaluates adaptive interventions to prevent problematic drinking among college freshmen. Simulation studies demonstrate our proposed method improves efficiency and robustness.

Citations: 0
Identifying irregular activity sequences: an application to passive household monitoring
IF 1.6 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2023-05-17 DOI: 10.1093/jrsssc/qlad005
Jess Gillam, R. Killick, Simon Taylor, Jack Heal, Ben Norwood
Approximately one in five people will live to see their 100th birthday due to advancements in modern medicine and other factors. Over-65s constitute 42% of elective admissions and 43% of emergency admissions to hospitals. Increasingly, people are turning to technology to help improve the health and care of the elderly. There is mixed evidence of the success of wearables in older populations, with a key barrier being adoption. In contrast, passive sensors such as infra-red motion and plug sensors have had more success. These passive sensors give us a sequence of categorical “trigger” events throughout the day. This paper proposes a method for detecting subtle changes in sequences while taking account of the natural day-to-day variability and differing numbers of “trigger” events per day.
Citations: 0
A Bayesian feature allocation model for identifying cell subpopulations using CyTOF data.
IF 1.6 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2023-04-25 eCollection Date: 2023-06-01 Volume 72, Issue 3, pp. 718-738 DOI: 10.1093/jrsssc/qlad029
Arthur Lui, Juhee Lee, Peter F Thall, May Daher, Katy Rezvani, Rafet Basar

A Bayesian feature allocation model (FAM) is presented for identifying cell subpopulations based on multiple samples of cell surface or intracellular marker expression level data obtained by cytometry by time of flight (CyTOF). Cell subpopulations are characterized by differences in marker expression patterns, and cells are clustered into subpopulations based on their observed expression levels. A model-based method is used to construct cell clusters within each sample by modeling subpopulations as latent features, using a finite Indian buffet process. Non-ignorable missing data due to technical artifacts in mass cytometry instruments are accounted for by defining a static missingship mechanism. In contrast with conventional cell clustering methods, which cluster observed marker expression levels separately for each sample, the FAM-based method can be applied simultaneously to multiple samples, and also identify important cell subpopulations likely to be otherwise missed. The proposed FAM-based method is applied to jointly analyse three CyTOF datasets to study natural killer (NK) cells. Because the subpopulations identified by the FAM may define novel NK cell subsets, this statistical analysis may provide useful information about the biology of NK cells and their potential role in cancer immunotherapy which may lead, in turn, to development of improved NK cell therapies.

Citations: 0
The impact of directly observed therapy on the efficacy of Tuberculosis treatment: a Bayesian multilevel approach
IF 1.6 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2023-04-25 DOI: 10.1093/jrsssc/qlad034
Widemberg S. Nobre, A. M. Schmidt, E. Moodie, D. Stephens
We propose and discuss a Bayesian procedure to estimate causal effects for multilevel observations in the presence of confounding. This work is motivated by an interest in determining the causal impact of directly observed therapy on the successful treatment of Tuberculosis. We focus on propensity score regression and covariate adjustment to balance the treatment allocation. We discuss the need to include latent local-level random effects in the propensity score model to reduce bias in the estimation of causal effects. A simulation study suggests that accounting for the multilevel nature of the data with latent structures in both the outcome and propensity score models has the potential to reduce bias in the estimation of causal effects.
Citations: 0
Estimating subject-specific hazard functions
IF 1.6 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2023-04-24 DOI: 10.1093/jrsssc/qlad030
Moumita Chatterjee, B. Ganguli, Sugata Sen Roy
The central idea of this paper is to compare mean responses of several subjects in the presence of censoring and subject-specific variation. We develop a semiparametric mixed model for fitting subject-specific hazard curves to a set of censored failure times. A spline-based model and a mixed effects framework for smoothing are used. Efficient estimators of fixed parameters and predictors of the random components are derived and their asymptotic properties studied. This is a generalization of the method proposed by [Cai, T., Hyndman, R. J., & Wand, M. P. (2002). Mixed model-based hazard estimation. Journal of Computational and Graphical Statistics, 11(4), 784–798. https://doi.org/10.1198/106186002862] to incorporate additional subject-specific variation of the hazard function. The results are illustrated using two motivating examples.
Citations: 0
Penalized weighted least-squares estimate for variable selection on correlated multiply imputed data
IF 1.6 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2023-04-24 DOI: 10.1093/jrsssc/qlad028
Yang Li, Haoyu Yang, Haochen Yu, Hanwen Huang, Ye Shen
Considering the inevitable correlation among different datasets within the same subject, we propose a framework of variable selection on multiply imputed data with penalized weighted least squares (PWLS–MI). The methodological development is motivated by an epidemiological study of A/H7N9 patients from Zhejiang province in China, where nearly half of the variables are not fully observed. Multiple imputation is commonly adopted as a missing data processing method. However, it generates correlations among imputed values within the same subject across datasets. Recent work on variable selection for multiply imputed data does not fully address such similarities. We propose PWLS–MI to incorporate the correlation when performing the variable selection. PWLS–MI can be considered as a framework for variable selection on multiply imputed data since it allows various penalties. We use adaptive LASSO as an illustrating example. Extensive simulation studies are conducted to compare PWLS–MI with recently developed methods and the results suggest that the proposed approach outperforms in terms of both selection accuracy and deletion accuracy. PWLS–MI is shown to select variables with clinical relevance when applied to the A/H7N9 database.
Citations: 0
A spline-based time-varying reproduction number for modelling epidemiological outbreaks
IF 1.6 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2023-04-05 DOI: 10.1093/jrsssc/qlad027
Eugen Pircalabelu
We develop in this manuscript a method for performing estimation and inference for the reproduction number of an epidemiological outbreak, focusing on the COVID-19 epidemic. The estimator is time-dependent and uses spline modelling to adapt to changes in the outbreak. This is accomplished by directly modelling the series of new infections as a function of time and subsequently using the derivative of the function to define a time-varying reproduction number, which is then used to assess the evolution of the epidemic for several countries.
Citations: 2
A Bayesian two-stage group sequential scheme for ordinal endpoints
IF 1.6 Zone 4 (Mathematics) Q3 STATISTICS & PROBABILITY Pub Date: 2023-04-01 DOI: 10.1093/jrsssc/qlad026
Chengxue Zhong, Hongyu Miao, H. Pan
Ordinal endpoints are common in clinical studies. For example, many clinical trials for evaluating COVID-19 infection therapies have adopted an ordinal scale as recommended by the World Health Organization. Despite their importance in clinical studies, design methods for ordinal endpoints are limited; in practice, a dichotomized approach is often used for simplicity. Here, we introduce a Bayesian group sequential scheme to assess ordinal endpoints, which considers a proportional-odds (PO) model, a nonproportional-odds (NPO) model, and a PO/NPO-switch model to handle various scenarios. Extensive simulations are conducted to demonstrate desirable performance, and the R package BayesOrdDesign has been made publicly available.
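To illustrate the proportional-odds (PO) model named in the abstract above, here is a minimal sketch that computes ordinal category probabilities under the usual cumulative-logit parameterization, logit P(Y <= k) = alpha_k - eta. This parameterization, the function name `po_category_probs`, and the numeric values are assumptions for illustration only; the sketch does not reproduce the paper's Bayesian design or the BayesOrdDesign package.

```python
import math


def po_category_probs(cutpoints, eta):
    """Category probabilities when logit P(Y <= k) = cutpoints[k] - eta.

    cutpoints must be strictly increasing; eta is the linear predictor
    (for example, a treatment effect). Both are hypothetical inputs.
    """
    # Cumulative probabilities for each cutpoint, then 1.0 for the top category
    cum = [1.0 / (1.0 + math.exp(-(a - eta))) for a in cutpoints] + [1.0]
    # Differences of consecutive cumulative probabilities give P(Y = k)
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]


# Four ordered categories, so three cutpoints (made-up values).
cutpoints = [-1.0, 0.0, 1.5]
control = po_category_probs(cutpoints, 0.0)
treated = po_category_probs(cutpoints, 0.7)
```

Under this model the ratio of cumulative odds between the two arms is the same at every cutpoint (here exp(0.7)), which is the assumption a nonproportional-odds model relaxes.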
Citations: 0