
International Journal of Biostatistics: Latest Articles

Comments on “sensitivity of estimands in clinical trials with imperfect compliance” by Chen and Heitjan
IF 1.2 | Zone 4 (Mathematics) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-07-27 | DOI: 10.1515/ijb-2023-0127
Stuart G. Baker, Karen S. Lindeman
Chen and Heitjan (Sensitivity of estimands in clinical trials with imperfect compliance. Int J Biostat. 2023) used linear extrapolation to estimate the population average causal effect (PACE) from the complier average causal effect (CACE) in multiple randomized trials with all-or-none compliance. For extrapolating from CACE to PACE in this setting and in the paired availability design involving different availabilities of treatment among before-and-after studies, we recommend the sensitivity analysis in Baker and Lindeman (J Causal Inference, 2013) because it is not restricted to a linear model, as it involves various random effect and trend models.
Citations: 0
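For readers unfamiliar with the CACE mentioned in the Baker and Lindeman abstract above, the following is a minimal sketch of the standard ratio estimator under all-or-none compliance. It assumes the usual conditions (randomization, exclusion restriction, no defiers). The function name and the trial numbers are hypothetical, and the sketch does not implement the CACE-to-PACE extrapolation discussed in either paper.

```python
def cace_estimate(mean_outcome_assigned_treat, mean_outcome_assigned_control,
                  prop_treated_assigned_treat, prop_treated_assigned_control):
    """Ratio estimate of the complier average causal effect (CACE).

    Numerator: intention-to-treat difference in mean outcomes.
    Denominator: difference between arms in the proportion receiving treatment.
    """
    itt_effect = mean_outcome_assigned_treat - mean_outcome_assigned_control
    compliance_gap = prop_treated_assigned_treat - prop_treated_assigned_control
    if compliance_gap <= 0:
        raise ValueError("assignment must increase the proportion treated")
    return itt_effect / compliance_gap


# Hypothetical summary data: outcome means of 0.30 vs 0.20, with 80% of the
# treatment arm and 0% of the control arm actually receiving treatment.
example_cace = cace_estimate(0.30, 0.20, 0.80, 0.00)
print(round(example_cace, 3))
```

The estimate applies only to compliers, which is why moving from CACE to PACE needs further modelling assumptions and a sensitivity analysis.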
Detecting differentially expressed genes from RNA-seq data using fuzzy clustering
IF 1.2 | Zone 4 (Mathematics) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-07-27 | DOI: 10.1515/ijb-2023-0125
Yuki Ando, Asanao Shimokawa
A two-group comparison test is generally performed on RNA sequencing data to detect differentially expressed genes (DEGs). However, the accuracy of this method is low due to the small sample size. To address this, we propose a method using fuzzy clustering that artificially generates data with expression patterns similar to those of DEGs to identify genes that are highly likely to be classified into the same cluster as the initial cluster data. The proposed method is advantageous in that it does not perform any test. Furthermore, a certain level of accuracy can be maintained even when the sample size is biased, and we show that such a situation may improve the accuracy of the proposed method. We compared the proposed method with the conventional method using simulations. In the simulations, we changed the sample size and difference between the expression levels of group 1 and group 2 in the DEGs to obtain the desired accuracy of the proposed method. The results show that the proposed method is superior in all cases under the conditions simulated. We also show that the effect of the difference between group 1 and group 2 on the accuracy is more prominent when the sample size is biased.
Citations: 0
Random forests for survival data: which methods work best and under what conditions?
IF 1.2 | Zone 4 (Mathematics) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-04-24 | DOI: 10.1515/ijb-2023-0056
Matthew Berkowitz, Rachel MacKay Altman, Thomas M. Loughin
Few systematic comparisons of methods for constructing survival trees and forests exist in the literature. Importantly, when the goal is to predict a survival time or estimate a survival function, the optimal choice of method is unclear. We use an extensive simulation study to systematically investigate various factors that influence survival forest performance – forest construction method, censoring, sample size, distribution of the response, structure of the linear predictor, and presence of correlated or noisy covariates. In particular, we study 11 methods that have recently been proposed in the literature and identify 6 top performers. We find that all the factors that we investigate have significant impact on the methods’ relative accuracy of point predictions of survival times and survival function estimates. We use our results to make recommendations for which methods to use in a given context and offer explanations for the observed differences in relative performance.
Citations: 0
Kalman filter with impulse noised outliers: a robust sequential algorithm to filter data with a large number of outliers
IF 1.2 | Zone 4 (Mathematics) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-04-16 | DOI: 10.1515/ijb-2023-0065
Bertrand Cloez, Bénédicte Fontez, Eliel González-García, Isabelle Sanchez
Impulse noised outliers are data points that differ significantly from other observations. They are generally removed from the data set through local regression or the Kalman filter algorithm. However, these methods, or their generalizations, are not well suited when the number of outliers is of the same order as the number of low-noise data (often called nominal measurements). In this article, we propose a new model for impulse noised outliers. It is based on a hierarchical model and a simple linear Gaussian process, as with the Kalman filter. We present a fast forward-backward algorithm that filters and smooths sequential data and also detects these outliers. We compare the robustness and efficiency of this algorithm with classical methods. Finally, we apply this method to a real data set from a Walk Over Weighing system containing around 60 % outliers. For this application, we further develop an (explicit) EM algorithm to calibrate some algorithm parameters.
Citations: 0
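As background for the Kalman filter abstract above, here is a rough sketch of a classical scalar Kalman filter for a local-level model, with a simple innovation gate that skips the update when an observation looks like an outlier. This is the kind of classical baseline the abstract contrasts against, not the hierarchical model or the forward-backward algorithm of the paper. The function name, the gate threshold and the data are all hypothetical.

```python
def kalman_filter_with_gate(observations, process_var, obs_var,
                            init_mean, init_var, gate=3.0):
    """Scalar local-level Kalman filter with innovation gating.

    State model:       x_t = x_{t-1} + w_t,  w_t ~ N(0, process_var)
    Observation model: y_t = x_t + v_t,      v_t ~ N(0, obs_var)
    An observation is flagged (and ignored) when its squared innovation
    exceeds gate**2 times the predicted innovation variance.
    """
    mean, var = init_mean, init_var
    filtered_means, outlier_flags = [], []
    for y in observations:
        # Prediction step: the mean carries over, the uncertainty grows.
        pred_var = var + process_var
        innovation = y - mean
        innovation_var = pred_var + obs_var
        is_outlier = innovation ** 2 > gate ** 2 * innovation_var
        if is_outlier:
            # Skip the update and keep the prediction as the filtered state.
            var = pred_var
        else:
            gain = pred_var / innovation_var
            mean = mean + gain * innovation
            var = (1.0 - gain) * pred_var
        filtered_means.append(mean)
        outlier_flags.append(is_outlier)
    return filtered_means, outlier_flags


# Hypothetical weights with one gross spike at index 3.
weights = [10.0, 10.2, 9.9, 100.0, 10.1]
means, flags = kalman_filter_with_gate(weights, process_var=0.01, obs_var=1.0,
                                       init_mean=10.0, init_var=1.0)
print(flags)
```

Gating of this kind relies on most observations being nominal, which is the limitation the abstract points to when outliers are as numerous as low-noise measurements.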
The survival function NPMLE for combined right-censored and length-biased right-censored failure time data: properties and applications
IF 1.2 | Zone 4 (Mathematics) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-04-09 | DOI: 10.1515/ijb-2023-0121
James H. McVittie, David B. Wolfson, David A. Stephens
Many cohort studies in survival analysis have imbedded in them subcohorts consisting of incident cases and prevalent cases. Instead of analysing the data from the incident and prevalent cohorts alone, there are surely advantages to combining the data from these two subcohorts. In this paper, we discuss a survival function nonparametric maximum likelihood estimator (NPMLE) using both length-biased right-censored prevalent cohort data and right-censored incident cohort data. We establish the asymptotic properties of the survival function NPMLE and utilize the NPMLE to estimate the distribution for time spent in a Montreal area hospital.
Citations: 0
Ensemble learning methods of inference for spatially stratified infectious disease systems
IF 1.2 | Zone 4 (Mathematics) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-04-09 | DOI: 10.1515/ijb-2023-0102
Jeffrey Peitsch, Gyanendra Pokharel, Shakhawat Hossain
Individual level models are a class of mechanistic models that are widely used to infer infectious disease transmission dynamics. These models incorporate individual level covariate information accounting for population heterogeneity and are generally fitted in a Bayesian Markov chain Monte Carlo (MCMC) framework. However, Bayesian MCMC methods of inference are computationally expensive for large data sets. This issue becomes more severe when applied to infectious disease data collected from spatially heterogeneous populations, as the number of covariates increases. In addition, summary statistics over the global population may not capture the true spatio-temporal dynamics of disease transmission. In this study we propose to use ensemble learning methods to predict epidemic generating models instead of the time-consuming Bayesian MCMC method. We apply these methods to infer disease transmission dynamics over spatially clustered populations, considering the clusters as natural strata instead of a global population. We compare the performance of two tree-based ensemble learning techniques: random forest and gradient boosting. These methods are applied to the 2001 foot-and-mouth disease epidemic in the U.K. and evaluated using simulated data from a clustered population. It is shown that the spatially clustered data can help to predict epidemic generating models more accurately than the global data.
Citations: 0
MBPCA-OS: an exploratory multiblock method for variables of different measurement levels. Application to study the immune response to SARS-CoV-2 infection and vaccination
IF 1.2 | Zone 4 (Mathematics) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2023-12-12 | DOI: 10.1515/ijb-2023-0062
Martin Paries, Evelyne Vigneau, Adeline Huneau, Olivier Lantz, Stéphanie Bougeard
Studying a large number of variables measured on the same observations and organized in blocks – denoted multiblock data – is becoming standard in several domains especially in biology. To explore the relationships between all these variables – at the block- and the variable-level – several exploratory multiblock methods were proposed. However, most of them are only designed for numeric variables. In reality, some data sets contain variables of different measurement levels (i.e., numeric, nominal, ordinal). In this article, we focus on exploratory multiblock methods that handle variables at their appropriate measurement level. Multi-Block Principal Component Analysis with Optimal Scaling (MBPCA-OS) is proposed and applied to multiblock data from the CURIE-O-SA French cohort. In this study, variables are of different measurement levels and organized in four blocks. The objective is to study the immune responses according to the SARS-CoV-2 infection and vaccination statuses, the symptoms and the participant’s characteristics.
Citations: 0
Designing efficient randomized trials: power and sample size calculation when using semiparametric efficient estimators.
IF 1.2 | Zone 4 (Mathematics) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2021-08-06 | DOI: 10.1515/ijb-2021-0039
Alejandro Schuler

Trials enroll a large number of subjects in order to attain power, making them expensive and time-consuming. Sample size calculations are often performed with the assumption of an unadjusted analysis, even if the trial analysis plan specifies a more efficient estimator (e.g. ANCOVA). This leads to conservative estimates of required sample sizes and an opportunity for savings. Here we show that a relatively simple formula can be used to estimate the power of any two-arm, single-timepoint trial analyzed with a semiparametric efficient estimator, regardless of the domain of the outcome or kind of treatment effect (e.g. odds ratio, mean difference). Since an efficient estimator attains the minimum possible asymptotic variance, this allows for the design of trials that are as small as possible while still attaining design power and control of type I error. The required sample size calculation is parsimonious and requires the analyst to provide only a small number of population parameters. We verify in simulation that the large-sample properties of trials designed this way attain their nominal values. Lastly, we demonstrate how to use this formula in the "design" (and subsequent reanalysis) of a real randomized trial and show that fewer subjects are required to attain the same design power when a semiparametric efficient estimator is accounted for at the design stage.

Citations: 2
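To make the savings described in the Schuler abstract above concrete, here is a minimal sketch based on the textbook normal-approximation sample size formula for a two-arm comparison of means. It uses the well-known large-sample result that adjusting for a baseline covariate with correlation rho to the outcome scales the residual variance by roughly (1 - rho^2). This is not the formula derived in the paper. The function name and all input values are hypothetical.

```python
import math
from statistics import NormalDist


def per_arm_sample_size(delta, sigma, alpha=0.05, power=0.80, rho=0.0):
    """Approximate per-arm sample size for a two-sided test of a mean difference.

    delta: effect size to detect; sigma: outcome standard deviation.
    rho:   assumed correlation between outcome and an adjustment covariate
           (rho = 0 corresponds to an unadjusted analysis).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Covariate adjustment shrinks the residual variance by about (1 - rho^2).
    residual_var = sigma ** 2 * (1 - rho ** 2)
    return math.ceil(2 * residual_var * (z_alpha + z_power) ** 2 / delta ** 2)


# Hypothetical design inputs: detect a difference of 0.5 SD with 80% power.
n_unadjusted = per_arm_sample_size(delta=0.5, sigma=1.0)
n_ancova = per_arm_sample_size(delta=0.5, sigma=1.0, rho=0.5)
print(n_unadjusted, n_ancova)
```

Planning for the adjusted analysis lowers the required enrollment, which is the design-stage opportunity the abstract describes.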
Bayesian adaptive design of early-phase clinical trials for precision medicine based on cancer biomarkers.
IF 1.2 | Zone 4 (Mathematics) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2021-06-10 | DOI: 10.1515/ijb-2021-0009
Shinjo Yada

Cancer tissue samples obtained via biopsy or surgery are examined for specific gene mutations by genetic testing to inform treatment. Precision medicine, which considers not only the cancer type and location, but also the genetic information, environment, and lifestyle of each patient, can be applied for disease prevention and treatment in individual patients. The number of patient-specific characteristics, including biomarkers, has been increasing with time; these characteristics are highly correlated with outcomes. The number of patients at the beginning of early-phase clinical trials is often limited. Moreover, it is challenging to estimate parameters of models that include baseline characteristics, such as biomarkers, as covariates. To overcome these issues and promote personalized medicine, we propose a dose-finding method that considers patient background characteristics, including biomarkers, using a model for phase I/II oncology trials. We built a Bayesian neural network with input variables of dose, biomarkers, and interactions between dose and biomarkers and output variables of efficacy outcomes for each patient. We trained the neural network to select the optimal dose based on all background characteristics of a patient. Simulation analysis showed that the probability of selecting the desirable dose was higher using the proposed method than that using the naïve method.

Citations: 0
The effect of data aggregation on dispersion estimates in count data models.
IF 1.2 Zone 4 Mathematics Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2021-05-07 DOI: 10.1515/ijb-2020-0079
Adam Errington, Jochen Einbeck, Jonathan Cumming, Ute Rössler, David Endesfelder

For the modelling of count data, aggregation of the raw data over certain subgroups or predictor configurations is common practice. This is, for instance, the case for count data biomarkers of radiation exposure. Under the Poisson law, count data can be aggregated without loss of information on the Poisson parameter, which remains true if the Poisson assumption is relaxed towards quasi-Poisson. However, in biodosimetry in particular, but also beyond, the question of how the dispersion estimates for quasi-Poisson models behave under data aggregation has received little attention. Indeed, for real data sets featuring unexplained heterogeneities, dispersion estimates can increase strongly after aggregation, an effect which we will demonstrate and quantify explicitly for some scenarios. The increase in dispersion estimates implies an inflation of the parameter standard errors, which, however, by comparison with random effect models, can be shown to serve a corrective purpose. The phenomena are illustrated by γ-H2AX foci data as used, for instance, in radiation biodosimetry for the calibration of dose-response curves.
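
The aggregation effect described in this abstract can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' analysis: the data are synthetic, the model is intercept-only, the unexplained heterogeneity is a hypothetical gamma-distributed multiplier per sample, and dispersion is estimated with the usual Pearson statistic. The names `simulate_foci`, `pearson_dispersion`, and `dispersion_before_and_after` are illustrative.

```python
import numpy as np


def pearson_dispersion(counts, fitted, n_params=1):
    """Pearson estimate: sum((y - mu)^2 / mu) / (n - p)."""
    counts = np.asarray(counts, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    return np.sum((counts - fitted) ** 2 / fitted) / (counts.size - n_params)


def simulate_foci(n_samples, cells_per_sample, mean, heterogeneity_var, rng):
    """Poisson counts per cell; each sample has its own multiplier u (mean 1).

    Assumption: heterogeneity enters as u ~ Gamma with variance
    heterogeneity_var; a value of 0 gives homogeneous Poisson data.
    """
    if heterogeneity_var > 0:
        u = rng.gamma(shape=1.0 / heterogeneity_var,
                      scale=heterogeneity_var, size=n_samples)
    else:
        u = np.ones(n_samples)
    return rng.poisson(lam=mean * u[:, None],
                       size=(n_samples, cells_per_sample))


def dispersion_before_and_after(raw):
    """Dispersion on cell-level counts and on per-sample totals."""
    mu_hat = raw.mean()  # intercept-only fit: the overall mean count per cell
    phi_raw = pearson_dispersion(raw.ravel(), np.full(raw.size, mu_hat))
    totals = raw.sum(axis=1)  # aggregate the cells within each sample
    fitted_totals = np.full(totals.size, raw.shape[1] * mu_hat)
    phi_agg = pearson_dispersion(totals, fitted_totals)
    return phi_raw, phi_agg


rng = np.random.default_rng(0)
hetero = simulate_foci(200, 50, 2.0, 0.1, rng)
homog = simulate_foci(200, 50, 2.0, 0.0, rng)
phi_raw, phi_agg = dispersion_before_and_after(hetero)
phi_raw_h, phi_agg_h = dispersion_before_and_after(homog)
print("heterogeneous: raw", phi_raw, "aggregated", phi_agg)
print("homogeneous:   raw", phi_raw_h, "aggregated", phi_agg_h)
```

Under these assumptions the variance-to-mean ratio of a cell-level count is 1 + mean × var(u), while for a total over n cells it is 1 + n × mean × var(u), so the aggregated estimate should be much larger than the raw one when heterogeneity is present. Without heterogeneity both estimates should stay near 1.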

Adam Errington, Jochen Einbeck, Jonathan Cumming, Ute Rössler, David Endesfelder. The effect of data aggregation on dispersion estimates in count data models. International Journal of Biostatistics 18(1): 183-202. Published 2021-05-07. https://doi.org/10.1515/ijb-2020-0079
Citations: 6