International Journal of Biostatistics: Latest Articles

Comments on “sensitivity of estimands in clinical trials with imperfect compliance” by Chen and Heitjan

IF 1.2 | Zone 4 (Mathematics) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

International Journal of Biostatistics

Pub Date : 2024-07-27 DOI: 10.1515/ijb-2023-0127

Stuart G. Baker, Karen S. Lindeman

Chen and Heitjan (Sensitivity of estimands in clinical trials with imperfect compliance. Int J Biostat. 2023) used linear extrapolation to estimate the population average causal effect (PACE) from the complier average causal effect (CACE) in multiple randomized trials with all-or-none compliance. For extrapolating from CACE to PACE in this setting and in the paired availability design involving different availabilities of treatment among before-and-after studies, we recommend the sensitivity analysis in Baker and Lindeman (J Causal Inference, 2013) because it is not restricted to a linear model, as it involves various random effect and trend models.
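For readers unfamiliar with the two estimands, here is a minimal sketch. It is not the method of Chen and Heitjan, nor the sensitivity analysis of Baker and Lindeman. It assumes the usual all-or-none compliance setup (randomization, monotonicity, exclusion restriction), under which CACE can be estimated as the intention-to-treat effect on the outcome divided by the difference in treatment receipt between arms. The `extrapolate_linear` helper, the choice of the fraction receiving treatment as the regressor, and all numbers are hypothetical and only illustrate what a linear extrapolation across trials could look like.

```python
from statistics import mean


def cace(y_treat_arm, y_ctrl_arm, d_treat_arm, d_ctrl_arm):
    """Wald-type CACE estimate for one trial.

    y_* are outcomes by randomized arm; d_* are 0/1 indicators of
    treatment actually received, by randomized arm.
    """
    itt_outcome = mean(y_treat_arm) - mean(y_ctrl_arm)
    itt_receipt = mean(d_treat_arm) - mean(d_ctrl_arm)
    if itt_receipt == 0:
        raise ValueError("no difference in treatment receipt between arms")
    return itt_outcome / itt_receipt


def extrapolate_linear(xs, ys, x_new):
    """Least-squares line through (xs, ys), evaluated at x_new.

    Assumption for illustration: xs are per-trial fractions receiving
    treatment and ys are per-trial CACE estimates.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar + slope * (x_new - x_bar)


if __name__ == "__main__":
    # Hypothetical single trial with binary outcomes.
    print(cace([1, 1, 0, 1], [0, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]))
    # Hypothetical per-trial (fraction treated, CACE) pairs, extrapolated
    # to the case where everyone receives treatment.
    print(extrapolate_linear([0.4, 0.6, 0.8], [0.30, 0.25, 0.20], 1.0))
```

A straight line is the most restrictive choice here, which is the limitation the comment highlights when it recommends an analysis that also covers random effect and trend models.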

Citations: 0

Detecting differentially expressed genes from RNA-seq data using fuzzy clustering

IF 1.2 | Zone 4 (Mathematics) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

International Journal of Biostatistics

Pub Date : 2024-07-27 DOI: 10.1515/ijb-2023-0125

Yuki Ando, Asanao Shimokawa

A two-group comparison test is generally performed on RNA sequencing data to detect differentially expressed genes (DEGs). However, the accuracy of this method is low due to the small sample size. To address this, we propose a method using fuzzy clustering that artificially generates data with expression patterns similar to those of DEGs to identify genes that are highly likely to be classified into the same cluster as the initial cluster data. The proposed method is advantageous in that it does not perform any test. Furthermore, a certain level of accuracy can be maintained even when the sample size is biased, and we show that such a situation may improve the accuracy of the proposed method. We compared the proposed method with the conventional method using simulations. In the simulations, we changed the sample size and difference between the expression levels of group 1 and group 2 in the DEGs to obtain the desired accuracy of the proposed method. The results show that the proposed method is superior in all cases under the conditions simulated. We also show that the effect of the difference between group 1 and group 2 on the accuracy is more prominent when the sample size is biased.
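As background on the clustering step, the sketch below computes standard fuzzy c-means membership degrees for one point given fixed cluster centers. It is a generic textbook formula, not the authors' procedure: the abstract does not specify the fuzzifier, the distance, or how the artificial DEG-like data enter the clustering, so the Euclidean distance, the default `m=2.0`, and the toy coordinates are all assumptions.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of `point` in each cluster.

    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)), where d_i is the
    Euclidean distance from the point to center i and m > 1 is the
    fuzzifier. Memberships sum to 1.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]


if __name__ == "__main__":
    # Hypothetical 2-D summary of a gene and two cluster centers.
    print(fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)]))
```

Because each gene receives a degree of membership rather than a hard label, genes can be ranked by how strongly they belong to the cluster of interest without running a hypothesis test.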

Citations: 0

Random forests for survival data: which methods work best and under what conditions?

IF 1.2 | Zone 4 (Mathematics) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

International Journal of Biostatistics

Pub Date : 2024-04-24 DOI: 10.1515/ijb-2023-0056

Matthew Berkowitz, Rachel MacKay Altman, Thomas M. Loughin

Few systematic comparisons of methods for constructing survival trees and forests exist in the literature. Importantly, when the goal is to predict a survival time or estimate a survival function, the optimal choice of method is unclear. We use an extensive simulation study to systematically investigate various factors that influence survival forest performance – forest construction method, censoring, sample size, distribution of the response, structure of the linear predictor, and presence of correlated or noisy covariates. In particular, we study 11 methods that have recently been proposed in the literature and identify 6 top performers. We find that all the factors that we investigate have significant impact on the methods’ relative accuracy of point predictions of survival times and survival function estimates. We use our results to make recommendations for which methods to use in a given context and offer explanations for the observed differences in relative performance.

Citations: 0

Kalman filter with impulse noised outliers: a robust sequential algorithm to filter data with a large number of outliers

IF 1.2 | Zone 4 (Mathematics) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

International Journal of Biostatistics

Pub Date : 2024-04-16 DOI: 10.1515/ijb-2023-0065

Bertrand Cloez, Bénédicte Fontez, Eliel González-García, Isabelle Sanchez

Impulse noised outliers are data points that differ significantly from other observations. They are generally removed from the data set through local regression or the Kalman filter algorithm. However, these methods, or their generalizations, are not well suited when the number of outliers is of the same order as the number of low-noise data (often called nominal measurements). In this article, we propose a new model for impulse noised outliers. It is based on a hierarchical model and a simple linear Gaussian process, as with the Kalman filter. We present a fast forward-backward algorithm that filters and smooths sequential data and also detects these outliers. We compare the robustness and efficiency of this algorithm with classical methods. Finally, we apply this method to a real data set from a Walk Over Weighing system containing around 60% outliers. For this application, we further develop an (explicit) EM algorithm to calibrate some algorithm parameters.
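For context, the classical baseline mentioned in the abstract can be sketched as a scalar Kalman filter for a local-level model. This is the standard filter, not the authors' hierarchical outlier model or their forward-backward algorithm; the model form, the parameter names `q`, `r`, `x0`, `p0`, and the toy data are assumptions made for illustration.

```python
def kalman_filter_1d(observations, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a local-level model.

    State:       x_t = x_{t-1} + w_t,  Var(w_t) = q
    Observation: y_t = x_t + v_t,      Var(v_t) = r
    Returns the filtered state means, one per observation.
    """
    x, p = x0, p0
    filtered = []
    for y in observations:
        p = p + q            # predict: state variance grows by q
        k = p / (p + r)      # Kalman gain
        x = x + k * (y - x)  # update the mean with the innovation y - x
        p = (1.0 - k) * p    # update the variance
        filtered.append(x)
    return filtered


if __name__ == "__main__":
    # Hypothetical weights with one impulse-like outlier (the value 90.0).
    ys = [50.2, 49.8, 90.0, 50.1, 50.3]
    print(kalman_filter_1d(ys, q=0.01, r=1.0, x0=50.0, p0=1.0))
```

Every observation is weighted by the same Gaussian noise variance `r`, so a single large outlier pulls the filtered mean towards it. That sensitivity is why the plain filter struggles when outliers are as frequent as nominal measurements.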

Citations: 0

Ensemble learning methods of inference for spatially stratified infectious disease systems

IF 1.2 | Zone 4 (Mathematics) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

International Journal of Biostatistics

Pub Date : 2024-04-09 DOI: 10.1515/ijb-2023-0102

Jeffrey Peitsch, Gyanendra Pokharel, Shakhawat Hossain

Individual level models are a class of mechanistic models that are widely used to infer infectious disease transmission dynamics. These models incorporate individual level covariate information accounting for population heterogeneity and are generally fitted in a Bayesian Markov chain Monte Carlo (MCMC) framework. However, Bayesian MCMC methods of inference are computationally expensive for large data sets. This issue becomes more severe when applied to infectious disease data collected from spatially heterogeneous populations, as the number of covariates increases. In addition, summary statistics over the global population may not capture the true spatio-temporal dynamics of disease transmission. In this study, we propose to use ensemble learning methods to predict epidemic generating models instead of time-consuming Bayesian MCMC methods. We apply these methods to infer disease transmission dynamics over spatially clustered populations, considering the clusters as natural strata instead of a global population. We compare the performance of two tree-based ensemble learning techniques: random forest and gradient boosting. These methods are applied to the 2001 foot-and-mouth disease epidemic in the U.K. and evaluated using simulated data from a clustered population. It is shown that the spatially clustered data can help to predict epidemic generating models more accurately than the global data.

Citations: 0

The survival function NPMLE for combined right-censored and length-biased right-censored failure time data: properties and applications

IF 1.2 | Zone 4 (Mathematics) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

International Journal of Biostatistics

Pub Date : 2024-04-09 DOI: 10.1515/ijb-2023-0121

James H. McVittie, David B. Wolfson, David A. Stephens

Many cohort studies in survival analysis have embedded in them subcohorts consisting of incident cases and prevalent cases. Instead of analysing the data from the incident and prevalent cohorts separately, there are advantages to combining the data from these two subcohorts. In this paper, we discuss a survival function nonparametric maximum likelihood estimator (NPMLE) using both length-biased right-censored prevalent cohort data and right-censored incident cohort data. We establish the asymptotic properties of the survival function NPMLE and utilize the NPMLE to estimate the distribution for time spent in a Montreal area hospital.
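As a point of reference, when only right-censored incident cohort data are available, the survival function NPMLE is the Kaplan-Meier estimator. The sketch below implements that simpler special case. It does not handle length-biased prevalent cohort data and is not the combined estimator studied in the paper; the function name and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times:  observed follow-up times
    events: 1 if the failure was observed, 0 if right-censored
    Returns a list of (t, S(t)) pairs at each distinct failure time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather all subjects sharing the same observed time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Failures and censored subjects both leave the risk set after t.
        n_at_risk -= removed
    return curve


if __name__ == "__main__":
    # Hypothetical lengths of stay (days) with censoring indicators.
    for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(t, round(s, 3))
```

Subjects censored at a time tied with a failure are counted as still at risk at that time, which is the usual convention.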

Citations: 0

MBPCA-OS: an exploratory multiblock method for variables of different measurement levels. Application to study the immune response to SARS-CoV-2 infection and vaccination

IF 1.2 | Zone 4 (Mathematics) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

International Journal of Biostatistics

Pub Date : 2023-12-12 DOI: 10.1515/ijb-2023-0062

Martin Paries, Evelyne Vigneau, Adeline Huneau, Olivier Lantz, Stéphanie Bougeard

Studying a large number of variables measured on the same observations and organized in blocks – denoted multiblock data – is becoming standard in several domains, especially in biology. To explore the relationships between all these variables – at the block- and the variable-level – several exploratory multiblock methods have been proposed. However, most of them are only designed for numeric variables. In reality, some data sets contain variables of different measurement levels (i.e., numeric, nominal, ordinal). In this article, we focus on exploratory multiblock methods that handle variables at their appropriate measurement level. Multi-Block Principal Component Analysis with Optimal Scaling (MBPCA-OS) is proposed and applied to multiblock data from the CURIE-O-SA French cohort. In this study, variables are of different measurement levels and organized in four blocks. The objective is to study the immune responses according to the SARS-CoV-2 infection and vaccination statuses, the symptoms and the participant’s characteristics.

Citations: 0
