
Biometrical Journal: Latest Articles

Domain Selection for Gaussian Process Data: An Application to Electrocardiogram Signals
IF 1.3 Zone 3 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-11-28 DOI: 10.1002/bimj.70018
Nicolás Hernández, Gabriel Martos

Gaussian processes and the Kullback–Leibler divergence have been deeply studied in statistics and machine learning. This paper marries these two concepts and introduces the local Kullback–Leibler divergence to learn about intervals where two Gaussian processes differ the most. We also address subtleties entailed in estimating local divergences and the corresponding interval of local maximum divergence. The estimation performance and the numerical efficiency of the proposed method are showcased via a Monte Carlo simulation study. In a medical research context, we assess the potential of the devised tools in the analysis of electrocardiogram signals.
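
As a rough illustration of the idea (not the authors' estimator), one can discretize both processes on a common grid, compute the closed-form Kullback–Leibler divergence between the Gaussian marginals restricted to each sliding window, and report the window where it is largest. The helper names `gaussian_kl`, `ou_cov`, and `max_divergence_window`, the Ornstein–Uhlenbeck covariance, and the fixed window width are all assumptions made for this sketch; the abstract does not say how the local divergence is defined or estimated.

```python
import numpy as np

def gaussian_kl(m0, S0, m1, S1):
    """KL( N(m0, S0) || N(m1, S1) ) for non-singular covariance matrices."""
    k = m0.shape[0]
    diff = m1 - m0
    trace_term = np.trace(np.linalg.solve(S1, S0))   # tr(S1^{-1} S0)
    maha_term = diff @ np.linalg.solve(S1, diff)     # (m1-m0)' S1^{-1} (m1-m0)
    _, logdet0 = np.linalg.slogdet(S0)
    _, logdet1 = np.linalg.slogdet(S1)
    return 0.5 * (trace_term + maha_term - k + logdet1 - logdet0)

def ou_cov(t, length=0.2, var=1.0):
    """Ornstein-Uhlenbeck (exponential) covariance on grid t (illustrative choice)."""
    d = np.abs(t[:, None] - t[None, :])
    return var * np.exp(-d / length)

def max_divergence_window(m0, S0, m1, S1, width):
    """Slide a window of `width` grid points; return (start_index, kl) of the largest KL."""
    best_start, best_kl = 0, -np.inf
    for s in range(len(m0) - width + 1):
        idx = slice(s, s + width)
        kl = gaussian_kl(m0[idx], S0[idx, idx], m1[idx], S1[idx, idx])
        if kl > best_kl:
            best_start, best_kl = s, kl
    return best_start, best_kl
```

In practice the means and covariances would have to be estimated from the two samples of curves, which is where the estimation subtleties mentioned in the abstract arise; the sketch takes them as known.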

Citations: 0
A Semiparametric Two-Sample Density Ratio Model With a Change Point
IF 1.3 Zone 3 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-11-25 DOI: 10.1002/bimj.202300214
Jiahui Feng, Kin Yau Wong, Chun Yin Lee

The logistic regression model for a binary outcome with a continuous covariate can be expressed equivalently as a two-sample density ratio model for the covariate. Utilizing this equivalence, we study a change-point logistic regression model within the corresponding density ratio modeling framework. We investigate estimation and inference methods for the density ratio model and develop maximal score-type tests to detect the presence of a change point. In contrast to existing work, the density ratio modeling framework facilitates the development of a natural Kolmogorov–Smirnov type test to assess the validity of the logistic model assumptions. A simulation study is conducted to evaluate the finite-sample performance of the proposed tests and estimation methods. We illustrate the proposed approach using a mother-to-child HIV-1 transmission data set and an oral cancer data set.
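
The equivalence stated in the first sentence follows from Bayes' rule. The short derivation below uses notation chosen for this note (it does not appear in the abstract): $f_1$ and $f_0$ are the covariate densities among cases and controls, $f$ is the marginal covariate density, and $\pi = P(Y=1)$.

```latex
\begin{align*}
P(Y=1 \mid X=x) &= \frac{\exp(\alpha + \beta x)}{1 + \exp(\alpha + \beta x)}, \\
f_1(x) = \frac{P(Y=1 \mid X=x)\, f(x)}{\pi},
\qquad
f_0(x) &= \frac{P(Y=0 \mid X=x)\, f(x)}{1-\pi}, \\
\frac{f_1(x)}{f_0(x)}
 &= \frac{P(Y=1 \mid X=x)}{P(Y=0 \mid X=x)} \cdot \frac{1-\pi}{\pi}
  = \exp(\alpha^{*} + \beta x),
\qquad
\alpha^{*} = \alpha + \log\frac{1-\pi}{\pi}.
\end{align*}
```

The slope $\beta$ is the same in both formulations; only the intercept changes.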

Citations: 0
Smoothed Estimation on Optimal Treatment Regime Under Semisupervised Setting in Randomized Trials
IF 1.3 Zone 3 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-11-23 DOI: 10.1002/bimj.70006
Xiaoqi Jiao, Mengjiao Peng, Yong Zhou

A treatment regime refers to the process of assigning the most suitable treatment to a patient based on their observed information. However, prevailing research on treatment regimes predominantly relies on labeled data, which may lead to the omission of valuable information contained within unlabeled data, such as historical records and healthcare databases. Current semisupervised works for deriving optimal treatment regimes either rely on model assumptions or struggle with high computational burdens for even moderate-dimensional covariates. To address this concern, we propose a semisupervised framework that operates within a model-free context to estimate the optimal treatment regime by leveraging the abundant unlabeled data. Our proposed approach encompasses three key steps. First, we employ a single-index model to achieve dimension reduction, followed by kernel regression to impute the missing outcomes in the unlabeled data. Second, we propose various forms of semisupervised value functions based on the imputed values, incorporating both labeled and unlabeled data components. Lastly, the optimal treatment regimes are derived by maximizing the semisupervised value functions. We establish the consistency and asymptotic normality of the estimators proposed in our framework. Furthermore, we introduce a perturbation resampling procedure to estimate the asymptotic variance. Simulations confirm the advantageous properties of incorporating unlabeled data in the estimation for optimal treatment regimes. A practical data example is also provided to illustrate the application of our methodology. This work is rooted in the framework of randomized trials, with additional discussions extending to observational studies.

Citations: 0
Simulating Data From Marginal Structural Models for a Survival Time Outcome
IF 1.3 Zone 3 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-11-23 DOI: 10.1002/bimj.70010
Shaun R. Seaman, Ruth H. Keogh

Marginal structural models (MSMs) are often used to estimate causal effects of treatments on survival time outcomes from observational data when time-dependent confounding may be present. They can be fitted using, for example, inverse probability of treatment weighting (IPTW). It is important to evaluate the performance of statistical methods in different scenarios, and simulation studies are a key tool for such evaluations. In such simulation studies, it is common to generate data in such a way that the model of interest is correctly specified, but this is not always straightforward when the model of interest is for potential outcomes, as is an MSM. Methods have been proposed for simulating from MSMs for a survival outcome, but these methods impose restrictions on the data-generating mechanism. Here, we propose a method that overcomes these restrictions. The MSM can be, for example, a marginal structural logistic model for a discrete survival time or a Cox or additive hazards MSM for a continuous survival time. The hazard of the potential survival time can be conditional on baseline covariates, and the treatment variable can be discrete or continuous. We illustrate the use of the proposed simulation algorithm by carrying out a brief simulation study. This study compares the coverage of confidence intervals calculated in two different ways for causal effect estimates obtained by fitting an MSM via IPTW.
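
For readers unfamiliar with IPTW, the sketch below shows the weighting idea in the simplest possible setting: a single time-fixed binary treatment, a continuous outcome, and a known propensity score. This is a deliberate simplification and is not the simulation algorithm proposed in the paper, which concerns survival outcomes with time-dependent confounding. The data-generating model, the true effect of 1.0, and the function names are assumptions made for the example.

```python
import numpy as np

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))

def simulate(n, rng):
    """Confounder L affects both treatment A and outcome Y; true effect of A is 1.0."""
    L = rng.normal(size=n)
    p = expit(L)                      # true propensity score P(A=1 | L)
    A = rng.binomial(1, p)
    Y = 1.0 * A + 2.0 * L + rng.normal(size=n)
    return L, A, Y, p

def naive_difference(A, Y):
    """Unadjusted difference in means; biased here because L confounds A and Y."""
    return Y[A == 1].mean() - Y[A == 0].mean()

def iptw_difference(A, Y, p):
    """Normalized (Hajek) IPTW estimate of E[Y^1] - E[Y^0]."""
    w = A / p + (1 - A) / (1 - p)     # inverse probability of the treatment received
    mean1 = np.sum(w * A * Y) / np.sum(w * A)
    mean0 = np.sum(w * (1 - A) * Y) / np.sum(w * (1 - A))
    return mean1 - mean0
```

With a large simulated sample, the weighted estimate should be close to the true effect while the naive difference is noticeably larger, because treated subjects tend to have higher values of the confounder.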

Citations: 0
Conditional Variable Screening for Ultra-High Dimensional Longitudinal Data With Time Interactions
IF 1.3 Zone 3 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-11-23 DOI: 10.1002/bimj.70005
Andrea Bratsberg, Abhik Ghosh, Magne Thoresen

In recent years, we have been able to gather large amounts of genomic data at a fast rate, creating situations where the number of variables greatly exceeds the number of observations. In these situations, most models that can handle a moderately high dimension will now become computationally infeasible or unstable. Hence, there is a need for a prescreening of variables to reduce the dimension efficiently and accurately to a more moderate scale. There has been much work to develop such screening procedures for independent outcomes. However, much less work has been done for high-dimensional longitudinal data in which the observations can no longer be assumed to be independent. In addition, it is of interest to capture possible interactions between the genomic variable and time in many of these longitudinal studies. In this work, we propose a novel conditional screening procedure that ranks variables according to the likelihood value at the maximum likelihood estimates in a marginal linear mixed model, where the genomic variable and its interaction with time are included in the model. This is to our knowledge the first conditional screening approach for clustered data. We prove that this approach enjoys the sure screening property, and assess the finite sample performance of the method through simulations.

Citations: 0
Incompletely Observed Nonparametric Factorial Designs With Repeated Measurements: A Wild Bootstrap Approach
IF 1.3 Zone 3 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-11-23 DOI: 10.1002/bimj.70008
Lubna Amro, Frank Konietschke, Markus Pauly

In many life science experiments or medical studies, subjects are repeatedly observed and measurements are collected in factorial designs with multivariate data. The analysis of such multivariate data is typically based on multivariate analysis of variance (MANOVA) or mixed models, which require complete data and certain assumptions on the underlying parametric distribution, such as continuity or a specific covariance structure (for example, compound symmetry). However, these methods are usually not applicable when discrete data or even ordered categorical data are present. In such cases, nonparametric rank-based methods that do not require stringent distributional assumptions are the preferred choice. However, in the multivariate case, most rank-based approaches have only been developed for complete observations. The aim of this work is to develop asymptotically correct procedures that are capable of handling missing values, allow for singular covariance matrices, and are applicable to ordinal or ordered categorical data. This is achieved by applying a wild bootstrap procedure in combination with quadratic form-type test statistics. Beyond proving their asymptotic correctness, extensive simulation studies validate their applicability for small samples. Finally, two real data examples are analyzed.

Citations: 0
Addressing Class Imbalance in Bayesian Classification Through Posterior Probability Adjustment
IF 1.3 Zone 3 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-11-18 DOI: 10.1002/bimj.70004
Vahid Nassiri, Fetene Tekle, Kanaka Tatikola, Helena Geys

Class imbalance is a known issue in classification tasks that can lead to predictive bias toward dominant classes. This paper introduces a novel, straightforward Bayesian framework that adjusts posterior probabilities to counteract the bias introduced by imbalanced data sets. Instead of relying on the mean posterior distribution of class probabilities, we propose a method that scales the posterior probability of each class according to its representation in the training data.
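
One plausible reading of "scaling by representation" is to divide each class's posterior probability by that class's relative frequency in the training data and then renormalize. The abstract does not give the exact rule, so the function `adjust_posterior` below and its inverse-frequency scaling are assumptions for illustration only, not the authors' method.

```python
import numpy as np

def adjust_posterior(probs, class_freq):
    """Rescale class probabilities by inverse training frequency and renormalize.

    probs      : array of shape (k,) or (n, k) with class probabilities
    class_freq : array of shape (k,) with training-set class proportions
    """
    probs = np.asarray(probs, dtype=float)
    class_freq = np.asarray(class_freq, dtype=float)
    scaled = probs / class_freq                      # up-weight rare classes
    return scaled / scaled.sum(axis=-1, keepdims=True)
```

For example, a prediction of (0.7, 0.3) from a model trained on a 90/10 class split becomes roughly (0.21, 0.79) after adjustment, so the predicted label switches to the minority class; with balanced classes the probabilities are unchanged.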

Citations: 0
Inverse-Weighted Quantile Regression With Partially Interval-Censored Data
IF 1.3 Zone 3 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-11-14 DOI: 10.1002/bimj.70001
Yeji Kim, Taehwa Choi, Seohyeon Park, Sangbum Choi, Dipankar Bandyopadhyay

This paper introduces a novel approach to estimating censored quantile regression using inverse probability of censoring weighted (IPCW) methodology, specifically tailored for data sets featuring partially interval-censored data. Such data sets, often encountered in HIV/AIDS and cancer biomedical research, may include doubly censored (DC) and partly interval-censored (PIC) endpoints. DC responses involve either left-censoring or right-censoring alongside some exact failure time observations, while PIC responses are subject to interval-censoring. Despite the existence of complex estimating techniques for interval-censored quantile regression, we propose a simple and intuitive IPCW-based method, easily implementable by assigning suitable inverse-probability weights to subjects with exact failure time observations. The resulting estimator exhibits asymptotic properties, such as uniform consistency and weak convergence, and we explore an augmented-IPCW (AIPCW) approach to enhance efficiency. In addition, our method can be adapted for multivariate partially interval-censored data. Simulation studies demonstrate the new procedure's strong finite-sample performance. We illustrate the practical application of our approach through an analysis of progression-free survival endpoints in a phase III clinical trial focusing on metastatic colorectal cancer.

Citations: 0
Mixture Cure Semiparametric Accelerated Failure Time Models With Partly Interval-Censored Data
IF 1.3 Zone 3 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-11-07 DOI: 10.1002/bimj.202300203
Isabel Li, Jun Ma, Benoit Liquet

In practical survival analysis, a patient may experience no event even after a long waiting period, which means a portion of the population may never experience the event of interest. In this circumstance, one remedy is to adopt a mixture cure Cox model to analyze the survival data. However, if the survival times clearly exhibit an acceleration (or deceleration) factor, then an accelerated failure time (AFT) model is preferred, leading to a mixture cure AFT model. In this paper, we consider a penalized likelihood method to estimate mixture cure semiparametric AFT models, where the unknown baseline hazard is approximated using Gaussian basis functions. We allow partly interval-censored survival data, which can include event times and left-, right-, and interval-censoring times. The penalty function helps to achieve a smooth estimate of the baseline hazard function. We also provide asymptotic properties of the estimates so that inferences can be made on regression parameters and hazard-related quantities. Simulation studies are conducted to evaluate the model performance, including a comparative study with an existing method from the smcure R package. The results show that our proposed penalized likelihood method has acceptable performance in general and produces less bias than smcure when faced with the identifiability issue. To illustrate the application of our method, a real case study involving melanoma recurrence is conducted and reported. Our model is implemented in our R package aftQnp, which is available from https://github.com/Isabellee4555/aftQnP.
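
For orientation, the standard form of a mixture cure AFT model can be written as follows. The notation is chosen for this note and is an assumption, since the abstract does not fix any: $\pi(z)$ is the probability of being susceptible (not cured) given covariates $z$, $S_u$ is the survival function of susceptible subjects, and $S_0$, $h_0$ are the baseline survival and hazard functions.

```latex
\begin{align*}
S_{\mathrm{pop}}(t \mid x, z) &= 1 - \pi(z) + \pi(z)\, S_u(t \mid x), \\
\log T &= x^{\top}\beta + \varepsilon \quad \text{(susceptible subjects)}, \\
S_u(t \mid x) &= S_0\!\left(t\, e^{-x^{\top}\beta}\right),
\qquad
h_u(t \mid x) = e^{-x^{\top}\beta}\, h_0\!\left(t\, e^{-x^{\top}\beta}\right).
\end{align*}
```

Because $S_{\mathrm{pop}}(t \mid x, z)$ tends to $1 - \pi(z)$ rather than to zero as $t$ grows, the cured fraction and the tail of $S_u$ can be hard to separate when follow-up is short, which is the identifiability issue the abstract refers to.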

Citations: 0
The Replication of Equivalence Studies
IF 1.3 Zone 3 (Biology) Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-10-29 DOI: 10.1002/bimj.202300232
Charlotte Micheloud, Leonhard Held

Replication studies are increasingly conducted to assess the credibility of scientific findings. Most of these replication attempts target studies with a superiority design, but methodology is lacking for the analysis of replication studies with alternative designs, such as equivalence. To fill this gap, we propose two approaches, the two-trials rule and the sceptical two one-sided tests (TOST) procedure, adapted from methods used in superiority settings. Both methods have the same overall Type-I error rate, but the sceptical TOST procedure allows replication success even for nonsignificant original or replication studies. This leads to a larger project power and other differences in relevant operating characteristics. Both methods can be used for sample size calculation of the replication study, based on the results from the original one. The two methods are applied to data from the Reproducibility Project: Cancer Biology.
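
As a rough illustration of the first approach, the sketch below computes a TOST p-value under a normal approximation and then applies a two-trials rule that requires both studies to be significant. This is one plausible reading of the abstract, not the authors' implementation: the function names, the symmetric equivalence margin, and the level `alpha=0.05` are assumptions of this sketch, and the sceptical TOST procedure is not shown.

```python
from statistics import NormalDist


def tost_p_value(estimate, se, margin):
    """TOST p-value for H0: |theta| >= margin (normal approximation).

    Assumes a symmetric equivalence region (-margin, +margin).
    """
    z = NormalDist()  # standard normal distribution
    # One-sided test of H0: theta <= -margin
    p_lower = 1.0 - z.cdf((estimate + margin) / se)
    # One-sided test of H0: theta >= +margin
    p_upper = z.cdf((estimate - margin) / se)
    # Equivalence is claimed only if both one-sided tests reject
    return max(p_lower, p_upper)


def two_trials_rule(p_original, p_replication, alpha=0.05):
    """Replication success if both studies are significant at level alpha."""
    return max(p_original, p_replication) <= alpha


# Hypothetical effect estimates and standard errors, for illustration only
p_o = tost_p_value(estimate=0.02, se=0.10, margin=0.5)
p_r = tost_p_value(estimate=0.50, se=0.10, margin=0.5)
print(two_trials_rule(p_o, p_r))
```

In this made-up example the replication estimate sits exactly on the margin, so its TOST p-value is about 0.5 and the rule does not declare success.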

Citations: 0