
Biometrical Journal: Latest Articles

Test Statistics and Statistical Inference for Data With Informative Cluster Sizes
IF 1.3 | Zone 3 (Biology) | Q4 Mathematical & Computational Biology | Pub Date: 2024-12-16 | DOI: 10.1002/bimj.70021
Soyoung Kim, Michael J. Martens, Kwang Woo Ahn

In biomedical studies, investigators often encounter clustered data. The cluster sizes are said to be informative if the outcome depends on the cluster size. Ignoring informative cluster sizes in the analysis leads to biased parameter estimation in marginal and mixed-effect regression models. Several methods to analyze data with informative cluster sizes have been proposed; however, methods to test the informativeness of the cluster sizes are limited, particularly for the marginal model. In this paper, we propose a score test and a Wald test to examine the informativeness of the cluster sizes for a generalized linear model, a Cox model, and a proportional subdistribution hazards model. Statistical inference can be conducted through weighted estimating equations. The simulation results show that both tests control Type I error rates well, but the score test has higher power than the Wald test for right-censored data while the power of the Wald test is generally higher than the score test for the binary outcome. We apply the Wald and score tests to hematopoietic cell transplant data and compare regression analysis results with/without adjusting for informative cluster sizes.
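As a rough illustration of why weighting matters when cluster sizes are informative, the sketch below compares a pooled mean with a cluster-weighted mean on toy data. The inverse-cluster-size weights, the function names, and the data are assumptions made for illustration; the abstract states only that inference is conducted through weighted estimating equations.

```python
import numpy as np


def pooled_mean(y):
    """Unweighted mean over all observations (ignores clustering)."""
    return float(np.mean(np.asarray(y, dtype=float)))


def cluster_weighted_mean(y, cluster):
    """Solve sum_i (1/n_i) sum_j (y_ij - mu) = 0 for mu.

    Each observation is weighted by the inverse of its cluster size
    (an assumed weighting scheme), so every cluster contributes equally.
    """
    y = np.asarray(y, dtype=float)
    cluster = np.asarray(cluster)
    # counts[inverse] gives, for each observation, the size of its cluster
    _, inverse, counts = np.unique(cluster, return_inverse=True, return_counts=True)
    weights = 1.0 / counts[inverse]
    return float(np.sum(weights * y) / np.sum(weights))


# Toy data: the one large cluster has only events, the two singletons have none,
# so the outcome depends on cluster size.
cluster = np.array([0, 0, 0, 0, 1, 2])
y = np.array([1, 1, 1, 1, 0, 0])

print("pooled mean:          ", pooled_mean(y))
print("cluster-weighted mean:", cluster_weighted_mean(y, cluster))
```

On this toy data the pooled estimate is dominated by the large cluster, while the weighted estimate equals the average of the three cluster means.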

Citations: 0
Best Subset Solution Path for Linear Dimension Reduction Models Using Continuous Optimization
IF 1.3 | Zone 3 (Biology) | Q4 Mathematical & Computational Biology | Pub Date: 2024-12-16 | DOI: 10.1002/bimj.70015
Benoit Liquet, Sarat Moka, Samuel Muller

The selection of the best variables is a challenging problem in supervised and unsupervised learning, especially in high-dimensional contexts where the number of variables is usually much larger than the number of observations. In this paper, we focus on two multivariate statistical methods: principal component analysis and partial least squares. Both approaches are popular linear dimension-reduction methods with numerous applications in several fields, including genomics, biology, environmental science, and engineering. In particular, these approaches build principal components, new variables that are combinations of all the original variables. A main drawback of principal components is the difficulty of interpreting them when the number of variables is large. To define principal components from the most relevant variables, we propose to cast the best subset solution path method into the principal component analysis and partial least squares frameworks. We offer a new alternative by exploiting a continuous optimization algorithm for the best subset solution path. Empirical studies show the efficacy of our approach for providing the best subset solution path. The usage of our algorithm is further illustrated through the analysis of two real data sets. The first data set is analyzed using principal component analysis, while the analysis of the second data set is based on the partial least squares framework.

Citations: 0
Goodness-of-Fit Testing for a Regression Model With a Doubly Truncated Response
IF 1.3 | Zone 3 (Biology) | Q4 Mathematical & Computational Biology | Pub Date: 2024-12-16 | DOI: 10.1002/bimj.70022
Jacobo de Uña-Álvarez

In survival analysis and epidemiology, among other fields, interval sampling is often employed. With interval sampling, the individuals undergoing the event of interest within a calendar time interval are recruited. This results in doubly truncated event times. Double truncation, which may appear with other sampling designs too, induces a selection bias, so ordinary statistical methods are generally inconsistent. In this paper, we introduce goodness-of-fit procedures for a regression model when the response variable is doubly truncated. For this purpose, a marked empirical process based on weighted residuals is constructed and its weak convergence is established. Kolmogorov–Smirnov– and Cramér–von Mises–type tests are then derived from this core process, and a bootstrap approximation for their practical implementation is given. The performance of the proposed tests is investigated through simulations. An application to model selection for AIDS incubation time as a function of age at infection is provided.

Citations: 0
Adjusted Inference for Multiple Testing Procedure in Group-Sequential Designs
IF 1.3 | Zone 3 (Biology) | Q4 Mathematical & Computational Biology | Pub Date: 2024-12-16 | DOI: 10.1002/bimj.70020
Yujie Zhao, Qi Liu, Linda Z. Sun, Keaven M. Anderson

Adjustment of statistical significance levels for repeated analysis in group-sequential trials has been understood for some time. Adjustment accounting for testing multiple hypotheses is also well understood. There is limited research on simultaneously adjusting for both multiple hypothesis testing and repeated analyses of one or more hypotheses. We address this gap by proposing adjusted-sequential $p$-values that reject when they are less than or equal to the family-wise Type I error rate (FWER). We also propose sequential $p$-values for intersection hypotheses to compute adjusted-sequential $p$-values for elementary hypotheses. We demonstrate the application using weighted Bonferroni tests and weighted parametric tests for inference on each elementary hypothesis tested.
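For orientation, the following is a minimal sketch of the standard fixed-sample weighted Bonferroni $p$-value for an intersection hypothesis, one building block the abstract mentions. It is not the authors' adjusted-sequential $p$-value, which also accounts for repeated analyses; the function name and the example numbers are hypothetical.

```python
def weighted_bonferroni_pvalue(pvals, weights):
    """Weighted Bonferroni p-value for an intersection hypothesis.

    Returns min_i(p_i / w_i), capped at 1. The intersection is rejected at
    level alpha exactly when some p_i <= w_i * alpha.
    Hypotheses with zero weight are never tested and are skipped.
    """
    if len(pvals) != len(weights):
        raise ValueError("pvals and weights must have the same length")
    ratios = [p / w for p, w in zip(pvals, weights) if w > 0]
    if not ratios:
        return 1.0
    return min(1.0, min(ratios))


# Hypothetical example: two elementary hypotheses with equal weights.
p_intersection = weighted_bonferroni_pvalue([0.01, 0.04], [0.5, 0.5])
reject = p_intersection <= 0.025  # compare against an assumed one-sided FWER of 0.025
print(p_intersection, reject)
```

Expressing the test as a single $p$-value compared against the FWER mirrors the rejection rule described in the abstract, where a hypothesis is rejected when its adjusted $p$-value does not exceed the FWER.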

Citations: 0
Issue Information: Biometrical Journal 1'25
IF 1.3 | Zone 3 (Biology) | Q4 Mathematical & Computational Biology | Pub Date: 2024-12-15 | DOI: 10.1002/bimj.70027
Citations: 0
Detecting Interactions in High-Dimensional Data Using Cross Leverage Scores
IF 1.3 | Zone 3 (Biology) | Q4 Mathematical & Computational Biology | Pub Date: 2024-11-29 | DOI: 10.1002/bimj.70014
Sven Teschke, Katja Ickstadt, Alexander Munteanu

We develop a variable selection method for interactions in regression models on large data in the context of genetics. The method is intended for investigating the influence of single-nucleotide polymorphisms (SNPs) and their interactions on health outcomes, which is a $p \gg n$ problem. We introduce cross leverage scores (CLSs) to detect interactions of variables while maintaining interpretability. Using this method, it is not necessary to consider every possible interaction between variables individually, which would be very time-consuming even for moderate numbers of variables. Instead, we calculate the CLS for each variable and obtain a measure of importance for this variable. Calculating the scores remains time-consuming for large data sets. The key idea for scaling to large data is to divide the data into smaller random batches or consecutive windows of variables. This avoids complex and time-consuming computations on high-dimensional matrices by performing the computations only for small subsets of the data, which is less costly. We compare these methods to provable approximations of CLS based on sketching, which aims at summarizing data succinctly. In a simulation study, we show that the CLSs are directly linked to the importance of a variable in the sense of an interaction effect. We further show that the approximation approaches are appropriate for performing the calculations efficiently on arbitrarily large data while preserving the interaction detection effect of the CLS. This underlines their scalability to genome-wide data. In addition, we evaluate the methods on real data from the HapMap project.

Citations: 0
Model Selection for Ordinary Differential Equations: A Statistical Testing Approach
IF 1.3 | Zone 3 (Biology) | Q4 Mathematical & Computational Biology | Pub Date: 2024-11-28 | DOI: 10.1002/bimj.70013
Itai Dattner, Shota Gugushvili, Oleksandr Laskorunskyi

Ordinary differential equations (ODEs) are foundational tools in modeling intricate dynamics across a gamut of scientific disciplines. Yet the possibility of representing a single phenomenon through multiple ODE models, driven by different understandings of nuances in internal mechanisms or abstraction levels, presents a model selection challenge. This study introduces a testing-based approach for ODE model selection amidst statistical noise. Rooted in the model misspecification framework, we adapt classical statistical paradigms (Vuong and Hotelling) to the ODE context, allowing for the comparison and ranking of diverse causal explanations without the constraints of nested models. Our simulation studies numerically investigate the statistical properties of the test, demonstrating its attainment of the nominal size and power across various settings. Real-world data examples further underscore the algorithm's applicability in practice. To foster accessibility and encourage real-world applications, we provide a user-friendly Python implementation of our model selection algorithm, bridging theoretical advancements with hands-on tools for the scientific community.

Citations: 0
$\tau$-Inflated Beta Regression Model for Estimating $\tau$-Restricted Means and Event-Free Probabilities for Censored Time-to-Event Data
IF 1.3 | Zone 3 (Biology) | Q4 Mathematical & Computational Biology | Pub Date: 2024-11-28 | DOI: 10.1002/bimj.70009
Yizhuo Wang, Susan Murray

In this research, we propose analysis of $\tau$-restricted censored time-to-event data via a $\tau$-inflated beta regression ($\tau$-IBR) model. The outcome of interest is $\min(\tau, T)$, where $T$ and $\tau$ are the time-to-event and follow-up duration, respectively. Our analysis goals include estimation and inference related to $\tau$-restricted mean survival time ($\tau$-RMST) values and event-free probabilities at $\tau$ that address the censored nature of the data. In this setting, it is common to observe many individuals with $\min(\tau, T) = \tau$, a point mass that is typically overlooked in $\tau$-restricted event-time analyses. Our proposed $\tau$-IBR model is based on a decomposition of $\min(\tau, T)$ into $\tau[I(T \ge \tau) + (T/\tau) I(T < \tau)]$. We model the mean of the latter expression using joint logistic and beta regression models, fitted with an expectation-maximization algorithm. An alternative multiple imputation (MI) algorithm for fitting the $\tau$-IBR model has the additional advantage of producing uncensored data sets for analysis. Simulations indicate excellent performance of the $\tau$-IBR model and the corresponding $\tau$-RMST estimates in independent and dependent censoring settings. We apply our method to the Azithromycin for Prevention of COPD Exacerbations Trial. In addition to $\tau$-IBR model results that provide a nuanced understanding of the treatment effect, we give visually appealing heatmaps of the $\tau$-restricted event times based on our MI data sets, a visualization not typically available for censored time-to-event data.
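The decomposition at the heart of the model can be checked numerically. The sketch below assumes fully observed (uncensored) event times and hypothetical values for $T$ and $\tau$; it verifies only the algebraic identity and does not implement the $\tau$-IBR model or its handling of censoring.

```python
import numpy as np


def restricted_time(t, tau):
    """tau-restricted event time min(tau, T)."""
    return np.minimum(np.asarray(t, dtype=float), tau)


def decomposed_restricted_time(t, tau):
    """Same quantity written as tau * [I(T >= tau) + (T / tau) * I(T < tau)]."""
    t = np.asarray(t, dtype=float)
    at_tau = (t >= tau).astype(float)   # point mass at tau (logistic part of the model)
    fraction = (t / tau) * (t < tau)    # value in [0, 1) (beta part of the model)
    return tau * (at_tau + fraction)


# Hypothetical uncensored event times and follow-up duration.
t_example = np.array([0.5, 2.0, 3.0, 7.5])
tau_example = 3.0

direct = restricted_time(t_example, tau_example)
via_parts = decomposed_restricted_time(t_example, tau_example)
print(direct)
print(via_parts)
print(np.allclose(direct, via_parts))
```

Writing the outcome this way separates the indicator of reaching $\tau$ event-free from the scaled time among earlier events, which is what makes a joint logistic and beta regression formulation natural.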
Citations: 0
Risk-Based Decision Making: Estimands for Sequential Prediction Under Interventions
IF 1.3 | Zone 3 (Biology) | Q4 Mathematical & Computational Biology | Pub Date: 2024-11-28 | DOI: 10.1002/bimj.70011
Kim Luijken, Paweł Morzywołek, Wouter van Amsterdam, Giovanni Cinà, Jeroen Hoogland, Ruth Keogh, Jesse H. Krijthe, Sara Magliacane, Thijs van Ommen, Niels Peek, Hein Putter, Maarten van Smeden, Matthew Sperrin, Junfeng Wang, Daniala L. Weir, Vanessa Didelez, Nan van Geloven

Prediction models are used, among other purposes, to inform medical decisions on interventions. Typically, individuals at high risk of adverse outcomes are advised to undergo an intervention, while those at low risk are advised to refrain from it. Standard prediction models do not always provide risks that are relevant to such decisions: for example, an individual may be estimated to be at low risk because similar individuals in the past received an intervention that lowered their risk. Therefore, prediction models supporting decisions should target risks belonging to defined intervention strategies. Previous work on prediction under interventions assumed that the prediction model was used at only one time point to make an intervention decision. In clinical practice, intervention decisions are rarely made only once: they might be repeated, deferred, and reevaluated. This requires estimated risks under interventions that can be reconsidered at several potential decision moments. In the current work, we highlight key considerations for formulating estimands in sequential prediction under interventions that can inform such intervention decisions. We illustrate these considerations by giving examples of estimands for a case study about choosing between vaginal delivery and cesarean section for women giving birth. Our formalization of prediction tasks in a sequential, causal, and estimand context provides guidance for future studies to ensure that the right question is answered and appropriate causal estimation approaches are chosen to develop sequential prediction models that can inform intervention decisions.

Journal: Biometrical Journal Volume 66 Issue 8 Type: Journal Article PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.70011
Citations: 0
A Matched Design for Causal Inference With Survey Data: Evaluation of Medical Marijuana Legalization in Kentucky and Tennessee
IF 1.3 Zone 3 Biology Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-11-28 DOI: 10.1002/bimj.70012
Marco H. Benedetti, Bo Lu, Motao Zhu

A concern surrounding marijuana legalization is that driving after marijuana use may become more prevalent. Survey data are valuable for estimating policy effects; however, their observational nature and unequal sampling probabilities create challenges for causal inference. To estimate population-level effects using survey data, we propose a matched design and implement sensitivity analyses to quantify how robust the conclusions are to unmeasured confounding. Both theoretical justification and simulation studies are presented. We found no evidence that marijuana legalization increased tolerant behaviors and attitudes toward driving after marijuana use, and these conclusions seem moderately robust to unmeasured confounding.

Journal: Biometrical Journal Volume 66 Issue 8 Type: Journal Article PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.70012
Citations: 0