
Biometrical Journal: Latest Publications

Investigating a Domain Adaptation Approach for Integrating Different Measurement Instruments in a Longitudinal Clinical Registry
IF 1.3 | CAS Tier 3 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-12-19 | DOI: 10.1002/bimj.70023
Maren Hackenberg, Michelle Pfaffenlehner, Max Behrens, Astrid Pechmann, Janbernd Kirschner, Harald Binder

In a longitudinal clinical registry, different measurement instruments might have been used for assessing individuals at different time points. To combine them, we investigate deep learning techniques for obtaining a joint latent representation, to which the items of different measurement instruments are mapped. This corresponds to domain adaptation, an established concept in computer science for image data. Using the proposed approach as an example, we evaluate the potential of domain adaptation in a longitudinal cohort setting with a rather small number of time points, motivated by an application with different motor function measurement instruments in a registry of spinal muscular atrophy (SMA) patients. There, we model trajectories in the latent representation by ordinary differential equations (ODEs), where person-specific ODE parameters are inferred from baseline characteristics. The goodness of fit and complexity of the ODE solutions then allow us to judge the measurement instrument mappings. We subsequently explore how alignment can be improved by incorporating corresponding penalty terms into model fitting. To systematically investigate the effect of differences between measurement instruments, we consider several scenarios based on modified SMA data, including scenarios where a mapping should be feasible in principle and scenarios where no perfect mapping is available. While misalignment increases in more complex scenarios, some structure is still recovered, even if the availability of measurement instruments depends on patient state. A reasonable mapping is also feasible in the more complex real SMA data set. These results indicate that domain adaptation might be more generally useful in statistical modeling for longitudinal registry data.
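As rough intuition for the mapping step, the toy sketch below (entirely simulated data, far simpler than the deep learning model of the paper) gives each instrument its own linear encoder into a one-dimensional latent score and penalizes misalignment between the two instruments' scores; all item loadings, weights, and penalties are hypothetical choices.

```python
# Toy sketch: map items of two measurement instruments to a shared latent score.
# Simulated data and illustrative penalties; not the authors' autoencoder model.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
z_true = rng.normal(size=200)                                   # unobserved motor function
items_a = np.outer(z_true, [1.0, 0.8, 1.2]) + rng.normal(scale=0.3, size=(200, 3))
items_b = np.outer(z_true, [0.5, 1.5]) + rng.normal(scale=0.3, size=(200, 2))

def loss(w):
    wa, wb = w[:3], w[3:]                   # instrument-specific encoder weights
    za, zb = items_a @ wa, items_b @ wb     # latent scores from each instrument
    agree = ((za - zb) ** 2).mean()         # same subjects, so scores should agree
    align = (za.mean() - zb.mean()) ** 2 + (za.std() - zb.std()) ** 2
    scale = (za.std() - 1.0) ** 2           # pin the latent scale to avoid collapse
    return agree + align + scale

res = minimize(loss, x0=np.ones(5), method="Nelder-Mead")
print("fitted encoder weights:", np.round(res.x, 2))
```

The sketch covers only the alignment idea; in the paper, the latent trajectories are additionally modeled with ODEs whose person-specific parameters are inferred from baseline characteristics.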

{"title":"Investigating a Domain Adaptation Approach for Integrating Different Measurement Instruments in a Longitudinal Clinical Registry","authors":"Maren Hackenberg,&nbsp;Michelle Pfaffenlehner,&nbsp;Max Behrens,&nbsp;Astrid Pechmann,&nbsp;Janbernd Kirschner,&nbsp;Harald Binder","doi":"10.1002/bimj.70023","DOIUrl":"10.1002/bimj.70023","url":null,"abstract":"<p>In a longitudinal clinical registry, different measurement instruments might have been used for assessing individuals at different time points. To combine them, we investigate deep learning techniques for obtaining a joint latent representation, to which the items of different measurement instruments are mapped. This corresponds to domain adaptation, an established concept in computer science for image data. Using the proposed approach as an example, we evaluate the potential of domain adaptation in a longitudinal cohort setting with a rather small number of time points, motivated by an application with different motor function measurement instruments in a registry of spinal muscular atrophy (SMA) patients. There, we model trajectories in the latent representation by ordinary differential equations (ODEs), where person-specific ODE parameters are inferred from baseline characteristics. The goodness of fit and complexity of the ODE solutions then allow to judge the measurement instrument mappings. We subsequently explore how alignment can be improved by incorporating corresponding penalty terms into model fitting. To systematically investigate the effect of differences between measurement instruments, we consider several scenarios based on modified SMA data, including scenarios where a mapping should be feasible in principle and scenarios where no perfect mapping is available. While misalignment increases in more complex scenarios, some structure is still recovered, even if the availability of measurement instruments depends on patient state. A reasonable mapping is feasible also in the more complex real SMA data set. These results indicate that domain adaptation might be more generally useful in statistical modeling for longitudinal registry data.</p>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"67 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.70023","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142857076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
To Tweak or Not to Tweak. How Exploiting Flexibilities in Gene Set Analysis Leads to Overoptimism
IF 1.3 | CAS Tier 3 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-12-19 | DOI: 10.1002/bimj.70016
Milena Wünsch, Christina Sauer, Moritz Herrmann, Ludwig Christian Hinske, Anne-Laure Boulesteix

Gene set analysis, a popular approach for analyzing high-throughput gene expression data, aims to identify sets of genes that show enriched expression patterns between two conditions. In addition to the multitude of methods available for this task, users are typically left with many options when creating the required input and specifying the internal parameters of the chosen method. This flexibility can lead to uncertainty about the “right” choice, further reinforced by a lack of evidence-based guidance. Especially when their statistical experience is scarce, this uncertainty might entice users to produce preferable results using a “trial-and-error” approach. While it may seem unproblematic at first glance, this practice can be viewed as a form of “cherry-picking” and cause an optimistic bias, rendering the results nonreplicable on independent data. Now that this problem has attracted a lot of attention in the context of classical hypothesis testing, we aim to raise awareness of such overoptimism in the different and more complex context of gene set analyses. We mimic a hypothetical researcher who systematically selects the analysis variants yielding their preferred results, thereby considering three distinct goals they might pursue. Using a selection of popular gene set analysis methods, we tweak the results in this way for two frequently used benchmark gene expression data sets. Our study indicates that the potential for overoptimism is particularly high for a group of methods frequently used despite being commonly criticized. We conclude by providing practical recommendations to counter overoptimism in research findings in gene set analysis and beyond.
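The mechanism behind such overoptimism can be reproduced with a small simulation: on pure-noise expression data, reporting the best of several analysis variants inflates the false-positive rate relative to a prespecified analysis. The sketch below is a deliberately simplified stand-in for real gene set analysis pipelines; the set size, variants, and sample sizes are arbitrary assumptions.

```python
# Toy illustration (not the paper's pipeline): cherry-picking among analysis
# variants inflates the type I error on data with no true group difference.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_rep, alpha = 2000, 0.05
hits_prespecified, hits_cherry_picked = 0, 0

for _ in range(n_rep):
    x = rng.normal(size=(20, 50))        # 20 samples x 50 genes, condition has no effect
    group = np.repeat([0, 1], 10)
    gene_set = np.arange(10)             # an arbitrary "gene set" of 10 genes

    score_mean = x[:, gene_set].mean(axis=1)         # variant 1: mean set expression
    score_med = np.median(x[:, gene_set], axis=1)    # variant 2: median set expression
    p1 = stats.ttest_ind(score_mean[group == 0], score_mean[group == 1]).pvalue
    p2 = stats.ttest_ind(score_med[group == 0], score_med[group == 1]).pvalue
    p3 = stats.mannwhitneyu(score_mean[group == 0], score_mean[group == 1]).pvalue

    hits_prespecified += p1 < alpha
    hits_cherry_picked += min(p1, p2, p3) < alpha

print("type I error, prespecified variant:", hits_prespecified / n_rep)
print("type I error, best of three variants:", hits_cherry_picked / n_rep)
```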

{"title":"To Tweak or Not to Tweak. How Exploiting Flexibilities in Gene Set Analysis Leads to Overoptimism","authors":"Milena Wünsch,&nbsp;Christina Sauer,&nbsp;Moritz Herrmann,&nbsp;Ludwig Christian Hinske,&nbsp;Anne-Laure Boulesteix","doi":"10.1002/bimj.70016","DOIUrl":"10.1002/bimj.70016","url":null,"abstract":"<p>Gene set analysis, a popular approach for analyzing high-throughput gene expression data, aims to identify sets of genes that show enriched expression patterns between two conditions. In addition to the multitude of methods available for this task, users are typically left with many options when creating the required input and specifying the internal parameters of the chosen method. This flexibility can lead to uncertainty about the “right” choice, further reinforced by a lack of evidence-based guidance. Especially when their statistical experience is scarce, this uncertainty might entice users to produce preferable results using a “trial-and-error” approach. While it may seem unproblematic at first glance, this practice can be viewed as a form of “cherry-picking” and cause an optimistic bias, rendering the results nonreplicable on independent data. After this problem has attracted a lot of attention in the context of classical hypothesis testing, we now aim to raise awareness of such overoptimism in the different and more complex context of gene set analyses. We mimic a hypothetical researcher who systematically selects the analysis variants yielding their preferred results, thereby considering three distinct goals they might pursue. Using a selection of popular gene set analysis methods, we tweak the results in this way for two frequently used benchmark gene expression data sets. Our study indicates that the potential for overoptimism is particularly high for a group of methods frequently used despite being commonly criticized. We conclude by providing practical recommendations to counter overoptimism in research findings in gene set analysis and beyond.</p>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"67 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.70016","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142857080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
The Progression-Free-Survival Ratio in Molecularly Aided Tumor Trials: A Critical Examination of Current Practice and Suggestions for Alternative Methods
IF 1.3 | CAS Tier 3 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-12-18 | DOI: 10.1002/bimj.70028
Dominic Edelmann, Tobias Terzer, Peter Horak, Richard Schlenk, Axel Benner

The progression-free-survival ratio is a popular endpoint in oncology trials, which is frequently applied to evaluate the efficacy of molecularly targeted treatments in late-stage patients. Using elementary calculations and simulations, we point out numerous shortcomings of the current methodology. As a remedy to these shortcomings, an alternative methodology is proposed, using a marginal Cox model or a marginal accelerated failure time model for clustered time-to-event data. Comprehensive simulations show that this methodology outperforms existing methods in settings where the intrapatient correlation is low to moderate. The performance of the model is further demonstrated in a real data example from a molecularly aided tumor trial. Sample size considerations are discussed.
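For intuition, a marginal Cox analysis of paired PFS times with a cluster-robust (sandwich) variance can be set up roughly as below. This is a simulated toy, not the authors' analysis: it assumes the Python `lifelines` package, uncensored times, and a gamma frailty that induces the intrapatient correlation.

```python
# Minimal sketch of a marginal Cox model for paired PFS times per patient,
# with cluster-robust standard errors; simulated data, illustrative hazards.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(2)
n = 150
frailty = rng.gamma(shape=2.0, scale=0.5, size=n)       # drives intrapatient correlation
pfs1 = rng.exponential(scale=6.0 * frailty)             # PFS on prior therapy (months)
pfs2 = rng.exponential(scale=6.0 * frailty * 1.3)       # PFS on targeted therapy

df = pd.DataFrame({
    "patient": np.repeat(np.arange(n), 2),
    "period": np.tile([0, 1], n),                        # 0 = PFS1, 1 = PFS2
    "time": np.column_stack([pfs1, pfs2]).ravel(),
    "event": 1,                                          # no censoring in this toy
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event",
        formula="period", cluster_col="patient")         # marginal model, sandwich variance
cph.print_summary()
```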

{"title":"The Progression-Free-Survival Ratio in Molecularly Aided Tumor Trials: A Critical Examination of Current Practice and Suggestions for Alternative Methods","authors":"Dominic Edelmann,&nbsp;Tobias Terzer,&nbsp;Peter Horak,&nbsp;Richard Schlenk,&nbsp;Axel Benner","doi":"10.1002/bimj.70028","DOIUrl":"10.1002/bimj.70028","url":null,"abstract":"<p>The progression-free-survival ratio is a popular endpoint in oncology trials, which is frequently applied to evaluate the efficacy of molecularly targeted treatments in late-stage patients. Using elementary calculations and simulations, numerous shortcomings of the current methodology are pointed out. As a remedy to these shortcomings, an alternative methodology is proposed, using a marginal Cox model or a marginal accelerated failure time model for clustered time-to-event data. Using comprehensive simulations, it is shown that this methodology outperforms existing methods in settings where the intrapatient correlation is low to moderate. The performance of the model is further demonstrated in a real data example from a molecularly aided tumor trial. Sample size considerations are discussed.</p>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"67 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.70028","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142848551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Principled Approach to Adjust for Unmeasured Time-Stable Confounding of Supervised Treatment
IF 1.3 | CAS Tier 3 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-12-16 | DOI: 10.1002/bimj.70026
Jeppe Ekstrand Halkjær Madsen, Thomas Delvin, Thomas Scheike, Christian Pipper

We propose a novel method to adjust for unmeasured time-stable confounding when the time between consecutive treatment administrations is fixed. We achieve this by focusing on a new-user cohort. Furthermore, we envisage that all time-stable confounding goes through the potential time on treatment as dictated by the disease condition at the initiation of treatment. Following this logic, we may eliminate all unmeasured time-stable confounding by adjusting for the potential time on treatment. A challenge with this approach is that right censoring of the potential time on treatment occurs when treatment is terminated at the time of the event of interest, for example, if the event of interest is death. We show how this challenge may be solved by means of the expectation-maximization algorithm without imposing any further assumptions on the distribution of the potential time on treatment. The usefulness of the methodology is illustrated in a simulation study. We also apply the methodology to investigate the effect of depression/anxiety drugs on subsequent poisoning by other medications in the Danish population by means of national registries. We find a protective effect of treatment with selective serotonin reuptake inhibitors on the risk of poisoning by various medications (1-year risk difference of approximately -3%) and a standard Cox model analysis shows a harming effect (1-year risk difference of approximately 2%), which is consistent with what we would expect due to confounding by indication. Unmeasured time-stable confounding can be entirely adjusted for when the time between consecutive treatment administrations is fixed.
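To see the EM ingredient in isolation, the toy sketch below estimates the mean of a right-censored "potential time on treatment", imputing the conditional expectation of the censored times in the E-step. The exponential distribution and all numbers here are illustrative assumptions made for the sketch only; the paper's method notably avoids assuming a particular distribution for the potential time on treatment.

```python
# Toy EM sketch for a right-censored duration (exponential assumed here only
# for illustration): censoring occurs when treatment stops at the event of interest.
import numpy as np

rng = np.random.default_rng(3)
true_mean = 5.0
t = rng.exponential(true_mean, size=500)      # potential time on treatment
c = rng.exponential(8.0, size=500)            # censoring at the event of interest
obs = np.minimum(t, c)
delta = (t <= c).astype(float)                # 1 = fully observed

mu = obs.mean()                               # naive start (biased downward)
for _ in range(100):
    # E-step: for censored subjects, E[T | T > obs] = obs + mu (memorylessness)
    expected_t = np.where(delta == 1, obs, obs + mu)
    # M-step: update the exponential mean
    mu_new = expected_t.mean()
    if abs(mu_new - mu) < 1e-8:
        break
    mu = mu_new

print("naive mean of observed times:", round(obs.mean(), 2))
print("EM estimate of the potential time on treatment:", round(mu, 2))
```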

{"title":"A Principled Approach to Adjust for Unmeasured Time-Stable Confounding of Supervised Treatment","authors":"Jeppe Ekstrand Halkjær Madsen,&nbsp;Thomas Delvin,&nbsp;Thomas Scheike,&nbsp;Christian Pipper","doi":"10.1002/bimj.70026","DOIUrl":"10.1002/bimj.70026","url":null,"abstract":"<div>\u0000 \u0000 <p>We propose a novel method to adjust for unmeasured time-stable confounding when the time between consecutive treatment administrations is fixed. We achieve this by focusing on a new-user cohort. Furthermore, we envisage that all time-stable confounding goes through the potential time on treatment as dictated by the disease condition at the initiation of treatment. Following this logic, we may eliminate all unmeasured time-stable confounding by adjusting for the potential time on treatment. A challenge with this approach is that right censoring of the potential time on treatment occurs when treatment is terminated at the time of the event of interest, for example, if the event of interest is death. We show how this challenge may be solved by means of the expectation-maximization algorithm without imposing any further assumptions on the distribution of the potential time on treatment. The usefulness of the methodology is illustrated in a simulation study. We also apply the methodology to investigate the effect of depression/anxiety drugs on subsequent poisoning by other medications in the Danish population by means of national registries. We find a protective effect of treatment with selective serotonin reuptake inhibitors on the risk of poisoning by various medications (1- year risk difference of approximately <span></span><math>\u0000 <semantics>\u0000 <mrow>\u0000 <mo>−</mo>\u0000 <mn>3</mn>\u0000 <mo>%</mo>\u0000 </mrow>\u0000 <annotation>$-3%$</annotation>\u0000 </semantics></math>) and a standard Cox model analysis shows a harming effect (1-year risk difference of approximately <span></span><math>\u0000 <semantics>\u0000 <mrow>\u0000 <mn>2</mn>\u0000 <mo>%</mo>\u0000 </mrow>\u0000 <annotation>$2%$</annotation>\u0000 </semantics></math>), which is consistent with what we would expect due to confounding by indication. Unmeasured time-stable confounding can be entirely adjusted for when the time between consecutive treatment administrations is fixed.</p></div>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"67 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142840159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Assessing Balance of Baseline Time-Dependent Covariates via the Fréchet Distance
IF 1.3 | CAS Tier 3 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-12-16 | DOI: 10.1002/bimj.70024
Mireya Díaz

Assessment of covariate balance is a key step when performing comparisons between groups, particularly in real-world data. We generally evaluate it on baseline covariates, but rarely on longitudinal ones prior to a management decision. We could use pointwise standardized mean differences, standardized differences of slopes, or weights from the model for this purpose. Pointwise differences could be cumbersome for densely sampled longitudinal markers and/or those measured at different points. Slopes are suitable for linear or transformable models but not for more complex curves. Weights do not identify the specific covariate(s) responsible for imbalances. This work presents the Fréchet distance as a viable alternative to assess balance of time-dependent covariates. A set of linear and nonlinear curves whose standardized differences or differences in functional parameters were within 10% was used to identify the Fréchet distance equivalent to this threshold. This threshold depends on the level of noise present, so within-group heterogeneity and error variance are needed for its interpretation. Applied to a set of real curves representing the monthly trajectory of hemoglobin A1c from diabetic patients, the measure showed that the curves in the two groups were not balanced at the 10% mark. A Beta distribution represents the Fréchet distance distribution reasonably well in most scenarios. This assessment of covariate balance provides the following advantages: it can handle curves of different lengths, shapes, and arbitrary time points. Future work includes examining the utility of this measure under within-series missingness, within-group heterogeneity, its comparison with other approaches, and asymptotics.
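A minimal implementation of the discrete Fréchet distance between two group-mean trajectories is sketched below on simulated HbA1c-like curves. The dynamic-programming recursion is the classic one; the curves themselves and the comparison are toy stand-ins for the balance diagnostic described above, not the paper's calibration of the 10% threshold.

```python
# Discrete Fréchet distance between two group-mean trajectories (toy curves).
import numpy as np

def discrete_frechet(p, q):
    """Classic dynamic-programming discrete Fréchet distance between curves p and q."""
    n, m = len(p), len(q)
    dist = lambda i, j: np.linalg.norm(p[i] - q[j])
    d = np.full((n, m), np.inf)
    d[0, 0] = dist(0, 0)
    for i in range(1, n):
        d[i, 0] = max(d[i - 1, 0], dist(i, 0))
    for j in range(1, m):
        d[0, j] = max(d[0, j - 1], dist(0, j))
    for i in range(1, n):
        for j in range(1, m):
            d[i, j] = max(min(d[i - 1, j], d[i - 1, j - 1], d[i, j - 1]), dist(i, j))
    return d[-1, -1]

months = np.arange(12)
group_a = np.column_stack([months, 7.5 - 0.05 * months])                      # slowly improving
group_b = np.column_stack([months, 7.8 - 0.12 * months + 0.1 * np.sin(months)])
print("Fréchet distance between mean trajectories:", round(discrete_frechet(group_a, group_b), 3))
```

Because the recursion only needs pairwise point distances, it also accepts curves of different lengths or with arbitrary time points, which is one of the advantages noted above.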

{"title":"Assessing Balance of Baseline Time-Dependent Covariates via the Fréchet Distance","authors":"Mireya Díaz","doi":"10.1002/bimj.70024","DOIUrl":"10.1002/bimj.70024","url":null,"abstract":"<div>\u0000 \u0000 <p>Assessment of covariate balance is a key step when performing comparisons between groups particularly in real-world data. We generally evaluate it on baseline covariates, but rarely on longitudinal ones prior to a management decision. We could use pointwise standardized mean differences, standardized differences of slopes, or weights from the model for such purpose. Pointwise differences could be cumbersome for densely sampled longitudinal markers and/or measured at different points. Slopes are suitable for linear or transformable models but not for more complex curves. Weights do not identify the specific covariate(s) responsible for imbalances. This work presents the Fréchet distance as a viable alternative to assess balance of time-dependent covariates. A set of linear and nonlinear curves for which their standardized difference or differences in functional parameters were within 10% sought to identify the Fréchet distance equivalent to this threshold. This threshold is dependent on the level of noise present and thus within group heterogeneity and error variance are needed for its interpretation. Applied to a set of real curves representing the monthly trajectory of hemoglobin A1c from diabetic patients showed that the curves in the two groups were not balanced at the 10% mark. A Beta distribution represents the Fréchet distance distribution reasonably well in most scenarios. This assessment of covariate balance provides the following advantages: It can handle curves of different lengths, shapes, and arbitrary time points. Future work includes examining the utility of this measure under within-series missingness, within-group heterogeneity, its comparison with other approaches, and asymptotics.</p>\u0000 </div>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"67 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142840161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Oncology Clinical Trial Design Planning Based on a Multistate Model That Jointly Models Progression-Free and Overall Survival Endpoints
IF 1.3 | CAS Tier 3 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-12-16 | DOI: 10.1002/bimj.70017
Alexandra Erdmann, Jan Beyersmann, Kaspar Rufibach

When planning an oncology clinical trial, the usual approach is to assume proportional hazards and even an exponential distribution for time-to-event endpoints. Often, besides the gold-standard endpoint overall survival (OS), progression-free survival (PFS) is considered as a second confirmatory endpoint. We use a survival multistate model to jointly model these two endpoints and find that neither exponential distribution nor proportional hazards will typically hold for both endpoints simultaneously. The multistate model provides a stochastic process approach to model the dependency of such endpoints, requiring neither latent failure times nor explicit dependency modeling such as copulae. We use the multistate model framework to simulate clinical trials with endpoints OS and PFS and show how design planning questions can be answered using this approach. In particular, nonproportional hazards for at least one of the endpoints are a consequence of OS and PFS being dependent and are naturally modeled to improve planning. We then illustrate how clinical trial design can be based on simulations from a multistate model. Key applications are coprimary endpoints and group-sequential designs. Simulations for these applications show that the standard simplifying approach may very well lead to underpowered or overpowered clinical trials. Our approach is quite general and can be extended to more complex trial designs, further endpoints, and other therapeutic areas. An R package is available on CRAN.
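The flavor of such simulations can be seen in a toy illness-death model with constant transition hazards (stable to progression, stable to death, progression to death): PFS is the time to the first of progression or death, OS is the time to death, and the shared progression time makes the two endpoints dependent. The hazards below are arbitrary illustrative values; the authors' own implementation is the R package mentioned above.

```python
# Toy simulation from an illness-death multistate model with constant hazards,
# deriving dependent PFS and OS times for one treatment arm.
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
h_prog, h_death0, h_death1 = 0.08, 0.02, 0.10   # per-month hazards: ->progression, ->death, progression->death

t_prog = rng.exponential(1 / h_prog, n)
t_death_direct = rng.exponential(1 / h_death0, n)
pfs = np.minimum(t_prog, t_death_direct)
progressed = t_prog < t_death_direct
# after progression, residual survival follows the post-progression death hazard
os = np.where(progressed, t_prog + rng.exponential(1 / h_death1, n), t_death_direct)

print("median PFS (months):", round(np.median(pfs), 1))
print("median OS (months):", round(np.median(os), 1))
print("PFS/OS correlation:", round(np.corrcoef(pfs, os)[0, 1], 2))   # the endpoints are dependent
```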

{"title":"Oncology Clinical Trial Design Planning Based on a Multistate Model That Jointly Models Progression-Free and Overall Survival Endpoints","authors":"Alexandra Erdmann,&nbsp;Jan Beyersmann,&nbsp;Kaspar Rufibach","doi":"10.1002/bimj.70017","DOIUrl":"10.1002/bimj.70017","url":null,"abstract":"<p>When planning an oncology clinical trial, the usual approach is to assume proportional hazards and even an exponential distribution for time-to-event endpoints. Often, besides the gold-standard endpoint overall survival (OS), progression-free survival (PFS) is considered as a second confirmatory endpoint. We use a survival multistate model to jointly model these two endpoints and find that neither exponential distribution nor proportional hazards will typically hold for both endpoints simultaneously. The multistate model provides a stochastic process approach to model the dependency of such endpoints neither requiring latent failure times nor explicit dependency modeling such as copulae. We use the multistate model framework to simulate clinical trials with endpoints OS and PFS and show how design planning questions can be answered using this approach. In particular, nonproportional hazards for at least one of the endpoints are a consequence of OS and PFS being dependent and are naturally modeled to improve planning. We then illustrate how clinical trial design can be based on simulations from a multistate model. Key applications are coprimary endpoints and group-sequential designs. Simulations for these applications show that the standard simplifying approach may very well lead to underpowered or overpowered clinical trials. Our approach is quite general and can be extended to more complex trial designs, further endpoints, and other therapeutic areas. An R package is available on CRAN.</p>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"67 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.70017","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142840153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Test Statistics and Statistical Inference for Data With Informative Cluster Sizes
IF 1.3 | CAS Tier 3 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-12-16 | DOI: 10.1002/bimj.70021
Soyoung Kim, Michael J. Martens, Kwang Woo Ahn

In biomedical studies, investigators often encounter clustered data. The cluster sizes are said to be informative if the outcome depends on the cluster size. Ignoring informative cluster sizes in the analysis leads to biased parameter estimation in marginal and mixed-effect regression models. Several methods to analyze data with informative cluster sizes have been proposed; however, methods to test the informativeness of the cluster sizes are limited, particularly for the marginal model. In this paper, we propose a score test and a Wald test to examine the informativeness of the cluster sizes for a generalized linear model, a Cox model, and a proportional subdistribution hazards model. Statistical inference can be conducted through weighted estimating equations. The simulation results show that both tests control Type I error rates well, but the score test has higher power than the Wald test for right-censored data while the power of the Wald test is generally higher than the score test for the binary outcome. We apply the Wald and score tests to hematopoietic cell transplant data and compare regression analysis results with/without adjusting for informative cluster sizes.
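As an informal illustration of informativeness (not the score or Wald tests proposed in the paper, which are based on weighted estimating equations), one can simulate clustered binary outcomes whose mean depends on cluster size and examine the Wald test of a cluster-size covariate in a GEE fit. The sketch assumes the Python `statsmodels` package; all parameter values are arbitrary.

```python
# Simulate clusters whose outcome probability depends on cluster size, then
# check the cluster-size coefficient in a marginal (GEE) logistic model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
rows = []
for cid in range(300):
    size = rng.integers(2, 10)                      # cluster size between 2 and 9
    p = 1 / (1 + np.exp(-(-1.0 + 0.2 * size)))      # outcome probability rises with size
    for yi in rng.binomial(1, p, size):
        rows.append((cid, size, yi))

groups, sizes, y = map(np.array, zip(*rows))
X = sm.add_constant(sizes.astype(float))
res = sm.GEE(y, X, groups=groups, family=sm.families.Binomial()).fit()
print(res.summary())   # Wald z-test for the cluster-size coefficient
```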

{"title":"Test Statistics and Statistical Inference for Data With Informative Cluster Sizes","authors":"Soyoung Kim,&nbsp;Michael J. Martens,&nbsp;Kwang Woo Ahn","doi":"10.1002/bimj.70021","DOIUrl":"10.1002/bimj.70021","url":null,"abstract":"<div>\u0000 \u0000 <p>In biomedical studies, investigators often encounter clustered data. The cluster sizes are said to be informative if the outcome depends on the cluster size. Ignoring informative cluster sizes in the analysis leads to biased parameter estimation in marginal and mixed-effect regression models. Several methods to analyze data with informative cluster sizes have been proposed; however, methods to test the informativeness of the cluster sizes are limited, particularly for the marginal model. In this paper, we propose a score test and a Wald test to examine the informativeness of the cluster sizes for a generalized linear model, a Cox model, and a proportional subdistribution hazards model. Statistical inference can be conducted through weighted estimating equations. The simulation results show that both tests control Type I error rates well, but the score test has higher power than the Wald test for right-censored data while the power of the Wald test is generally higher than the score test for the binary outcome. We apply the Wald and score tests to hematopoietic cell transplant data and compare regression analysis results with/without adjusting for informative cluster sizes.</p></div>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"67 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142840154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Best Subset Solution Path for Linear Dimension Reduction Models Using Continuous Optimization
IF 1.3 | CAS Tier 3 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-12-16 | DOI: 10.1002/bimj.70015
Benoit Liquet, Sarat Moka, Samuel Muller

The selection of best variables is a challenging problem in supervised and unsupervised learning, especially in high-dimensional contexts where the number of variables is usually much larger than the number of observations. In this paper, we focus on two multivariate statistical methods: principal component analysis and partial least squares. Both approaches are popular linear dimension-reduction methods with numerous applications in several fields, including genomics, biology, environmental science, and engineering. In particular, these approaches build principal components, new variables that are combinations of all the original variables. A main drawback of principal components is the difficulty of interpreting them when the number of variables is large. To define principal components from the most relevant variables, we propose to cast the best subset solution path method into principal component analysis and partial least squares frameworks. We offer a new alternative by exploiting a continuous optimization algorithm for the best subset solution path. Empirical studies show the efficacy of our approach for providing the best subset solution path. The use of our algorithm is further illustrated through the analysis of two real data sets. The first data set is analyzed using principal component analysis, while the analysis of the second data set is based on the partial least squares framework.
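To make the interpretability issue concrete, the sketch below traces a naive subset "path" for the first principal component by thresholding the dense PCA loadings at several subset sizes and tracking explained variance. This is only a baseline heuristic for illustration, not the continuous-optimization best subset algorithm of the paper; it assumes scikit-learn and simulated data with five informative variables.

```python
# Naive subset path for PC1: keep the k largest-magnitude loadings and
# track how much variance the resulting sparse component explains.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
signal = rng.normal(size=(100, 1))
X = np.hstack([signal @ rng.normal(size=(1, 5)),          # 5 informative variables
               rng.normal(size=(100, 20))])               # 20 pure-noise variables
X = X - X.mean(axis=0)

pca = PCA(n_components=1).fit(X)
v = pca.components_[0]
order = np.argsort(-np.abs(v))                            # variables ranked by loading magnitude

for k in (1, 3, 5, 10, 25):
    keep = np.zeros_like(v)
    keep[order[:k]] = v[order[:k]]
    keep /= np.linalg.norm(keep)
    score = X @ keep
    print(f"k={k:2d} variables: variance explained = {score.var() / X.var(axis=0).sum():.3f}")
print("dense PC1 for reference:", round(pca.explained_variance_ratio_[0], 3))
```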

{"title":"Best Subset Solution Path for Linear Dimension Reduction Models Using Continuous Optimization","authors":"Benoit Liquet,&nbsp;Sarat Moka,&nbsp;Samuel Muller","doi":"10.1002/bimj.70015","DOIUrl":"10.1002/bimj.70015","url":null,"abstract":"<div>\u0000 \u0000 <p>The selection of best variables is a challenging problem in supervised and unsupervised learning, especially in high-dimensional contexts where the number of variables is usually much larger than the number of observations. In this paper, we focus on two multivariate statistical methods: principal components analysis and partial least squares. Both approaches are popular linear dimension-reduction methods with numerous applications in several fields including in genomics, biology, environmental science, and engineering. In particular, these approaches build principal components, new variables that are combinations of all the original variables. A main drawback of principal components is the difficulty to interpret them when the number of variables is large. To define principal components from the most relevant variables, we propose to cast the best subset solution path method into principal component analysis and partial least square frameworks. We offer a new alternative by exploiting a continuous optimization algorithm for best subset solution path. Empirical studies show the efficacy of our approach for providing the best subset solution path. The usage of our algorithm is further exposed through the analysis of two real data sets. The first data set is analyzed using the principle component analysis while the analysis of the second data set is based on partial least square framework.</p></div>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"67 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142840149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Goodness-of-Fit Testing for a Regression Model With a Doubly Truncated Response
IF 1.3 | CAS Tier 3 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-12-16 | DOI: 10.1002/bimj.70022
Jacobo de Uña-Álvarez

In survival analysis and epidemiology, among other fields, interval sampling is often employed. With interval sampling, the individuals undergoing the event of interest within a calendar time interval are recruited. This results in doubly truncated event times. Double truncation, which may appear with other sampling designs too, induces a selection bias, so ordinary statistical methods are generally inconsistent. In this paper, we introduce goodness-of-fit procedures for a regression model when the response variable is doubly truncated. With this purpose, a marked empirical process based on weighted residuals is constructed and its weak convergence is established. Kolmogorov–Smirnov– and Cramér–von Mises–type tests are consequently derived from such core process, and a bootstrap approximation for their practical implementation is given. The performance of the proposed tests is investigated through simulations. An application to model selection for AIDS incubation time as depending on age at infection is provided.
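The selection bias that motivates these tests can be reproduced in a few lines: simulate a simple linear regression, keep only subjects whose calendar event time falls inside a sampling window (double truncation), and compare the ordinary least squares slope with and without truncation. The window, onset distribution, and coefficients below are arbitrary toy choices; the sketch illustrates the bias, not the proposed goodness-of-fit procedures.

```python
# Toy illustration of selection bias under interval sampling (double truncation).
import numpy as np

rng = np.random.default_rng(7)
n = 20_000
x = rng.uniform(0, 1, n)                                  # covariate (e.g., age at infection)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=n)         # true regression: E[Y | x] = 2 + 3x
onset = rng.uniform(0, 5, n)                              # calendar time of origin
event_time = onset + y                                    # calendar time of the event

window = (event_time >= 4) & (event_time <= 6)            # only events in this window are sampled
slope_full = np.polyfit(x, y, 1)[0]
slope_truncated = np.polyfit(x[window], y[window], 1)[0]

print("OLS slope, full population:  ", round(slope_full, 2))        # close to 3
print("OLS slope, doubly truncated: ", round(slope_truncated, 2))   # attenuated
```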

{"title":"Goodness-of-Fit Testing for a Regression Model With a Doubly Truncated Response","authors":"Jacobo de Uña-Álvarez","doi":"10.1002/bimj.70022","DOIUrl":"10.1002/bimj.70022","url":null,"abstract":"<p>In survival analysis and epidemiology, among other fields, interval sampling is often employed. With interval sampling, the individuals undergoing the event of interest within a calendar time interval are recruited. This results in doubly truncated event times. Double truncation, which may appear with other sampling designs too, induces a selection bias, so ordinary statistical methods are generally inconsistent. In this paper, we introduce goodness-of-fit procedures for a regression model when the response variable is doubly truncated. With this purpose, a marked empirical process based on weighted residuals is constructed and its weak convergence is established. Kolmogorov–Smirnov– and Cramér–von Mises–type tests are consequently derived from such core process, and a bootstrap approximation for their practical implementation is given. The performance of the proposed tests is investigated through simulations. An application to model selection for AIDS incubation time as depending on age at infection is provided.</p>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"67 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.70022","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142840151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Adjusted Inference for Multiple Testing Procedure in Group-Sequential Designs
IF 1.3 | CAS Tier 3 (Biology) | Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-12-16 | DOI: 10.1002/bimj.70020
Yujie Zhao, Qi Liu, Linda Z. Sun, Keaven M. Anderson

Adjustment of statistical significance levels for repeated analysis in group-sequential trials has been understood for some time. Adjustment accounting for testing multiple hypotheses is also well understood. There is limited research on simultaneously adjusting for both multiple hypothesis testing and repeated analyses of one or more hypotheses. We address this gap by proposing adjusted-sequential p-values that reject when they are less than or equal to the family-wise Type I error rate (FWER). We also propose sequential p-values for intersection hypotheses to compute adjusted-sequential p-values for elementary hypotheses. We demonstrate the application using weighted Bonferroni tests and weighted parametric tests for inference on each elementary hypothesis tested.
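A small numerical sketch of the two ingredients being combined: a weighted Bonferroni split of the family-wise error rate across hypotheses, and a group-sequential boundary for one hypothesis at its allocated level, using the joint normal distribution of the interim and final test statistics. The weights, information fractions, and the Pocock-type (constant) boundary are illustrative assumptions, not the adjusted-sequential p-values defined in the paper.

```python
# Two-look boundary at a weighted-Bonferroni level for one of two hypotheses,
# computed from the bivariate normal law of the interim and final z-statistics.
import numpy as np
from scipy import stats, optimize

alpha_fwer, weight = 0.025, 0.6              # one-sided FWER; Bonferroni weight for H1
alpha_h1 = weight * alpha_fwer               # level allocated to H1
info_frac = np.array([0.5, 1.0])             # interim at 50% information, then final
corr = np.sqrt(info_frac[0] / info_frac[1])
cov = np.array([[1.0, corr], [corr, 1.0]])

def crossing_prob(c):
    # P(Z1 > c or Z2 > c) = 1 - P(Z1 <= c, Z2 <= c)
    return 1 - stats.multivariate_normal(mean=[0, 0], cov=cov).cdf([c, c])

c = optimize.brentq(lambda c: crossing_prob(c) - alpha_h1, 1.0, 4.0)
print(f"common boundary for H1 at level {alpha_h1:.4f}: z = {c:.3f}")
print("naive unadjusted z at that level:", round(stats.norm.ppf(1 - alpha_h1), 3))
```

The gap between the two printed values shows why repeated looks and multiple hypotheses both have to be accounted for at the same time.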

{"title":"Adjusted Inference for Multiple Testing Procedure in Group-Sequential Designs","authors":"Yujie Zhao,&nbsp;Qi Liu,&nbsp;Linda Z. Sun,&nbsp;Keaven M. Anderson","doi":"10.1002/bimj.70020","DOIUrl":"10.1002/bimj.70020","url":null,"abstract":"<div>\u0000 \u0000 <p>Adjustment of statistical significance levels for repeated analysis in group-sequential trials has been understood for some time. Adjustment accounting for testing multiple hypotheses is also well understood. There is limited research on simultaneously adjusting for both multiple hypothesis testing and repeated analyses of one or more hypotheses. We address this gap by proposing <i>adjusted-sequential p-values</i> that reject when they are less than or equal to the family-wise Type I error rate (FWER). We also propose sequential <span></span><math>\u0000 <semantics>\u0000 <mi>p</mi>\u0000 <annotation>$p$</annotation>\u0000 </semantics></math>-values for intersection hypotheses to compute adjusted-sequential <span></span><math>\u0000 <semantics>\u0000 <mi>p</mi>\u0000 <annotation>$p$</annotation>\u0000 </semantics></math>-values for elementary hypotheses. We demonstrate the application using weighted Bonferroni tests and weighted parametric tests for inference on each elementary hypothesis tested.</p></div>","PeriodicalId":55360,"journal":{"name":"Biometrical Journal","volume":"67 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142840160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0