Pub Date: 2024-11-30. Epub Date: 2024-09-19. DOI: 10.1002/sim.10216
Yi Shi, Michael T Eadon, Yao Chen, Anna Sun, Yuedi Yang, Chienwei Chiang, Macarius Donneyong, Jing Su, Pengyue Zhang
Despite the success of pharmacovigilance studies in detecting signals of adverse drug events (ADEs) from real-world data, the risks of ADEs in subpopulations warrant increased scrutiny to prevent them in vulnerable individuals. Recently, the case-crossover design has been implemented to leverage large-scale administrative claims data for ADE detection, while controlling for both observed confounding effects and short-term fixed unobserved confounding effects. Additionally, as the case-crossover design only includes cases, subpopulations can be conveniently derived. In this manuscript, we propose a precision mixture risk model (PMRM) to identify ADE signals from subpopulations under the case-crossover design. The proposed model is able to identify signals from all ADE-subpopulation-drug combinations, while controlling for false discovery rate (FDR) and confounding effects. We applied the PMRM to an administrative claims dataset. We identified ADE signals in subpopulations defined by demographic variables, comorbidities, and detailed diagnosis codes. Interestingly, certain drugs were associated with a higher risk of ADE only in subpopulations, while these drugs had a neutral association with ADE in the general population. Additionally, the PMRM could control FDR at a desired level and had a higher probability of detecting true ADE signals than the widely used McNemar's test. In conclusion, the PMRM is able to identify subpopulation-specific ADE signals from a tremendous number of ADE-subpopulation-drug combinations, while controlling for both FDR and confounding effects.
Title: A Precision Mixture Risk Model to Identify Adverse Drug Events in Subpopulations Using a Case-Crossover Design. Journal: Statistics in Medicine, pages 5088-5099.
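For orientation, the baseline that the abstract compares against can be sketched in a few lines. This is not the PMRM itself: it is a minimal illustration of an exact McNemar test on case-crossover discordant counts, followed by Benjamini-Hochberg FDR control across many drug-ADE-subpopulation combinations. The function names and the meaning assigned to the counts `b` and `c` are assumptions made for this example.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant counts.

    b: cases exposed only in the hazard window (assumed labeling)
    c: cases exposed only in the control window (assumed labeling)
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Binomial(n, 0.5) lower tail up to the smaller discordant count
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def benjamini_hochberg(pvals, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest rank whose p-value falls under the BH line alpha * rank / m
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= alpha * rank / m:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected
```

In use, one p-value would be computed per combination and the whole list passed to `benjamini_hochberg`, so that the proportion of false signals among those flagged is controlled at `alpha` on average.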
Cancer atlases published by several countries are the main resource for analyzing the geographic variation of cancer risk. Correlating the observed spatial patterns with known or hypothesized risk factors is time-consuming work for epidemiologists, who need to deal with each cancer separately, breaking down the patterns according to sex and race. The recent literature has proposed studying more than one cancer simultaneously, looking for common spatial risk factors. However, this previous work has two constraints: it considers only a very small number (2-4) of cancers, and those cancers are already known to share risk factors. In this article, we propose an exploratory method to search for latent spatial risk factors of a large number of supposedly unrelated cancers. The method is based on the singular value decomposition and nonnegative matrix factorization and is computationally efficient, scaling easily with the number of regions and cancers. We carried out a simulation study to evaluate the method's performance and applied it to cancer atlases from the USA, England, France, Australia, Spain, and Brazil. We conclude that with very few latent maps, which can represent a reduction of up to 90% of atlas maps, most of the spatial variability is conserved. By concentrating the epidemiological analysis on these few latent maps, a substantial amount of work is saved and, at the same time, high-level explanations affecting many cancers simultaneously can be reached.
Title: Latent Archetypes of the Spatial Patterns of Cancer. Authors: Thaís Pacheco Menezes, Marcos Oliveira Prates, Renato Assunção, Mônica Silva Monteiro De Castro. Journal: Statistics in Medicine, pages 5115-5137. Publication date: 2024-11-30. DOI: 10.1002/sim.10232. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11583956/pdf/
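The two building blocks named in the abstract, the singular value decomposition and nonnegative matrix factorization, can be illustrated on a regions-by-cancers risk matrix. The sketch below is only a generic illustration, not the authors' procedure: it assumes rows are regions and columns are cancers, uses the standard multiplicative-update NMF, and the names `nmf` and `variance_explained` are invented for this example.

```python
import numpy as np


def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Multiplicative-update NMF: V (regions x cancers) ~ W @ H with W, H >= 0.

    Columns of W play the role of latent maps; rows of H give each
    cancer's loading on those maps (interpretation assumed here).
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Updates keep every entry nonnegative by construction
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def variance_explained(V, k):
    """Share of the squared singular values captured by the top-k components."""
    s = np.linalg.svd(V, compute_uv=False)
    return float((s[:k] ** 2).sum() / (s ** 2).sum())
```

A plausible workflow, under these assumptions, is to use `variance_explained` to choose a small number of latent maps and then call `nmf` with that rank to obtain nonnegative, map-like factors.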
Pub Date: 2024-11-30. Epub Date: 2024-10-07. DOI: 10.1002/sim.10228
Zonghui Hu, Dean Follmann
Preventing malaria during pregnancy is of critical importance, yet there are no approved malaria vaccines for pregnant women owing to a lack of efficacy results within this population. Conducting a randomized trial in pregnant women throughout the entire duration of pregnancy is impractical. Instead, a randomized trial was conducted among women of childbearing potential (WOCBP), and some participants became pregnant during the 2-year study. Within the causal inference framework, we explore a statistical method for estimating the vaccine effect within the target subpopulation: women who can naturally become pregnant, namely, women who can become pregnant under a placebo condition. Two vaccine effect estimators are employed to effectively utilize baseline characteristics and account for the fact that certain baseline characteristics were only available from pregnant participants. The first estimator considers all participants but can only utilize baseline variables collected from the entire participant pool. In contrast, the second estimator, which includes only pregnant participants, utilizes all available baseline information. Both estimators are evaluated numerically through simulation studies and applied to the WOCBP trial to assess the vaccine effect against pregnancy malaria.
Title: Causal Inference Over a Subpopulation: The Effect of Malaria Vaccine in Women During Pregnancy. Journal: Statistics in Medicine, pages 5193-5202. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11583954/pdf/
Pub Date: 2024-11-30. Epub Date: 2024-10-03. DOI: 10.1002/sim.10226
Yushuf Sharker, Zaynab Diallo, Wasiur R KhudaBukhsh, Eben Kenah
Many important questions in infectious disease epidemiology involve associations between covariates (e.g., age or vaccination status) and infectiousness or susceptibility. Because disease transmission produces dependent outcomes, these questions are difficult or impossible to address using standard regression models from biostatistics. Pairwise survival analysis handles dependent outcomes by calculating likelihoods in terms of contact interval distributions in ordered pairs of individuals. The contact interval in the ordered pair $ij$ is the time from the onset of infectiousness in $i$ to infectious contact from $i$ to $j$, where an infectious contact is sufficient to infect $j$ if they are susceptible. Here, we introduce a pairwise accelerated failure time regression model for infectious disease transmission that allows the rate parameter of the contact interval distribution to depend on individual-level infectiousness covariates for $i$, individual-level susceptibility covariates for $j$, and pair-level covariates (e.g., type of relationship). This model can simultaneously handle internal infections (caused by transmission between individuals under observation) and external infections (caused by environmental or community sources of infection). We show that this model produces consistent and asymptotically normal parameter estimates. In a simulation study, we evaluate bias and confidence interval coverage probabilities, explore the role of epidemiologic study design, and investigate the effects of model misspecification. We use this regression model to analyze household data from Los Angeles County during the 2009 influenza A (H1N1) pandemic, where we find that the ability to account for external sources of infection increases the statistical power to estimate the effect of antiviral prophylaxis.
Title: Pairwise Accelerated Failure Time Regression Models for Infectious Disease Transmission in Close-Contact Groups With External Sources of Infection. Journal: Statistics in Medicine, pages 5138-5154. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11583957/pdf/
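To make the covariate structure concrete, the sketch below shows one way a pair-specific rate could be assembled from infectiousness covariates for i, susceptibility covariates for j, and pair-level covariates. The log-linear link and the exponential contact interval are assumptions chosen for simplicity, not details taken from the abstract, and all function and parameter names are hypothetical.

```python
from math import exp


def dot(coefs, covariates):
    """Inner product of a coefficient vector and a covariate vector."""
    return sum(b * x for b, x in zip(coefs, covariates))


def pair_rate(base_rate, beta_inf, x_i, beta_sus, x_j, beta_pair, x_ij):
    """Rate of infectious contact from i to j (assumed log-linear form).

    x_i: infectiousness covariates of i; x_j: susceptibility covariates of j;
    x_ij: pair-level covariates such as type of relationship.
    """
    linear = dot(beta_inf, x_i) + dot(beta_sus, x_j) + dot(beta_pair, x_ij)
    return base_rate * exp(linear)


def escape_probability(rate, duration):
    """P(no infectious contact within `duration`), exponential contact interval."""
    return exp(-rate * duration)
```

Under these assumptions, a negative susceptibility coefficient for a prophylaxis indicator lowers the pair rate and therefore raises the probability that j escapes infectious contact from i over a given exposure period.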
To assess the preliminary therapeutic impact of a novel treatment, futility monitoring is commonly employed in Phase II clinical trials to facilitate informed decisions regarding the early termination of trials. Given the rapid evolution in cancer treatment development, particularly with new agents like immunotherapeutic agents, the focus has often shifted from objective response to time-to-event endpoints. In trials involving multiple time-to-event endpoints, existing monitoring designs typically select one as the primary endpoint or employ a composite endpoint as the time to the first occurrence of any event. However, relying on a single efficacy endpoint may not adequately evaluate an experimental treatment. Additionally, the time-to-first-event endpoint treats all events equally, ignoring their differences in clinical priorities. To tackle these issues, we propose a Bayesian futility monitoring design for a two-arm randomized Phase II trial, which incorporates the win ratio approach to account for the clinical priority of multiple time-to-event endpoints. A joint lognormal distribution was assumed to model the time-to-event variables for the estimation. We conducted simulation studies to assess the operating characteristics of the proposed monitoring design and compared them to those of conventional methods. The proposed design allows for early termination for futility if the endpoint with higher clinical priority (e.g., death) deteriorates in the treatment arm, compared to the time-to-first-event approach. Meanwhile, it prevents an aggressive early termination if the endpoint with lower clinical priority (e.g., cancer recurrence) shows deterioration in the treatment arm, offering a more tailored approach to decision-making in clinical trials with multiple time-to-event endpoints.
Title: The Win Ratio Approach in Bayesian Monitoring for Two-Arm Phase II Clinical Trial Designs With Multiple Time-To-Event Endpoints. Authors: Xinran Huang, Jian Wang, Jing Ning. Journal: Statistics in Medicine. Publication date: 2024-11-25. DOI: 10.1002/sim.10282
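The prioritized comparison behind the win ratio can be shown with a small empirical sketch. It deliberately omits the Bayesian monitoring and the joint lognormal model described in the abstract, and it assumes every event time is fully observed (no censoring). The function name and the tuple layout, with endpoints listed from highest to lowest clinical priority, are assumptions for this example.

```python
def win_ratio(treatment, control):
    """Empirical win ratio over all treatment-control pairs.

    Each patient is a tuple of event times ordered by clinical priority,
    e.g. (time_to_death, time_to_recurrence). A longer time is better.
    Returns (wins, losses, ratio).
    """
    wins = losses = 0
    for t_patient in treatment:
        for c_patient in control:
            # Walk down the priority list; stop at the first endpoint that differs
            for t_time, c_time in zip(t_patient, c_patient):
                if t_time > c_time:
                    wins += 1
                    break
                if t_time < c_time:
                    losses += 1
                    break
    ratio = wins / losses if losses else float("inf")
    return wins, losses, ratio
```

Because the lower-priority endpoint is consulted only when the higher-priority endpoint ties, a deterioration in recurrence alone cannot outweigh an advantage in survival, which is the property the abstract contrasts with the time-to-first-event approach.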
In clinical trials, subjects are usually recruited sequentially. According to the outcomes amassed thus far in a trial, the response-adaptive randomization (RAR) design has been shown to be an advantageous treatment assignment procedure that skews the treatment allocation proportion toward pre-specified objectives, such as sending more patients to a more promising treatment. Unfortunately, there are circumstances under which very few primary endpoint data are collected in the recruitment period, such as circumstances relating to public health emergencies and chronic diseases, and RAR is thus difficult to apply in allocating treatments using available outcomes. To overcome this problem, if an informative surrogate endpoint can be acquired much earlier than the primary endpoint, the surrogate endpoint can be used as a substitute for the primary endpoint in the RAR procedure. In this paper, we propose an RAR procedure that relies only on surrogate endpoints. The validity of the statistical inference on the primary endpoint and the patient benefit of this approach are justified by both theory and simulation. Furthermore, different types of surrogate endpoint and primary endpoint are considered. The results provide reassurance that RAR with surrogate endpoints can be a viable option in some cases for clinical trials when primary endpoints are unavailable for adaptation.
Title: Response-Adaptive Randomization Procedure in Clinical Trials with Surrogate Endpoints. Authors: Jingya Gao, Feifang Hu, Wei Ma. Journal: Statistics in Medicine. Publication date: 2024-11-25. DOI: 10.1002/sim.10286
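As a rough illustration of adapting on a surrogate, the sketch below updates the allocation probability from binary surrogate responses observed so far. The square-root allocation target and the add-one smoothing are assumptions chosen because they are simple and familiar for binary outcomes; the abstract does not state which target the proposed procedure uses, and the function names are hypothetical.

```python
import random
from math import sqrt


def allocation_prob_a(succ_a, n_a, succ_b, n_b):
    """Probability of assigning the next patient to arm A.

    Uses surrogate success counts only. Estimates are smoothed with
    (successes + 1) / (n + 2) so early allocations stay near 1/2.
    """
    p_a = (succ_a + 1) / (n_a + 2)
    p_b = (succ_b + 1) / (n_b + 2)
    # Square-root target: favors the arm with the higher surrogate response rate
    return sqrt(p_a) / (sqrt(p_a) + sqrt(p_b))


def assign_next(succ_a, n_a, succ_b, n_b, rng=random):
    """Randomize the next patient according to the current allocation probability."""
    return "A" if rng.random() < allocation_prob_a(succ_a, n_a, succ_b, n_b) else "B"
```

With 8 surrogate successes out of 10 on arm A and 2 out of 10 on arm B, this rule assigns the next patient to arm A with probability of roughly 0.63, while the primary endpoint remains reserved for the final inference.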
Pub Date: 2024-11-20. Epub Date: 2024-09-15. DOI: 10.1002/sim.10210
Jingwei Lu, Grace Y Yi, Denis Rustand, Patrick Parfrey, Laurent Briollais, Yun-Hee Choi
Trivariate joint modeling for longitudinal count data, recurrent events, and a terminal event for family data has attracted increasing interest in medical studies. For example, families with Lynch syndrome (LS) are at high risk of developing colorectal cancer (CRC), where the number of polyps and the frequency of colonoscopy screening visits are highly associated with the risk of CRC among individuals and families. To assess how screening visits influence polyp detection, which in turn influences time to CRC, we propose a clustered trivariate joint model. The proposed model accommodates longitudinal count data that are zero-inflated and over-dispersed and invokes individual-specific and family-specific random effects to account for dependence among individuals and families. We formulate our proposed model as a latent Gaussian model to use the Bayesian estimation approach with the integrated nested Laplace approximation algorithm and evaluate its performance using simulation studies. Our trivariate joint model is applied to a series of 18 families from Newfoundland, with the occurrence of CRC taken as the terminal event, the colonoscopy screening visits as recurrent events, and the number of polyps detected at each visit as zero-inflated count data with overdispersion. We showed that our trivariate model fits better than alternative bivariate models and that the cluster effects should not be ignored when analyzing family data. Finally, the proposed model enables us to quantify heterogeneity across families and individuals in polyp detection and CRC risk, thus helping to identify individuals and families who would benefit from more intensive screening visits.
Title: Trivariate Joint Modeling for Family Data with Longitudinal Counts, Recurrent Events and a Terminal Event with Application to Lynch Syndrome. Journal: Statistics in Medicine, pages 5000-5022.
The study of precision medicine involves dynamic treatment regimes (DTRs), which are sequences of treatment decision rules recommended based on patient-level information. The primary goal of a DTR study is to identify an optimal DTR, a sequence of treatment decision rules that optimizes the clinical outcome across multiple decision points. Statistical methods have been developed in recent years to estimate an optimal DTR, including Q-learning, a regression-based method in the DTR literature. Although there are many studies concerning Q-learning, little attention has been paid to its behavior in the presence of noisy data, such as misclassified outcomes. In this article, we investigate the effect of outcome misclassification on identifying optimal DTRs using Q-learning and propose a correction method to accommodate the misclassification effect on DTR. Simulation studies are conducted to demonstrate the satisfactory performance of the proposed method. We illustrate the proposed method using two examples from the National Health and Nutrition Examination Survey Data I Epidemiologic Follow-up Study and the Population Assessment of Tobacco and Health Study.
Title: Q-Learning in Dynamic Treatment Regimes With Misclassified Binary Outcome. Authors: Dan Liu, Wenqing He. Journal: Statistics in Medicine. Publication date: 2024-11-20. DOI: 10.1002/sim.10223
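For readers unfamiliar with Q-learning as a regression-based method, a minimal two-stage sketch follows. It shows only the standard backward-induction recipe with linear working models and a correctly measured outcome; it does not implement the misclassification correction proposed in the article, and a binary outcome would normally call for a different working model. The function names and the design layout (history columns plus action-by-history interactions) are assumptions for this example.

```python
import numpy as np


def fit_stage(history, action, outcome):
    """Least-squares fit of Q(h, a) = h @ beta + a * (h @ psi).

    history: (n, p) array including an intercept column
    action:  (n,) array coded 0/1
    Returns (beta, psi); psi drives the treatment rule.
    """
    design = np.hstack([history, action[:, None] * history])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    p = history.shape[1]
    return coef[:p], coef[p:]


def q_learning_two_stage(h1, a1, h2, a2, y):
    """Backward induction: fit stage 2, build the pseudo-outcome, fit stage 1."""
    beta2, psi2 = fit_stage(h2, a2, y)
    # Predicted outcome if the best stage-2 action were taken
    pseudo = h2 @ beta2 + np.maximum(h2 @ psi2, 0.0)
    _, psi1 = fit_stage(h1, a1, pseudo)
    return psi1, psi2


def recommend(history_row, psi):
    """Treat (1) when the estimated treatment contrast is positive."""
    return int(history_row @ psi > 0)
```

The estimated rule at each stage is simply the sign of the fitted contrast `h @ psi`, which is why errors in the recorded outcome can propagate into the recommended regime.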
Xiaogang Su, Lei Liu, Lili Liu, Ruiwen Zhou, Guoqiao Wang, Elise Dusseldorp, Tianni Zhou
We propose a novel regression tree method named "TreeFuL," an abbreviation for "Tree with Fused Leaves." TreeFuL combines recursive partitioning with fused regularization, offering an alternative to conventional pruning. One of TreeFuL's noteworthy advantages is its capacity for cross-validated amalgamation of non-neighboring terminal nodes. This is made possible by a leaf coloring scheme that supports tree shearing and node amalgamation. As a result, TreeFuL yields more parsimonious tree models without compromising predictive accuracy. The refined model offers enhanced interpretability, making it particularly well-suited for biomedical applications of decision trees, such as disease diagnosis and prognosis. We demonstrate the practical advantages of our proposed method through simulation studies and an analysis of data collected in an obesity study.
{"title":"Regression Trees With Fused Leaves.","authors":"Xiaogang Su, Lei Liu, Lili Liu, Ruiwen Zhou, Guoqiao Wang, Elise Dusseldorp, Tianni Zhou","doi":"10.1002/sim.10272","DOIUrl":"https://doi.org/10.1002/sim.10272","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":""},"PeriodicalIF":1.8,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"142682916","RegionNum":4,"RegionCategory":"Medicine"}
Pub Date: 2024-11-20 Epub Date: 2024-09-09 DOI: 10.1002/sim.10214
Corentin Ségalas, Catherine Helmer, Robin Genuer, Cécile Proust-Lima
Analyzing longitudinal data in health studies is challenging due to sparse and error-prone measurements, strong within-individual correlation, missing data and various trajectory shapes. While mixed-effect models (MM) effectively address these challenges, they remain parametric models and can be computationally costly. In contrast, functional principal component analysis (FPCA) is a non-parametric approach developed for regular and dense functional data that flexibly describes temporal trajectories at a potentially lower computational cost. This article presents an empirical simulation study evaluating the behavior of FPCA with sparse and error-prone repeated measures and its robustness under different missing data schemes in comparison with MM. The results show that FPCA is well-suited when data are missing at random because of dropout, except in scenarios with the most frequent and systematic dropout. Like MM, FPCA fails under a missing-not-at-random mechanism. FPCA was applied to describe the trajectories of four cognitive functions before clinical dementia and contrast them with those of matched controls in a case-control study nested in a population-based aging cohort. The average cognitive declines of future dementia cases showed a sudden divergence from those of their matched controls, with a sharp acceleration 5 to 2.5 years prior to diagnosis.
{"title":"Functional Principal Component Analysis as an Alternative to Mixed-Effect Models for Describing Sparse Repeated Measures in Presence of Missing Data.","authors":"Corentin Ségalas, Catherine Helmer, Robin Genuer, Cécile Proust-Lima","doi":"10.1002/sim.10214","DOIUrl":"10.1002/sim.10214","PeriodicalId":21879,"journal":{"name":"Statistics in Medicine","volume":" ","pages":"4899-4912"},"PeriodicalIF":1.8,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"142154969","RegionNum":4,"RegionCategory":"Medicine"}