Pub Date: 2025-01-01. Epub Date: 2024-12-14. DOI: 10.1007/s10985-024-09642-x
Walmir Dos Reis Miranda Filho, Fábio Nogueira Demarqui
We propose a new class of bivariate survival models based on the family of Archimedean copulas with margins modeled by the Yang and Prentice (YP) model. The Ali-Mikhail-Haq (AMH), Clayton, Frank, Gumbel-Hougaard (GH), and Joe copulas are employed to accommodate the dependency among marginal distributions. Baseline distributions are modeled semiparametrically by the Piecewise Exponential (PE) distribution and the Bernstein polynomials (BP). Inference procedures for the proposed class of models are based on the maximum likelihood (ML) approach. The new class of models possesses some attractive features: i) the ability to take into account survival data with crossing survival curves; ii) the inclusion of the well-known proportional hazards (PH) and proportional odds (PO) models as particular cases; iii) greater flexibility provided by the semiparametric modeling of the marginal baseline distributions; iv) the availability of closed-form expressions for the likelihood functions, leading to more straightforward inferential procedures. The properties of the proposed class are numerically investigated through an extensive simulation study. Finally, we demonstrate the versatility of our new class of models through the analysis of survival data involving patients diagnosed with ovarian cancer.
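As a rough illustration of the construction described above (not the authors' implementation), a bivariate survival function can be assembled by plugging piecewise exponential marginal survival functions into a Clayton copula; the YP regression structure on the margins is omitted, and the `cuts`/`rates` parameterization is a common but hypothetical choice:

```python
import numpy as np

def clayton_copula(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)

def pe_survival(t, cuts, rates):
    """Survival function of a piecewise exponential distribution.
    `cuts` are interval boundaries (starting at 0), `rates` the constant
    hazard on each interval; S(t) = exp(-cumulative hazard at t)."""
    t = np.asarray(t, dtype=float)
    widths = np.diff(np.append(cuts, np.inf))
    # time spent in each interval, clipped to the interval width
    spent = np.clip(t[..., None] - cuts, 0.0, widths)
    return np.exp(-(spent * rates).sum(axis=-1))

def joint_survival(t1, t2, theta, cuts, rates1, rates2):
    """Bivariate survival S(t1, t2) = C(S1(t1), S2(t2)) under a Clayton copula."""
    return clayton_copula(pe_survival(t1, cuts, rates1),
                          pe_survival(t2, cuts, rates2), theta)
```

Any of the Archimedean generators named in the abstract (AMH, Frank, GH, Joe) could replace the Clayton form in `clayton_copula` without touching the marginal machinery.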
"A class of semiparametric models for bivariate survival data." Lifetime Data Analysis, pp. 102-125.
Pub Date: 2025-01-01. Epub Date: 2024-10-12. DOI: 10.1007/s10985-024-09639-6
Nicholas Hartman
Period-prevalent cohorts are often used for their cost-saving potential in epidemiological studies of survival outcomes. Under this design, prevalent patients allow for evaluations of long-term survival outcomes without the need for long follow-up, whereas incident patients allow for evaluations of short-term survival outcomes without the issue of left-truncation. In most period-prevalent survival analyses from the existing literature, patients have been recruited to achieve an overall sample size, with little attention given to the relative frequencies of prevalent and incident patients and their statistical implications. Furthermore, there are no existing methods available to rigorously quantify the impact of these relative frequencies on estimation and inference and incorporate this information into study design strategies. To address these gaps, we develop an approach to identify the optimal mix of prevalent and incident patients that maximizes precision over the entire estimated survival curve, subject to a flexible weighting scheme. In addition, we prove that inference based on the weighted log-rank test or Cox proportional hazards model is most powerful with an entirely prevalent or incident cohort, and we derive theoretical formulas to determine the optimal choice. Simulations confirm the validity of the proposed optimization criteria and show that substantial efficiency gains can be achieved by recruiting the optimal mix of prevalent and incident patients. The proposed methods are applied to assess waitlist outcomes among kidney transplant candidates.
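The design trade-off above can be caricatured with a toy grid search: given assumed per-subject variance contributions from incident and prevalent recruitment at a few time points, plus a weighting scheme, pick the incident fraction that minimizes the weighted total variance. The pooled form `var_i/p + var_p/(1-p)` is a simplifying assumption for illustration, not the paper's optimization criterion:

```python
import numpy as np

def optimal_incident_fraction(var_incident, var_prevalent, weights, grid=None):
    """Grid-search the fraction p of incident patients minimizing a weighted
    sum of pointwise variances over the survival curve.  `var_incident[k]`
    and `var_prevalent[k]` are assumed per-subject variance contributions at
    time point k; pooled variance at mix p is taken as the usual
    independent-strata form var_i/p + var_p/(1-p)."""
    if grid is None:
        grid = np.linspace(0.01, 0.99, 99)
    var_incident = np.asarray(var_incident)
    var_prevalent = np.asarray(var_prevalent)
    weights = np.asarray(weights)
    totals = [np.sum(weights * (var_incident / p + var_prevalent / (1 - p)))
              for p in grid]
    return grid[int(np.argmin(totals))]
```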
"Optimal survival analyses with prevalent and incident patients." Lifetime Data Analysis, pp. 24-51.
Pub Date: 2024-10-01. Epub Date: 2024-08-24. DOI: 10.1007/s10985-024-09632-z
Huazhen Yu, Rui Zhang, Lixin Zhang
This paper discusses regression analysis of current status data with dependent censoring, a problem that often occurs in many areas such as cross-sectional studies, epidemiological investigations and tumorigenicity experiments. Copula model-based methods are commonly employed to tackle this issue. However, these methods often face challenges in terms of model and parameter identification. The primary aim of this paper is to propose a copula-based analysis for dependent current status data, where the association parameter is left unspecified. Our method is based on a general class of semiparametric linear transformation models and parametric copulas. We demonstrate that the proposed semiparametric model is identifiable under certain regularity conditions from the distribution of the observed data. For inference, we develop a sieve maximum likelihood estimation method, using Bernstein polynomials to approximate the nonparametric functions involved. The asymptotic consistency and normality of the proposed estimators are established. Finally, to demonstrate the effectiveness and practical applicability of our method, we conduct an extensive simulation study and apply the proposed method to a real data example.
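Bernstein polynomial sieves of the kind mentioned above approximate a function on [0, 1] by weighting its values at grid points with binomial kernels; a minimal sketch of the basis (not the paper's sieve estimator):

```python
from math import comb

def bernstein_approx(f, m, x):
    """Degree-m Bernstein polynomial approximation of f on [0, 1]:
    B_m(f; x) = sum_k f(k/m) * C(m, k) * x^k * (1 - x)^(m - k).
    If f is monotone, so is B_m(f; .), which is why sieve estimators use
    this basis for monotone nuisance functions."""
    return sum(f(k / m) * comb(m, k) * x**k * (1 - x)**(m - k)
               for k in range(m + 1))
```

In sieve maximum likelihood, the values f(k/m) become free coefficients (ordered, to enforce monotonicity) and m grows slowly with the sample size.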
"Copula-based analysis of dependent current status data with semiparametric linear transformation model." Lifetime Data Analysis, pp. 742-775.
Pub Date: 2024-10-01. Epub Date: 2024-09-05. DOI: 10.1007/s10985-024-09633-y
Yei Eun Shin, Takumi Saegusa
The nested case-control (NCC) design is a cost-effective outcome-dependent design in epidemiology that collects all cases and a fixed number of controls at the time of each case's diagnosis from a large cohort. To address its inefficiency relative to full cohort studies, previous research developed various estimation methodologies, but changes to the design's formulation of risk sets were considered only in view of potential bias in the partial likelihood estimation. In this paper, we study a modified design that excludes previously selected controls from risk sets, in view of efficiency improvement as well as bias. To this end, we extend the inverse probability weighting method of Samuelsen, which was shown to outperform the partial likelihood estimator in the standard setting. We develop its asymptotic theory and a variance estimation of both regression coefficients and the cumulative baseline hazard function that takes account of the complex features of the modified sampling design. In addition to good finite-sample performance of the variance estimation, simulation studies show that the modified design with the proposed estimator is more efficient than the standard design. Examples are provided using data from the NIH-AARP Diet and Health Study.
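Samuelsen's weighting scheme for the standard (with-replacement) design can be sketched as follows; the modified without-replacement inclusion probabilities derived in the paper are more involved and are not reproduced here:

```python
import numpy as np

def samuelsen_weights(case_times, at_risk_sizes, censor_times, m):
    """Samuelsen-type IPW weights for a standard nested case-control design.

    For a non-case still at risk at each case time t_j, the chance of being
    sampled as one of the m controls is m / (n_j - 1), where n_j is the risk
    set size (including the case, which is excluded from sampling).  The
    inclusion probability is one minus the product of non-selection
    probabilities over case times before the subject's own exit time; the
    IPW weight is its reciprocal (infinite, i.e. unusable, for subjects who
    could never have been sampled)."""
    weights = []
    for t in censor_times:
        p_not = 1.0
        for t_case, n in zip(case_times, at_risk_sizes):
            if t_case <= t:  # subject was in the risk set at this case time
                p_not *= 1.0 - m / (n - 1.0)
        p_incl = 1.0 - p_not
        weights.append(1.0 / p_incl if p_incl > 0 else np.inf)
    return np.array(weights)
```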
"Nested case-control sampling without replacement." Lifetime Data Analysis, pp. 776-799. Open access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11502564/pdf/
Pub Date: 2024-10-01. Epub Date: 2024-05-28. DOI: 10.1007/s10985-024-09630-1
Dayu Sun, Yuanyuan Guo, Yang Li, Jianguo Sun, Wanzhu Tu
Panel count regression is often required in recurrent event studies, where the interest is to model the event rate. Existing rate models are unable to handle time-varying covariate effects due to theoretical and computational difficulties. Mean models provide a viable alternative but are subject to the constraints of the monotonicity assumption, which tends to be violated when covariates fluctuate over time. In this paper, we present a new semiparametric rate model for panel count data along with related theoretical results. For model fitting, we present an efficient EM algorithm with three different methods for variance estimation. The algorithm allows us to sidestep the challenges of numerical integration and difficulties with the iterative convex minorant algorithm. We show that the estimators are consistent and asymptotically normally distributed. Simulation studies confirm excellent finite-sample performance. To illustrate, we analyze data from a real clinical study of behavioral risk factors for sexually transmitted infections.
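The iterative convex minorant algorithm that the authors sidestep is rooted in isotonic regression; the pool-adjacent-violators step at its core looks like this (shown for intuition about why monotone mean-function fitting is hard, not as the paper's estimator):

```python
def pava(y, w=None):
    """Pool-adjacent-violators algorithm for (weighted) isotonic regression:
    the least-squares fit to y that is non-decreasing.  Adjacent values that
    violate monotonicity are pooled into weighted averages until none remain."""
    w = [1.0] * len(y) if w is None else list(w)
    blocks = []  # each block: [pooled value, total weight, run length]
    for v, wt in zip(y, w):
        blocks.append([float(v), float(wt), 1])
        # pool backwards while monotonicity is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, c2 = blocks.pop()
            v1, w1, c1 = blocks.pop()
            tot = w1 + w2
            blocks.append([(v1 * w1 + v2 * w2) / tot, tot, c1 + c2])
    fitted = []
    for v, _, c in blocks:
        fitted.extend([v] * c)
    return fitted
```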
"A flexible time-varying coefficient rate model for panel count data." Lifetime Data Analysis, pp. 721-741.
Pub Date: 2024-10-01. Epub Date: 2024-10-16. DOI: 10.1007/s10985-024-09636-9
Xingqiu Zhao
"Call for papers for a special issue on survival analysis in artificial intelligence." Lifetime Data Analysis, pp. 853-854.
Pub Date: 2024-10-01. Epub Date: 2024-10-04. DOI: 10.1007/s10985-024-09635-w
Esra Kürüm, Danh V Nguyen, Qi Qian, Sudipto Banerjee, Connie M Rhee, Damla Şentürk
Individuals with end-stage kidney disease (ESKD) on dialysis experience high mortality and excessive burden of hospitalizations over time relative to comparable Medicare patient cohorts without kidney failure. A key interest in this population is to understand the time-dynamic effects of multilevel risk factors that contribute to the correlated outcomes of longitudinal hospitalization and mortality. For this we utilize multilevel data from the United States Renal Data System (USRDS), a national database that includes nearly all patients with ESKD, where repeated measurements/hospitalizations over time are nested in patients and patients are nested within (health service) regions across the contiguous U.S. We develop a novel spatiotemporal multilevel joint model (STM-JM) that accounts for the aforementioned hierarchical structure of the data while considering the spatiotemporal variations in both outcomes across regions. The proposed STM-JM includes time-varying effects of multilevel (patient- and region-level) risk factors on hospitalization trajectories and mortality and incorporates spatial correlations across the spatial regions via a multivariate conditional autoregressive correlation structure. Efficient estimation and inference are performed via a Bayesian framework, where multilevel varying coefficient functions are targeted via thin-plate splines. The finite sample performance of the proposed method is assessed through simulation studies. An application of the proposed method to the USRDS data highlights significant time-varying effects of patient- and region-level risk factors on hospitalization and mortality and identifies specific time periods on dialysis and spatial locations across the U.S. with elevated hospitalization and mortality risks.
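The multivariate conditional autoregressive structure mentioned above rests on a precision matrix of the proper-CAR form; a univariate sketch (not the STM-JM's multivariate version):

```python
import numpy as np

def car_precision(W, rho, tau):
    """Precision matrix of a proper conditional autoregressive (CAR) model:
    Q = tau * (D - rho * W), where W is a symmetric 0/1 adjacency matrix
    over regions and D is diagonal with the neighbor counts.  For |rho| < 1
    the matrix is strictly diagonally dominant, hence positive definite,
    which keeps the implied spatial prior well defined."""
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))
    return tau * (D - rho * W)
```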
"Spatiotemporal multilevel joint modeling of longitudinal and survival outcomes in end-stage kidney disease." Lifetime Data Analysis, pp. 827-852. Open access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11502599/pdf/
Pub Date: 2024-09-13. DOI: 10.1007/s10985-024-09634-x
Suryo Adi Rakhmawan, Tahir Mahmood, Nasir Abbas, Muhammad Riaz
Forecasting mortality rates is crucial for evaluating life insurance company solvency, especially amid disruptions caused by phenomena like COVID-19. The Lee–Carter model is commonly employed in mortality modelling; however, extensions that can encompass count data with diverse distributions, such as the Generalized Autoregressive Score (GAS) model utilizing the COM–Poisson distribution, exhibit potential for enhancing time-to-event forecasting accuracy. Using mortality data from 29 countries, this research evaluates various distributions and determines that the COM–Poisson model surpasses the Poisson, binomial, and negative binomial distributions in forecasting mortality rates. The one-step forecasting capability of the GAS model offers distinct advantages, while the COM–Poisson distribution demonstrates enhanced flexibility and versatility by accommodating various distributions, including Poisson and negative binomial. Ultimately, the study determines that the COM–Poisson GAS model is an effective instrument for examining time series data on mortality rates, particularly when facing time-varying parameters and non-conventional data distributions.
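The COM-Poisson distribution underpinning the GAS model has pmf P(Y=y) proportional to λ^y / (y!)^ν, with a normalizing constant that has no closed form and is evaluated by truncation; a log-scale sketch (the truncation point `ymax` is an arbitrary choice here, not from the paper):

```python
import math

def com_poisson_pmf(y, lam, nu, ymax=100):
    """COM-Poisson pmf P(Y = y) = lam^y / (y!)^nu / Z(lam, nu), with the
    normalizing constant Z truncated at ymax terms and the sum done on the
    log scale for stability.  nu = 1 recovers the Poisson distribution;
    nu < 1 gives overdispersion, nu > 1 underdispersion."""
    log_terms = [k * math.log(lam) - nu * math.lgamma(k + 1)
                 for k in range(ymax + 1)]
    m = max(log_terms)
    log_z = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - log_z)
```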
"Unifying mortality forecasting model: an investigation of the COM–Poisson distribution in the GAS model for improved projections." Lifetime Data Analysis, vol. 60.
Pub Date: 2024-07-01. Epub Date: 2024-06-24. DOI: 10.1007/s10985-024-09631-0
Mei-Ling Ting Lee
"Special issue dedicated to Mitchell H. Gail, M.D. Ph.D." Lifetime Data Analysis, pp. 529-530.
Pub Date: 2024-07-01. Epub Date: 2024-05-08. DOI: 10.1007/s10985-024-09628-9
Yaqi Cao, Weidong Ma, Ge Zhao, Anne Marie McCarthy, Jinbo Chen
The added value of candidate predictors for risk modeling is routinely evaluated by comparing the performance of models with and without the candidate predictors. Such a comparison is most meaningful when the risks estimated by the two models are both unbiased in the target population. Very often, data for candidate predictors are sourced from nonrepresentative convenience samples. Updating the base model using the study data without acknowledging the discrepancy between the underlying distribution of the study data and that of the target population can lead to biased risk estimates and therefore an unfair evaluation of candidate predictors. To address this issue, assuming access to a well-calibrated base model, we propose a semiparametric method for model fitting that enforces good calibration. The central idea is to calibrate the fitted model against the base model by enforcing suitable constraints in maximizing the likelihood function. This approach enables unbiased assessment of the model improvement offered by candidate predictors without requiring a representative sample from the target population, thus overcoming a significant practical challenge. We study theoretical properties of the model parameter estimates, and demonstrate improvement in model calibration via extensive simulation studies. Finally, we apply the proposed method to data extracted from Penn Medicine Biobank to inform the added value of breast density for breast cancer risk assessment among Caucasian women.
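The constrained-ML idea can be caricatured for logistic regression with a single calibration-in-the-large constraint, using `scipy.optimize` (an assumed dependency); the paper's constraint set is richer than this one moment condition:

```python
import numpy as np
from scipy.optimize import minimize

def constrained_logistic_fit(X, y, base_risk):
    """Maximum likelihood logistic regression subject to one calibration
    constraint: the mean fitted risk must equal the mean risk from the
    (assumed well-calibrated) base model.  A simplified sketch of the
    constrained-ML idea only."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column

    def neg_loglik(beta):
        eta = X1 @ beta
        # -loglik = sum[ log(1 + exp(eta)) - y * eta ], computed stably
        return np.sum(np.logaddexp(0.0, eta) - y * eta)

    def calibration_gap(beta):
        fitted = 1.0 / (1.0 + np.exp(-(X1 @ beta)))
        return fitted.mean() - np.mean(base_risk)

    res = minimize(neg_loglik, np.zeros(X1.shape[1]), method="SLSQP",
                   constraints=[{"type": "eq", "fun": calibration_gap}])
    return res.x
```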
"A constrained maximum likelihood approach to developing well-calibrated models for predicting binary outcomes." Lifetime Data Analysis, pp. 624-648. Open access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11634939/pdf/