A group distributional ICA method for decomposing multi-subject diffusion tensor imaging.
Guangming Yang, Ben Wu, Jian Kang, Ying Guo. Biometrics, 81(3), 2025. https://doi.org/10.1093/biomtc/ujaf117 (open access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12448322/pdf/)

Diffusion tensor imaging (DTI) is a widely used imaging modality for investigating white matter fiber connections in the human brain and provides an important tool for characterizing human brain structural organization. Common goals in DTI analysis include dimension reduction, denoising, and extraction of underlying structural networks. Blind source separation methods are often used to achieve these goals for other imaging modalities, but there has been very limited work for multi-subject DTI data. Due to the special characteristics of the 3D diffusion tensor measured in DTI, existing methods such as standard independent component analysis (ICA) cannot be directly applied. We propose a Group Distributional ICA (G-DICA) method to fill this gap. G-DICA is a fundamentally new blind source separation method that models the parameters in the distribution function of the observed imaging data as a mixture of independent source signals. Decomposing multi-subject DTI data with G-DICA uncovers structural networks corresponding to several major white matter fiber bundles in the brain. Through simulation studies and real data applications, the proposed G-DICA method demonstrates superior performance and improved reproducibility compared with an existing method.
A flexible framework for N-mixture occupancy models: applications to breeding bird surveys.
Huu-Dinh Huynh, J Andrew Royle, Wen-Han Hwang. Biometrics, 81(3), 2025. https://doi.org/10.1093/biomtc/ujaf087

Estimating species abundance under imperfect detection is a key challenge in biodiversity conservation. The N-mixture model, widely recognized for its ability to distinguish between abundance and individual detection probability without marking individuals, is constrained by its stringent closure assumption, which leads to biased estimates when violated in real-world settings. To address this limitation, we propose an extended framework built on the mixed Gamma-Poisson model, incorporating a community parameter that represents the proportion of individuals consistently present throughout the survey period. This flexible framework generalizes both the zero-inflated-type occupancy model and the standard N-mixture model as special cases, corresponding to community parameter values of 0 and 1, respectively. The model's effectiveness is validated through simulations and applications to real-world datasets, specifically 5 species from the North American Breeding Bird Survey and 46 species from the Swiss Breeding Bird Survey, demonstrating its improved accuracy and adaptability in settings where strict closure may not hold.
Correction to "Propensity weighting plus adjustment in proportional hazards model is not doubly robust," by Erin E. Gabriel, Michael C. Sachs, Ingeborg Waernbaum, Els Goetghebeur, Paul F. Blanche, Stijn Vansteelandt, Arvid Sjölander, and Thomas Scheike; Volume 80, Issue 3, September 2024, https://doi.org/10.1093/biomtc/ujae069.
Erin E Gabriel, Michael C Sachs, Ingeborg Waernbaum, Els Goetghebeur, Paul F Blanche, Stijn Vansteelandt, Arvid Sjölander, Thomas Scheike. Biometrics, 81(3), 2025. https://doi.org/10.1093/biomtc/ujaf091
Using model-assisted calibration methods to improve efficiency of regression analyses using two-phase samples or pooled samples under complex survey designs.
Lingxiao Wang. Biometrics, 81(3), 2025. https://doi.org/10.1093/biomtc/ujaf092 (open access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12288669/pdf/)

Two-phase sampling designs are frequently applied in epidemiological studies and large-scale health surveys. In such designs, certain variables are collected only within a second-phase random subsample of the initial first-phase sample, often because of high costs, response burden, or constraints on data collection or assessment. Consequently, second-phase sample estimators can be inefficient due to the diminished sample size. Model-assisted calibration methods have been used to improve the efficiency of second-phase estimators in regression analysis. However, little of the existing literature provides valid finite-population inference for calibration estimators that use appropriate auxiliary variables while simultaneously accounting for the complex sample designs in the first- and second-phase samples. Moreover, no existing work considers the "pooled design," where some covariates are measured only in certain repeated survey cycles. This paper proposes calibrating the sample weights for the second-phase sample to the weighted first-phase sample based on score functions of the regression model that uses predictions of the second-phase variable for the first-phase sample. We establish the consistency of estimation using calibrated weights and provide variance estimation for the regression coefficients under the two-phase design or the pooled design nested within complex survey designs. Empirical evidence highlights the efficiency and robustness of the proposed calibration compared to existing calibration and imputation methods. Data examples from the National Health and Nutrition Examination Survey are provided.
Mastering rare event analysis: subsample-size determination in Cox and logistic regressions.
Tal Agassi, Nir Keret, Malka Gorfine. Biometrics, 81(3), 2025. https://doi.org/10.1093/biomtc/ujaf110

Massive datasets have become central to contemporary data analysis, but they often place considerable demands on computational time and memory. While many existing works offer optimal subsampling methods that minimize the efficiency loss of analyses based on subsamples, they lack tools for judiciously selecting the subsample size. To bridge this gap, we introduce tools for choosing the subsample size. We focus on three settings: the Cox regression model for survival data with rare events, and logistic regression for both balanced and imbalanced datasets. Additionally, we present a new optimal subsampling procedure tailored to logistic regression with imbalanced data. The efficacy of these tools and procedures is demonstrated through an extensive simulation study and detailed analyses of two sizable datasets: survival analysis of UK Biobank colorectal cancer data with about 350 million rows and logistic regression of linked birth and infant death data with about 28 million observations.
Cumulative incidence function estimation using population-based biobank data.
Malka Gorfine, David M Zucker, Shoval Shoham. Biometrics, 81(3), 2025. https://doi.org/10.1093/biomtc/ujaf049

Many countries have established population-based biobanks, which are increasingly used in epidemiological and clinical research. These biobanks offer opportunities for large-scale studies addressing questions beyond the scope of traditional clinical trials or cohort studies. However, using biobank data poses new challenges. Typically, biobank data are collected from a study cohort recruited over a defined calendar period, with subjects entering the study at ages between $c_L$ and $c_U$. This work focuses on biobank data comprising individuals who report a disease-onset age upon recruitment (termed prevalent data) along with individuals recruited as healthy whose disease onset is observed during the follow-up period. We propose a novel cumulative incidence function (CIF) estimator that, in contrast to existing methods, efficiently incorporates prevalent cases, providing two advantages: (1) increased efficiency and (2) CIF estimation at ages below the lower limit, $c_L$.
Statistical significance of clustering for count data.
Yifan Dai, Di Wu, Yufeng Liu. Biometrics, 81(3), 2025. https://doi.org/10.1093/biomtc/ujaf120 (open access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12448855/pdf/)

Clustering is widely used in biomedical research for meaningful subgroup identification. However, most existing clustering algorithms do not account for the statistical uncertainty of the resulting clusters and consequently may generate spurious clusters due to natural sampling variation. To address this problem, the Statistical Significance of Clustering (SigClust) method was developed to evaluate the significance of clusters in high-dimensional data. While SigClust has been successful in assessing clustering significance for continuous data, it is not specifically designed for discrete data, such as count data in genomics. Moreover, SigClust and its variations can suffer from reduced statistical power when applied to non-Gaussian high-dimensional data. To overcome these limitations, we propose SigClust-DEV, a method designed to evaluate the significance of clusters in count data. Through extensive simulations, we compare SigClust-DEV against other existing SigClust approaches across various count distributions and demonstrate its superior performance. Furthermore, we apply our proposed SigClust-DEV to Hydra single-cell RNA sequencing (scRNA-seq) data and electronic health records (EHRs) of cancer patients to identify meaningful latent cell types and patient subgroups, respectively.
Improved prediction and flagging of extreme random effects for non-Gaussian outcomes using weighted methods.
John Neuhaus, Charles McCulloch, Ross Boylan. Biometrics, 81(3), 2025. https://doi.org/10.1093/biomtc/ujaf094 (open access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12309285/pdf/)

Investigators often focus on predicting extreme random effects from mixed effects models fitted to longitudinal or clustered data, and on identifying or "flagging" outliers such as poorly performing hospitals or rapidly deteriorating patients. Our recent work with Gaussian outcomes showed that weighted prediction methods can substantially reduce mean square error of prediction for extremes and substantially increase correct flagging rates compared to previous methods, while controlling the incorrect flagging rates. This paper extends the weighted prediction methods to non-Gaussian outcomes such as binary and count data. Closed-form expressions for predicted random effects and probabilities of correct and incorrect flagging are not available for the usual non-Gaussian outcomes, and the computational challenges are substantial. Therefore, our results include the development of theory to support algorithms that tune predictors that we call "self-calibrated" (which control the incorrect flagging rate using very simple flagging rules) and innovative numerical methods to calculate weighted predictors as well as to evaluate their performance. Comprehensive numerical evaluations show that the novel weighted predictors for non-Gaussian outcomes have substantially lower mean square error of prediction at the extremes and considerably higher correct flagging rates than previously proposed methods, while controlling the incorrect flagging rates. We illustrate our new methods using data on emergency room readmissions for children with asthma.
A monotone single index model for spatially referenced multistate current status data.
Snigdha Das, Minwoo Chae, Debdeep Pati, Dipankar Bandyopadhyay. Biometrics, 81(3), 2025. https://doi.org/10.1093/biomtc/ujaf105 (open access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12391879/pdf/)

Assessment of multistate disease progression is commonplace in biomedical research, such as in periodontal disease (PD). However, the presence of multistate current status endpoints, where only a single snapshot of each subject's progression through disease states is available at a random inspection time after a known starting state, complicates the inferential framework. In addition, these endpoints can be clustered and spatially associated: a group of proximally located teeth (within a subject) may experience similar PD status compared with distally located teeth. Motivated by a clinical study recording PD progression, we propose a Bayesian semiparametric accelerated failure time model with an inverse-Wishart proposal for accommodating (spatial) random effects, and flexible errors that follow a Dirichlet process mixture of Gaussians. For clinical interpretability, the systematic component of the event times is modeled using a monotone single index model, with the (unknown) link function estimated via a novel integrated basis expansion and basis coefficients endowed with constrained Gaussian process priors. In addition to establishing parameter identifiability, we present scalable computing via a combination of elliptical slice sampling, fast circulant embedding techniques, and smoothing of hard constraints, leading to straightforward estimation of parameters, and state occupation and transition probabilities. Using synthetic data, we study the finite sample properties of our Bayesian estimates and their performance under model misspecification. We also illustrate our method via application to the real clinical PD dataset.
Simple simulation based reconstruction of incidence rates from death data.
Simon N Wood. Biometrics, 81(3), 2025. https://doi.org/10.1093/biomtc/ujaf088

Daily deaths from an infectious disease provide a means for retrospectively inferring daily incidence, given knowledge of the infection-to-death interval distribution. Existing methods for doing so rely either on fitting simplified non-linear epidemic models to the deaths data or on spline-based deconvolution approaches. The former runs the risk of introducing unintended artefacts via the model formulation, while the latter may be viewed as technically obscure, impeding uptake by practitioners. This note proposes a simple simulation-based approach to inferring fatal incidence from deaths that requires minimal assumptions, is easy to understand, and allows testing of alternative hypothesized incidence trajectories. The aim is that in any future situation similar to the COVID pandemic, the method can be easily, rapidly, transparently, and uncontroversially deployed as an input to management.