A review and comparison of methods of testing for heteroskedasticity in the linear regression model
Thomas Farrar, Renette Blignaut, Retha Luus, Sarel Steel
Journal of Applied Statistics 52(16): 3121-3150. Pub Date: 2025-10-24. eCollection Date: 2025-01-01. DOI: 10.1080/02664763.2025.2575038
This study reviews inferential methods for diagnosing heteroskedasticity in the linear regression model, classifying the methods into four types: deflator tests, auxiliary design tests, omnibus tests, and portmanteau tests. A Monte Carlo simulation experiment is used to compare the performance of the deflator tests, and of the auxiliary design and omnibus tests, using the metric of average excess power over size. Certain lesser-known tests (not included in some standard statistical software) are found to outperform better-known tests. For instance, the best-performing deflator test was the Evans-King test, and the best-performing auxiliary design and omnibus tests were Verbyla's test and the Cook-Weisberg test, rather than standard methods such as White's test and the Breusch-Pagan-Koenker test.
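For reference, a minimal sketch of running the better-known baseline tests named above with statsmodels on synthetic heteroskedastic data (het_breuschpagan and het_white are existing statsmodels functions; the better-performing Evans-King, Verbyla and Cook-Weisberg tests are, as the abstract notes, absent from much standard software):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(1, 10, n)
X = sm.add_constant(x)
y = 2 + 0.5 * x + rng.normal(scale=0.3 * x)  # error variance grows with x

resid = sm.OLS(y, X).fit().resid
bp_lm, bp_pval, _, _ = het_breuschpagan(resid, X)  # studentized (Koenker) LM
w_lm, w_pval, _, _ = het_white(resid, X)
print(f"Breusch-Pagan LM p-value: {bp_pval:.4f}")
print(f"White LM p-value:         {w_pval:.4f}")

Both tests should reject the null of homoskedasticity here, since the simulated error standard deviation is proportional to the regressor.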
{"title":"A review and comparison of methods of testing for heteroskedasticity in the linear regression model.","authors":"Thomas Farrar, Renette Blignaut, Retha Luus, Sarel Steel","doi":"10.1080/02664763.2025.2575038","DOIUrl":"https://doi.org/10.1080/02664763.2025.2575038","url":null,"abstract":"<p><p>This study reviews inferential methods for diagnosing heteroskedasticity in the linear regression model, classifying the methods into four types: deflator tests, auxiliary design tests, omnibus tests, and portmanteau tests. A Monte Carlo simulation experiment is used to compare the performance of deflator tests and the performance of auxiliary design and omnibus tests, using the metric of average excess power over size. Certain lesser-known tests (that are not included with some standard statistical software) are found to outperform better-known tests. For instance, the best-performing deflator test was the Evans-King test, and the best-performing auxiliary design and omnibus tests were Verbyla's test and the Cook-Weisberg test, and not standard methods such as White's test and the Breusch-Pagan-Koenker test.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"52 16","pages":"3121-3150"},"PeriodicalIF":1.1,"publicationDate":"2025-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12683758/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145714436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Forecasting hourly foodservice sales during geopolitical and economical disruption using zero-inflated mixed effects models
Nathan A Judd, Kalliopi Mylona, Haiming Liu, Andy Hogg, Tim Butler
Journal of Applied Statistics 53(2): 372-390. Pub Date: 2025-07-07. eCollection Date: 2026-01-01. DOI: 10.1080/02664763.2025.2519136
Accurate predictions of product sales are essential in the foodservice sector for planning and conserving resources. In this paper, a zero-inflated negative binomial mixed-effects model with several factors was used to predict the total sales of different product categories, taking into consideration different sites, times and weather conditions. The model fits quickly by maximising an ordinary Monte Carlo approximation of the likelihood. It produced accurate predictions from limited data, with the random effects absorbing the exogenous factors that added noise to the dataset. This improved inference from the model by reducing the variance of the fixed-effect estimates used in interpreting the results. The study shows how statistical modelling can improve predictions in the foodservice industry with less data during times of volatile demand.
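A minimal sketch of the zero-inflated negative binomial idea, using statsmodels' fixed-effects ZeroInflatedNegativeBinomialP as a simplified stand-in; the paper's random effects, Monte Carlo likelihood maximisation and actual covariates are not reproduced, and the covariates below are hypothetical:

import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedNegativeBinomialP

rng = np.random.default_rng(7)
n = 500
temp = rng.normal(15, 5, n)    # hypothetical weather covariate
hour = rng.integers(8, 20, n)  # hypothetical trading hour
X = sm.add_constant(np.column_stack([temp, hour]))

# Synthetic counts with excess zeros (e.g. site closed, stock-outs)
mu = np.exp(0.2 + 0.05 * temp + 0.08 * hour)
sales = rng.poisson(mu)
sales[rng.random(n) < 0.3] = 0

zinb = ZeroInflatedNegativeBinomialP(sales, X, exog_infl=np.ones((n, 1)), p=2)
res = zinb.fit(method="bfgs", maxiter=500, disp=False)
print(res.summary())

The inflation submodel (here a constant-only logit) captures the structural zeros separately from the negative binomial count process, which is what lets the count coefficients stay interpretable in the presence of many zero-sales hours.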
{"title":"Forecasting hourly foodservice sales during geopolitical and economical disruption using zero-inflated mixed effects models.","authors":"Nathan A Judd, Kalliopi Mylona, Haiming Liu, Andy Hogg, Tim Butler","doi":"10.1080/02664763.2025.2519136","DOIUrl":"10.1080/02664763.2025.2519136","url":null,"abstract":"<p><p>Accurate predictions of product sales are essential to the foodservice sector, for planning and saving of resources. In this paper, a zero-inflated negative binomial mixed-effects model with several factors was used to predict the total sales of different product categories, taking into consideration different sites, time and weather conditions. It fits quickly by maximising the ordinary Monte Carlo likelihood approximation. The model succeeded in accurate predictions with limited data where the random effects fitted well to the exogenous factors that added noise to the dataset. This enabled an improved inference from the model by reducing the variance in the estimates of fixed effects used in the interpretation of the results. This shows how statistical modelling, using less data, can improve predictions in the foodservice industry during times of volatile demand.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 2","pages":"372-390"},"PeriodicalIF":1.1,"publicationDate":"2025-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12872089/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146125112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robust Bayesian model averaging for linear regression models with heavy-tailed errors
Shamriddha De, Joyee Ghosh
Journal of Applied Statistics 53(2): 304-330. Pub Date: 2025-06-11. eCollection Date: 2026-01-01. DOI: 10.1080/02664763.2025.2511938
Our goal is to develop a Bayesian model averaging technique for linear regression models that accommodates heavier-tailed error densities than the normal distribution. Motivated by the use of the Huber loss function in the presence of outliers, the Bayesian Huberized lasso with hyperbolic errors has been proposed and recently implemented in the literature. Since the Huberized lasso cannot force regression coefficients to be exactly zero, we propose a Bayesian variable selection approach with spike and slab priors to address sparsity more effectively. The shapes of the hyperbolic and the Student-t density functions differ, and the tails of a hyperbolic distribution are lighter than those of a Cauchy distribution. We therefore propose a flexible regression model with an error distribution encompassing both the hyperbolic and the Student-t family of distributions, with an unknown tail-heaviness parameter that is estimated from the data. It is known that the limiting form of both the hyperbolic and the Student-t distributions is a normal distribution. We develop an efficient Gibbs sampler for posterior computation. Through simulation studies and analyses of real datasets, we show that our method is competitive with various state-of-the-art methods.
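For readers unfamiliar with the prior, the generic spike-and-slab form (a textbook formulation, not necessarily the paper's exact specification) places point mass at zero alongside a continuous slab:

\[
\beta_j \mid \gamma_j \;\sim\; \gamma_j\,\mathcal{N}(0,\tau^2) + (1-\gamma_j)\,\delta_0,
\qquad \gamma_j \sim \mathrm{Bernoulli}(\pi), \quad j = 1,\dots,p.
\]

Coefficients with \(\gamma_j = 0\) are exactly zero, which is the sparsity that the Huberized lasso posterior alone cannot deliver.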
{"title":"Robust Bayesian model averaging for linear regression models with heavy-tailed errors.","authors":"Shamriddha De, Joyee Ghosh","doi":"10.1080/02664763.2025.2511938","DOIUrl":"https://doi.org/10.1080/02664763.2025.2511938","url":null,"abstract":"<p><p>Our goal is to develop a Bayesian model averaging technique in linear regression models that accommodates heavier tailed error densities than the normal distribution. Motivated by the use of the Huber loss function in the presence of outliers, the Bayesian Huberized lasso with hyperbolic errors has been proposed and recently implemented in the literature. Since the Huberized lasso cannot enforce regression coefficients to be exactly zero, we propose a Bayesian variable selection approach with spike and slab priors to address sparsity more effectively. The shapes of the hyperbolic and the Student-<i>t</i> density functions differ. Furthermore, the tails of a hyperbolic distribution are less heavy compared to those of a Cauchy distribution. Thus, we propose a flexible regression model with an error distribution encompassing both the hyperbolic and the Student-<i>t</i> family of distributions, with an unknown tail heaviness parameter, that is estimated based on the data. It is known that the limiting form of both the hyperbolic and the Student-<i>t</i> distributions is a normal distribution. We develop an efficient Gibbs sampler for posterior computation. Through simulation studies and analyzes of real datasets, we show that our method is competitive with various state-of-the-art methods.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 2","pages":"304-330"},"PeriodicalIF":1.1,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12872095/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146125196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semiparametric analysis of competing risks data with missing causes of failure and covariate measurement error
Akurathi Jayanagasri, S Anjana
Journal of Applied Statistics 53(2): 331-355. Pub Date: 2025-06-10. eCollection Date: 2026-01-01. DOI: 10.1080/02664763.2025.2512965
Competing risks data with missing causes of failure are common in biomedical studies. Such data often arise with covariates that are measured with error. In this work, we consider a semiparametric linear transformation model to deal with the combined problem of competing risks data with missing causes of failure and covariate measurement error. We consider a set of estimating equations to obtain the estimators of the parameters involved in this linear transformation model. To handle the missing causes of failure, we employ the Inverse Probability Weighting (IPW) approach, and a flexible Simulation Extrapolation (SIMEX) method is adopted as the covariate measurement error correction technique. We study the asymptotic properties of the estimators and assess their finite sample properties by a Monte Carlo simulation study. The proposed method is illustrated using real data.
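A minimal sketch of the SIMEX idea in isolation, applied to an ordinary linear regression slope rather than the paper's semiparametric estimating equations; the known error variance, the lambda grid and the quadratic extrapolant are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(3)
n, sigma_u = 400, 0.5                   # sigma_u: measurement-error sd, assumed known
x_true = rng.normal(0, 1, n)
y = 1.0 + 2.0 * x_true + rng.normal(0, 0.5, n)
w = x_true + rng.normal(0, sigma_u, n)  # observed, error-prone covariate

def naive_slope(x, y):
    return np.polyfit(x, y, 1)[0]

# SIMULATION step: deliberately inflate the error variance by a factor (1 + lam)
lambdas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
B = 200
slopes = [np.mean([naive_slope(w + np.sqrt(lam) * sigma_u * rng.normal(0, 1, n), y)
                   for _ in range(B)])
          for lam in lambdas]

# EXTRAPOLATION step: fit a quadratic in lambda, evaluate at lambda = -1 (no error)
coef = np.polyfit(lambdas, slopes, 2)
print("naive slope:", naive_slope(w, y))
print("SIMEX slope:", np.polyval(coef, -1.0))

The naive slope is attenuated toward zero by the measurement error; extrapolating the trend of increasingly contaminated fits back to lambda = -1 approximately recovers the true slope of 2.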
{"title":"Semiparametric analysis of competing risks data with missing causes of failure and covariate measurement error.","authors":"Akurathi Jayanagasri, S Anjana","doi":"10.1080/02664763.2025.2512965","DOIUrl":"https://doi.org/10.1080/02664763.2025.2512965","url":null,"abstract":"<p><p>Competing risks data with missing causes of failure are common in biomedical studies. Often, competing risks data may arise with the covariates that are measured with error. In this work, we consider a semiparametric linear transformation model to deal with the combined problem of competing risks data with missing causes of failure and the covariate measurement error. We consider a set of estimating equations to obtain the estimators of the parameters involved in this linear transformation model. To handle the missing causes of failure, we employ the Inverse Probability Weight (IPW) approach, and a flexible Simulation Extrapolation (SIMEX) method is adopted as the covariate measurement error correction technique. We study the asymptotic properties of the estimators and assess the finite sample properties of the estimators by a Monte Carlo simulation study. The proposed method is illustrated using real data.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 2","pages":"331-355"},"PeriodicalIF":1.1,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12872102/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146125173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multiple outlier detection in samples with exponential & Pareto tails
Didier Sornette, Ran Wei
Journal of Applied Statistics 53(2): 224-256. Pub Date: 2025-06-07. eCollection Date: 2026-01-01. DOI: 10.1080/02664763.2025.2511934
We introduce two ratio-based robust test statistics, max-robust-sum (MRS) and sum-robust-sum (SRS), which compare the largest suspected outlier(s) to a trimmed partial sum of the sample. They are designed to enhance the robustness of outlier detection in samples with exponential or Pareto tails. These statistics are invariant to scale parameters and offer improved overall resistance to masking and swamping by recalibrating the denominator to reduce the influence of the largest observations. In particular, the proposed tests are shown to substantially reduce the masking problem in inward sequential testing, thereby re-establishing the inward sequential testing method - formerly relegated since the introduction of outward testing - as a competitive alternative to outward testing, without requiring multiple testing correction. The analytical null distributions of the statistics are derived, and a comprehensive comparison of the test statistics is conducted through simulation, evaluating the performance of the proposed tests in both block and sequential settings, and contrasting their performance with classical statistics across various data scenarios. In five case studies - financial crashes, nuclear power generation accidents, stock market returns, epidemic fatalities, and city sizes - significant outliers are detected and related to the concept of 'Dragon King' events, defined as meaningful outliers that arise from a unique generating mechanism.
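A hedged sketch of one plausible reading of the MRS construction as described above (largest observation compared to a trimmed partial sum); the paper's exact definitions of MRS and SRS, the trimming rule, and the analytical null distributions are not reproduced here, and the trim count r is a hypothetical tuning choice:

import numpy as np

def mrs_like_statistic(x, r=3):
    # Largest observation over the sum of the sample with the top r
    # observations removed from the denominator; scale-invariant, and the
    # trimming keeps further large outliers from inflating the denominator
    # (the masking mechanism the abstract describes).
    s = np.sort(np.asarray(x))
    return s[-1] / s[:-r].sum()

rng = np.random.default_rng(11)
clean = rng.exponential(1.0, 100)
contaminated = np.append(clean, [40.0, 55.0])  # two gross outliers
print(mrs_like_statistic(clean))         # small ratio
print(mrs_like_statistic(contaminated))  # much larger ratio despite masking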
{"title":"Multiple outlier detection in samples with exponential & Pareto tails.","authors":"Didier Sornette, Ran Wei","doi":"10.1080/02664763.2025.2511934","DOIUrl":"https://doi.org/10.1080/02664763.2025.2511934","url":null,"abstract":"<p><p>We introduce two ratio-based robust test statistics, <i>max-robust-sum</i> (MRS) and <i>sum-robust-sum</i> (SRS), which compare the largest suspected outlier(s) to a trimmed partial sum of the sample. They are designed to enhance the robustness of outlier detection in samples with exponential or Pareto tails. These statistics are invariant to scale parameters and offer improved overall resistance to masking and swamping by recalibrating the denominator to reduce the influence of the largest observations. In particular, the proposed tests are shown to substantially reduce the masking problem in inward sequential testing, thereby re-establishing the inward sequential testing method - formerly relegated since the introduction of outward testing - as a competitive alternative to outward testing, without requiring multiple testing correction. The analytical null distributions of the statistics are derived, and a comprehensive comparison of the test statistics is conducted through simulation, evaluating the performance of the proposed tests in both block and sequential settings, and contrasting their performance with classical statistics across various data scenarios. In five case studies - financial crashes, nuclear power generation accidents, stock market returns, epidemic fatalities, and city sizes - significant outliers are detected and related to the concept of 'Dragon King' events, defined as meaningful outliers that arise from a unique generating mechanism.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 2","pages":"224-256"},"PeriodicalIF":1.1,"publicationDate":"2025-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12872094/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146125097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian federated inference for survival models
Hassan Pazira, Emanuele Massa, Jetty A M Weijers, Anthony C C Coolen, Marianne A Jonker
Journal of Applied Statistics 53(2): 203-223. Pub Date: 2025-06-04. eCollection Date: 2026-01-01. DOI: 10.1080/02664763.2025.2511932
To accurately estimate the parameters in a prediction model for survival data, sufficient events need to be observed relative to the number of model parameters. In practice, this is often a problem. Merging data sets from different medical centers may help, but this is not always possible due to strict privacy legislation and logistic difficulties. Recently, the Bayesian Federated Inference (BFI) strategy for generalized linear models was proposed. With this strategy, the statistical analyses are performed in the local centers where the data were collected (or stored), and only the inference results are combined into a single estimated model; merging data is not necessary. The BFI methodology aims to compute, from the separate inference results in the local centers, what would have been obtained if the analysis had been based on the merged data sets. In the present paper, we generalize the BFI methodology, as initially developed for generalized linear models, to survival models. Simulation studies and real data analyses show excellent performance; that is, the results obtained with the BFI methodology are very similar to the results obtained by analyzing the merged data. An R package for performing the analyses is available.
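A minimal sketch of the general flavor of such federated combination: pooling local Gaussian posterior approximations (MAP estimate plus curvature) by precision weighting, with no individual-level data leaving the centers. The BFI papers give the exact combination rules, including corrections for the prior being used in every center, which this simplified sketch omits:

import numpy as np

def combine_local_posteriors(estimates, precisions):
    # estimates: list of local MAP parameter vectors
    # precisions: list of local posterior precision (inverse-covariance) matrices
    A = sum(precisions)                            # combined precision
    b = sum(P @ th for P, th in zip(precisions, estimates))
    return np.linalg.solve(A, b), A                # combined MAP and precision

# Two hypothetical centers that each fitted the same 2-parameter model locally
th1, P1 = np.array([0.8, -0.3]), np.array([[50.0, 5.0], [5.0, 40.0]])
th2, P2 = np.array([1.1, -0.1]), np.array([[30.0, 2.0], [2.0, 25.0]])
theta, A = combine_local_posteriors([th1, th2], [P1, P2])
print("combined estimate:", theta)

Only the summary pairs (estimate, precision) travel between centers, which is what makes the approach compatible with strict privacy legislation.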
{"title":"Bayesian federated inference for survival models.","authors":"Hassan Pazira, Emanuele Massa, Jetty A M Weijers, Anthony C C Coolen, Marianne A Jonker","doi":"10.1080/02664763.2025.2511932","DOIUrl":"10.1080/02664763.2025.2511932","url":null,"abstract":"<p><p>To accurately estimate the parameters in a prediction model for survival data, sufficient events need to be observed compared to the number of model parameters. In practice, this is often a problem. Merging data sets from different medical centers may help, but this is not always possible due to strict privacy legislation and logistic difficulties. Recently, the Bayesian Federated Inference (BFI) strategy for generalized linear models was proposed. With this strategy, the statistical analyzes are performed in the local centers where the data were collected (or stored), and only the inference results are combined to a single estimated model; merging data is not necessary. The BFI methodology aims to compute from the separate inference results in the local centers what would have been obtained if the analysis had been based on the merged data sets. In the present paper, we generalize the BFI methodology as initially developed for generalized linear models to survival models. Simulation studies and real data analyzes show excellent performance; that is, the results obtained with the BFI methodology are very similar to the results obtained by analyzing the merged data. An R package for doing the analyzes is available.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 2","pages":"203-223"},"PeriodicalIF":1.1,"publicationDate":"2025-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12872092/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146125136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A joint latent-class Bayesian model with application to ALL maintenance studies
Damitri Kundu, Sevantee Basu, Manash Pratim Gogoi, Kiranmoy Das
Journal of Applied Statistics 53(2): 257-273. Pub Date: 2025-06-03. eCollection Date: 2026-01-01. DOI: 10.1080/02664763.2025.2511935
Acute Lymphocytic Leukemia (ALL) is globally the main cause of death from blood cancer among children. While the survival rate of ALL has increased significantly in first-world countries (e.g. the United States) over the last 50 years, the same is not the case in developing countries. In this article, we develop a joint latent-class Bayesian model for analysing a dataset from a clinical trial conducted by the Tata Translational Cancer Research Center (TTCRC), Kolkata. The trial considers a group of children who were diagnosed with ALL and treated with two standard drugs (6MP and MTx) over a period of time. Three longitudinal biomarkers (lymphocyte count, neutrophil count and platelet count) were collected whenever the patients visited the clinic (weekly or bi-weekly). We consider a latent-class model for the lymphocyte count, the main biomarker associated with ALL, while the other two biomarkers, the neutrophil count and the platelet count, are modeled using linear mixed models. The time-to-event is modeled by a semi-parametric proportional hazards model, and is linked to the longitudinal submodels by sharing the Gaussian random effects. The proposed model detects two latent classes for the lymphocyte count, and we estimate the class-specific (average) non-relapse probability at different time points of the study period. We notice a significant difference in the estimated non-relapse probabilities between the two latent classes. Through a simulation study we illustrate the accuracy and practical usefulness of the proposed joint latent-class model over traditional models.
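For context, the shared-random-effects linkage used in joint models of this kind can be written in its generic form (a standard formulation; the paper's latent-class extension adds class-specific components for the lymphocyte count):

\[
y_{ik}(t) = x_{ik}^{\top}(t)\,\beta_k + z_{ik}^{\top}(t)\,b_{ik} + \varepsilon_{ik}(t),
\qquad
h_i(t) = h_0(t)\,\exp\{\gamma^{\top} w_i + \alpha^{\top} b_i\},
\]

where \(y_{ik}(t)\) is the k-th biomarker of subject i at time t, and \(b_i\) stacks the Gaussian random effects \(b_{ik}\) that are shared between the longitudinal submodels and the hazard, inducing the dependence between biomarker trajectories and relapse.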
{"title":"A joint latent-class Bayesian model with application to ALL maintenance studies.","authors":"Damitri Kundu, Sevantee Basu, Manash Pratim Gogoi, Kiranmoy Das","doi":"10.1080/02664763.2025.2511935","DOIUrl":"https://doi.org/10.1080/02664763.2025.2511935","url":null,"abstract":"<p><p>Acute Lymphocytic Leukemia (ALL) is globally the main cause of death from blood cancer among children. While the survival rate of ALL has increased significantly in the first-world countries (e.g. in the United States) over the last 50 years the same is not the case for the developing countries. In this article, we develop a joint latent-class Bayesian model for analysing a dataset from a clinical trial conducted by the Tata Translational Cancer Research Center (TTCRC), Kolkata. The trial considers a group of children who were identified as ALL patients, and were treated with two standard drugs (i.e. 6MP and MTx) over a period of time. Three longitudinal biomarkers (i.e. lymphocyte count, neutrophil count and platelet count) were collected from the patients whenever they visited the clinic (weekly/bi-weekly). We consider a latent-class model for the lymphocyte count which is the main biomarker associated with ALL, and the other two biomarkers, i.e. the neutrophil count and the platelet count are modeled using linear mixed models. The time-to-event is modeled by a semi-parametric proportional hazards model, and is linked to the longitudinal submodels by sharing the Gaussian random effects. The proposed model detects two latent classes for the lymphocyte count, and we estimate the class-specific (average) non-relapse probability at different time points of the study period. We notice a significant difference in the estimated non-relapse probabilities between the two latent classes. Through simulation study we illustrate the accuracy and practical usefulness of the proposed joint latent-class model over the traditional models.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 2","pages":"257-273"},"PeriodicalIF":1.1,"publicationDate":"2025-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12872086/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146125158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Insurance risk analysis using tempered stable subordinator
Tuğba Aktaş Aslan, Başak Bulut Karageyik
Journal of Applied Statistics 53(2): 356-371. Pub Date: 2025-05-31. eCollection Date: 2026-01-01. DOI: 10.1080/02664763.2025.2512967
Effective risk management in actuarial science requires precise modeling of claim severity, particularly for heavy-tailed distributions that capture extreme losses. This study investigates the applicability of the Tempered Stable Subordinator (TSS) distribution, a subclass of heavy-tailed distributions, as a robust tool for modeling claim severity in insurance portfolios. To evaluate its practical relevance, the TSS distribution's performance is compared to the widely utilized Gamma and Inverse Gaussian (IG) distributions, and their relative strengths in premium pricing are assessed using the Esscher transformation method. Premiums are calculated for each distribution, and their comparative advantages in the context of heavy-tailed risks are analyzed. Additionally, key risk measures such as Value at Risk (VaR) and Conditional Tail Expectation (CTE) are computed to evaluate the ability of each distribution to capture tail risk effectively. The findings reveal that the TSS distribution provides more flexibility and precision in modeling extreme insurance claims, positioning it as a valuable tool in actuarial risk management and premium pricing strategies.
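The Esscher premium referred to above has the standard textbook form (the paper evaluates it under TSS, Gamma and IG claim severities):

\[
\pi_h(X) \;=\; \frac{\mathbb{E}\left[X e^{hX}\right]}{\mathbb{E}\left[e^{hX}\right]}
\;=\; \frac{M_X'(h)}{M_X(h)}, \qquad h > 0,
\]

where \(M_X\) is the moment generating function of the claim severity X; for tempered stable laws, the exponential tempering ensures \(M_X(h)\) exists for h below the tempering parameter, which is what makes the transform usable despite the heavy tail.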
{"title":"Insurance risk analysis using tempered stable subordinator.","authors":"Tuğba Aktaş Aslan, Başak Bulut Karageyik","doi":"10.1080/02664763.2025.2512967","DOIUrl":"https://doi.org/10.1080/02664763.2025.2512967","url":null,"abstract":"<p><p>Effective risk management in actuarial science requires precise modeling of claim severity, particularly for heavy-tailed distributions that capture extreme losses. This study investigates the applicability of the Tempered Stable Subordinator (TSS) distribution, a subclass of heavy-tailed distributions, as a robust tool for modeling claim severity in insurance portfolios. To evaluate its practical relevance, the TSS distribution's performance is compared to the widely utilized Gamma and Inverse Gaussian (IG) distributions, and their relative strengths in premium pricing are assessed using the Esscher transformation method. Premiums are calculated for each distribution, and their comparative advantages in the context of heavy-tailed risks are analyzed. Additionally, key risk measures such as Value at Risk (VaR) and Conditional Tail Expectation (CTE) are computed to evaluate the ability of each distribution to capture tail risk effectively. The findings reveal that the TSS distribution provides more flexibility and precision in modeling extreme insurance claims, positioning it as a valuable tool in actuarial risk management and premium pricing strategies.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 2","pages":"356-371"},"PeriodicalIF":1.1,"publicationDate":"2025-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12872088/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146125150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Estimating wildfire ignition probabilities with geographic weighted logistic regression
Marco Marto, Sarah Santos, António Vieira, António Bento-Gonçalves, Filipe Alvelos
Journal of Applied Statistics 53(2): 274-303. Pub Date: 2025-05-31. eCollection Date: 2026-01-01. DOI: 10.1080/02664763.2025.2511937
Ignition probabilities play an important role in wildfire-related decision-making and can be included in quantitative approaches to risk management, fuel management and the prepositioning of firefighting resources. We study an area around the municipality of Baião in northern Portugal, which frequently experiences fires during the Portuguese fire season. This study can help firefighting authorities identify fire-prone areas and assist them in combating fire occurrences. We estimate fire ignition probabilities using a geographically weighted logistic regression (GWLR) model with an exponential kernel, with both logit and probit link functions. The independent variables are population density, distance to roads, altitude, land use (proportion of forest), and the spectral index NDMI (Normalized Difference Moisture Index) from LANDSAT 8. The dependent variable is binary and takes the value 1 if there was at least one wildfire ignition in a hexagon around each grid point during the decade 2011-2020. Using stratified sampling proportional to the dependent variable values, a training set (70%) and a test set were generated. The results were evaluated with accuracy, the area under the ROC curve, precision, recall, specificity, balanced accuracy and the F1 score. They indicate that the models are useful in practice when compared with the existing reference models for Portugal.
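A minimal sketch of the locally weighted fitting idea behind GWLR: at each focal grid point, a logistic regression is fitted with observations weighted by an exponential distance kernel, so the coefficients vary over space. The bandwidth, synthetic covariates and logit link below are illustrative assumptions, not the paper's calibrated choices:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 300
coords = rng.uniform(0, 10, (n, 2))  # hypothetical grid-point coordinates
X = rng.normal(0, 1, (n, 3))         # stand-ins for e.g. density, roads, NDMI
p = 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1])))
y = (rng.random(n) < p).astype(int)  # 1 = at least one ignition in the hexagon

def local_ignition_prob(focal_xy, focal_x, bandwidth=2.0):
    d = np.linalg.norm(coords - focal_xy, axis=1)
    w = np.exp(-d / bandwidth)                    # exponential kernel weights
    m = LogisticRegression().fit(X, y, sample_weight=w)
    return m.predict_proba(focal_x.reshape(1, -1))[0, 1]

print(local_ignition_prob(np.array([5.0, 5.0]), np.array([1.0, 0.0, 0.0])))

Observations near the focal point dominate the fit, which is how the model lets the covariate effects (and hence the ignition probability surface) differ across the study area.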
{"title":"Estimating wildfire ignition probabilities with geographic weighted logistic regression.","authors":"Marco Marto, Sarah Santos, António Vieira, António Bento-Gonçalves, Filipe Alvelos","doi":"10.1080/02664763.2025.2511937","DOIUrl":"https://doi.org/10.1080/02664763.2025.2511937","url":null,"abstract":"<p><p>Ignition probabilities play an important role in wildfire-related decision-making and can be included in quantitative approaches for risk management, fuel management and in prepositioning of firefighting resources. We are studying an area around the municipality of Baião in northern Portugal, which frequently experiences fires during the Portuguese fire season. This study can help firefighting authorities identify areas prone to fire and assist them in combating fire occurrences. We estimate fire ignition probabilities using a GWLR model with an exponential kernel, as well as logit and probit link functions. The independent variables used are the population density, the distance to roads, the altitude, the land use (proportion of forest), and the spectral index NDMI (Normalized Difference Moisture Index) from LANDSAT 8. The dependent variable is binary and takes the value 1 if there has been at least one wildfire ignition in a hexagon around each grid point for the decade 2011-2020. Using stratified sampling proportional to the dependent variable values, a training set (70%) and a test set were generated. The results were evaluated with accuracy, an area under the ROC curve, precision, recall, specificity, balanced accuracy and F1. They reveal useful application models, considering the existing reference models for Portugal.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 2","pages":"274-303"},"PeriodicalIF":1.1,"publicationDate":"2025-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12872103/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146125099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimizing personalized screening intervals for clinical biomarkers using extended joint models
Nobuhle Nokubonga Mchunu, Henry Mwambi, Tarylee Reddy, Nonhlanhla Yende-Zuma, Dimitris Rizopoulos
Journal of Applied Statistics 53(2): 171-202. Pub Date: 2025-05-30. DOI: 10.1080/02664763.2025.2505636
This research advances joint modeling and personalized scheduling for HIV and TB by incorporating censored longitudinal outcomes in multivariate joint models, providing a more flexible and accurate approach for complex data scenarios. Inspired by the SAPiT study, we deviate from standard model selection procedures by using super learning techniques to identify the optimal model for predicting future events in event-free subjects. Specifically, the Integrated Brier score and Expected Predictive Cross-Entropy (EPCE) identified the multivariate joint model with the parameterization of the area under the longitudinal profiles of CD4 count and viral load as optimal and a strong predictor of death. Integrating this model with a risk-based screening strategy, we recommend extending intervals to 10.3 months for stable patients, with additional measurements every 12 months. For patients with deteriorating health, we suggest a 3.5-month interval, followed by 6.2 months, and then annual screenings. These findings refine patient care protocols and advance personalized medicine in HIV/TB co-infected individuals. Furthermore, our approach is adaptable, allowing adjustments based on patients' evolving health status. While focused on HIV/TB co-infection, this method has broader applicability, offering a promising avenue for biomarker studies across various disease populations and potential for future clinical trials and biomarker-guided therapies.
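The "area under the longitudinal profile" parameterization mentioned above is, in its standard univariate joint-model form (the paper uses a multivariate version over CD4 count and viral load), a cumulative-effect link between the biomarker history and the hazard:

\[
h_i(t) = h_0(t)\,\exp\Big\{\gamma^{\top} w_i + \alpha \int_0^t m_i(s)\,ds\Big\},
\]

where \(m_i(s)\) is the model-implied true biomarker value of subject i at time s, so the hazard at time t depends on the whole accumulated trajectory rather than only the current value.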
{"title":"Optimizing personalized screening intervals for clinical biomarkers using extended joint models.","authors":"Nobuhle Nokubonga Mchunu, Henry Mwambi, Tarylee Reddy, Nonhlanhla Yende-Zuma, Dimitris Rizopoulos","doi":"10.1080/02664763.2025.2505636","DOIUrl":"10.1080/02664763.2025.2505636","url":null,"abstract":"<p><p>This research advances joint modeling and personalized scheduling for HIV and TB by incorporating censored longitudinal outcomes in multivariate joint models, providing a more flexible and accurate approach for complex data scenarios. Inspired by the SAPiT study, we deviate from standard model selection procedures by using super learning techniques to identify the optimal model for predicting future events in event-free subjects. Specifically, the Integrated Brier score and Expected Predictive Cross-Entropy (EPCE) identified the multivariate joint model with the parameterization of the area under the longitudinal profiles of CD4 count and viral load as optimal and strong predictors of death. Integrating this model with a risk-based screening strategy, we recommend extending intervals to 10.3 months for stable patients, with additional measurements every 12 months. For patients with deteriorating health, we suggest a 3.5-month interval, followed by 6.2 months, and then annual screenings. These findings refine patient care protocols and advance personalized medicine in HIV/TB co-infected individuals. Furthermore, our approach is adaptable, allowing adjustments based on patients' evolving health status. While focused on HIV/TB co-infection, this method has broader applicability, offering a promising avenue for biomarker studies across various disease populations and potential for future clinical trials and biomarker-guided therapies.</p>","PeriodicalId":15239,"journal":{"name":"Journal of Applied Statistics","volume":"53 2","pages":"171-202"},"PeriodicalIF":1.1,"publicationDate":"2025-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12872093/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146125176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}