
Latest Articles from Journal of Applied Statistics

A review and comparison of methods of testing for heteroskedasticity in the linear regression model.
IF 1.1 | CAS Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2025-10-24 | eCollection Date: 2025-01-01 | DOI: 10.1080/02664763.2025.2575038
Thomas Farrar, Renette Blignaut, Retha Luus, Sarel Steel

This study reviews inferential methods for diagnosing heteroskedasticity in the linear regression model, classifying them into four types: deflator tests, auxiliary design tests, omnibus tests, and portmanteau tests. A Monte Carlo simulation experiment compares the performance of deflator tests, and separately that of auxiliary design and omnibus tests, using average excess power over size as the metric. Certain lesser-known tests (not included in some standard statistical software) are found to outperform better-known ones: the best-performing deflator test was the Evans-King test, and the best-performing auxiliary design and omnibus tests were Verbyla's test and the Cook-Weisberg test, rather than standard methods such as White's test and the Breusch-Pagan-Koenker test.

Journal of Applied Statistics 52(16): 3121-3150. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12683758/pdf/
Citations: 0
Forecasting hourly foodservice sales during geopolitical and economical disruption using zero-inflated mixed effects models.
IF 1.1 | CAS Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2025-07-07 | eCollection Date: 2026-01-01 | DOI: 10.1080/02664763.2025.2519136
Nathan A Judd, Kalliopi Mylona, Haiming Liu, Andy Hogg, Tim Butler

Accurate prediction of product sales is essential to the foodservice sector for planning and saving resources. In this paper, a zero-inflated negative binomial mixed-effects model with several factors was used to predict the total sales of different product categories, taking into consideration different sites, times and weather conditions. It fits quickly by maximising the ordinary Monte Carlo likelihood approximation. The model achieved accurate predictions with limited data, with the random effects capturing the exogenous factors that added noise to the dataset. This improved inference from the model by reducing the variance of the fixed-effect estimates used in interpreting the results. This shows how statistical modelling, using less data, can improve predictions in the foodservice industry during times of volatile demand.
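The zero-inflated negative binomial distribution at the core of the model can be sketched numerically (mixed effects omitted; the parameter values below are illustrative, not from the paper):

```python
# Zero-inflated negative binomial (ZINB) pmf: with probability pi the outcome
# is a structural zero; otherwise it is drawn from a negative binomial.
import numpy as np
from scipy.stats import nbinom

pi = 0.3               # zero-inflation probability (illustrative value)
n_nb, p_nb = 5, 0.4    # negative binomial parameters (illustrative values)

def zinb_pmf(k):
    base = nbinom.pmf(k, n_nb, p_nb)
    return np.where(k == 0, pi + (1 - pi) * base, (1 - pi) * base)

ks = np.arange(0, 200)
total = zinb_pmf(ks).sum()        # ~1: a valid pmf
mean = (ks * zinb_pmf(ks)).sum()  # ZINB mean = (1 - pi) * NB mean = 0.7 * 7.5
```

The extra point mass at zero is what lets the model accommodate hours with no sales at all, which a plain negative binomial underpredicts.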

Journal of Applied Statistics 53(2): 372-390. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12872089/pdf/
Citations: 0
Robust Bayesian model averaging for linear regression models with heavy-tailed errors.
IF 1.1 | CAS Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2025-06-11 | eCollection Date: 2026-01-01 | DOI: 10.1080/02664763.2025.2511938
Shamriddha De, Joyee Ghosh

Our goal is to develop a Bayesian model averaging technique in linear regression models that accommodates heavier-tailed error densities than the normal distribution. Motivated by the use of the Huber loss function in the presence of outliers, the Bayesian Huberized lasso with hyperbolic errors has been proposed and recently implemented in the literature. Since the Huberized lasso cannot enforce regression coefficients to be exactly zero, we propose a Bayesian variable selection approach with spike and slab priors to address sparsity more effectively. The shapes of the hyperbolic and the Student-t density functions differ. Furthermore, the tails of a hyperbolic distribution are less heavy than those of a Cauchy distribution. Thus, we propose a flexible regression model with an error distribution encompassing both the hyperbolic and the Student-t family of distributions, with an unknown tail heaviness parameter that is estimated from the data. It is known that the limiting form of both the hyperbolic and the Student-t distributions is a normal distribution. We develop an efficient Gibbs sampler for posterior computation. Through simulation studies and analyses of real datasets, we show that our method is competitive with various state-of-the-art methods.
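The tail ordering the abstract relies on (normal lightest, Cauchy heaviest, with Student-t in between) can be checked directly with scipy:

```python
# Tail probabilities P(X > x) far in the tail: heavier-tailed distributions
# assign more probability to extreme values. The hyperbolic family (see
# scipy.stats.genhyperbolic) sits below the Cauchy in this ordering.
from scipy.stats import norm, t, cauchy

x = 10.0
tail_norm = norm.sf(x)        # ~1e-23: essentially impossible under normality
tail_t3 = t.sf(x, df=3)       # ~1e-3
tail_cauchy = cauchy.sf(x)    # ~3e-2
assert tail_norm < tail_t3 < tail_cauchy
```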

Journal of Applied Statistics 53(2): 304-330. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12872095/pdf/
Citations: 0
Semiparametric analysis of competing risks data with missing causes of failure and covariate measurement error.
IF 1.1 | CAS Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2025-06-10 | eCollection Date: 2026-01-01 | DOI: 10.1080/02664763.2025.2512965
Akurathi Jayanagasri, S Anjana

Competing risks data with missing causes of failure are common in biomedical studies. Often, competing risks data arise with covariates that are measured with error. In this work, we consider a semiparametric linear transformation model to deal with the combined problem of competing risks data with missing causes of failure and covariate measurement error. We consider a set of estimating equations to obtain estimators of the parameters in this linear transformation model. To handle the missing causes of failure, we employ the Inverse Probability Weighting (IPW) approach, and a flexible Simulation Extrapolation (SIMEX) method is adopted as the covariate measurement error correction technique. We study the asymptotic properties of the estimators and assess their finite sample properties in a Monte Carlo simulation study. The proposed method is illustrated using real data.

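The IPW idea the abstract relies on can be illustrated with a toy missing-data example (the setup and numbers below are hypothetical, not from the paper): complete cases are reweighted by the inverse of their probability of being observed, which removes the selection bias of a complete-case analysis.

```python
# IPW on a toy problem: the cause indicator is missing more often when x=1,
# so the naive complete-case mean is biased while the IPW estimate is not.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.binomial(1, 0.5, n)                  # covariate
cause = rng.binomial(1, 0.2 + 0.5 * x, n)    # cause-of-failure indicator
p_obs = 0.9 - 0.5 * x                        # observation probability (MAR)
observed = rng.binomial(1, p_obs, n).astype(bool)

naive = cause[observed].mean()               # biased: over-represents x=0
ipw = np.sum(cause[observed] / p_obs[observed]) / np.sum(1 / p_obs[observed])
true_mean = cause.mean()
```

In practice the observation probabilities are unknown and must themselves be estimated, e.g. by logistic regression, which is where the paper's estimating equations come in.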
Journal of Applied Statistics 53(2): 331-355. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12872102/pdf/
Citations: 0
Multiple outlier detection in samples with exponential & Pareto tails.
IF 1.1 | CAS Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2025-06-07 | eCollection Date: 2026-01-01 | DOI: 10.1080/02664763.2025.2511934
Didier Sornette, Ran Wei

We introduce two ratio-based robust test statistics, max-robust-sum (MRS) and sum-robust-sum (SRS), which compare the largest suspected outlier(s) to a trimmed partial sum of the sample. They are designed to enhance the robustness of outlier detection in samples with exponential or Pareto tails. These statistics are invariant to scale parameters and offer improved overall resistance to masking and swamping by recalibrating the denominator to reduce the influence of the largest observations. In particular, the proposed tests are shown to substantially reduce the masking problem in inward sequential testing, thereby re-establishing the inward sequential testing method - formerly relegated since the introduction of outward testing - as a competitive alternative to outward testing, without requiring multiple testing correction. The analytical null distributions of the statistics are derived, and a comprehensive comparison of the test statistics is conducted through simulation, evaluating the performance of the proposed tests in both block and sequential settings, and contrasting their performance with classical statistics across various data scenarios. In five case studies - financial crashes, nuclear power generation accidents, stock market returns, epidemic fatalities, and city sizes - significant outliers are detected and related to the concept of 'Dragon King' events, defined as meaningful outliers that arise from a unique generating mechanism.
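The paper defines MRS and SRS precisely; the following is only a hypothetical simplification of the general shape of such ratio statistics, sketched to show why trimming the denominator resists masking:

```python
# A hypothetical max-to-trimmed-sum ratio (NOT the paper's exact MRS/SRS
# definition): the largest observation is compared against a partial sum that
# excludes the top k order statistics, so one extreme value cannot inflate
# the denominator and mask itself.
import numpy as np

def max_to_trimmed_sum(sample, k=3):
    s = np.sort(sample)
    return s[-1] / s[:-k].sum()  # largest value over sum of all but top k

rng = np.random.default_rng(2)
clean = rng.exponential(1.0, 100)
contaminated = np.append(clean, 50.0)  # inject one large outlier
assert max_to_trimmed_sum(contaminated) > max_to_trimmed_sum(clean)
```

Because both numerator and denominator scale linearly with the data, the ratio is invariant to the scale parameter, as the abstract requires.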

Journal of Applied Statistics 53(2): 224-256. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12872094/pdf/
Citations: 0
Bayesian federated inference for survival models.
IF 1.1 | CAS Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2025-06-04 | eCollection Date: 2026-01-01 | DOI: 10.1080/02664763.2025.2511932
Hassan Pazira, Emanuele Massa, Jetty A M Weijers, Anthony C C Coolen, Marianne A Jonker

To accurately estimate the parameters in a prediction model for survival data, sufficient events need to be observed relative to the number of model parameters. In practice, this is often a problem. Merging data sets from different medical centers may help, but this is not always possible due to strict privacy legislation and logistic difficulties. Recently, the Bayesian Federated Inference (BFI) strategy for generalized linear models was proposed. With this strategy, the statistical analyses are performed in the local centers where the data were collected (or stored), and only the inference results are combined into a single estimated model; merging data is not necessary. The BFI methodology aims to compute, from the separate inference results in the local centers, what would have been obtained if the analysis had been based on the merged data sets. In the present paper, we generalize the BFI methodology, initially developed for generalized linear models, to survival models. Simulation studies and real data analyses show excellent performance; that is, the results obtained with the BFI methodology are very similar to those obtained by analyzing the merged data. An R package for performing the analyses is available.
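A simplified sketch of the combination step, under the assumption that each center returns a Gaussian posterior approximation N(m_i, A_i^{-1}) with local precision matrix A_i (the actual BFI formulas also correct for the prior being used in every center; that refinement is omitted here):

```python
# Precision-weighted combination of local posterior approximations: the
# combined precision is the sum of the local precisions, and the combined
# mean solves A m = sum_i A_i m_i, pulling toward the more precise centers.
import numpy as np

def combine(means, precisions):
    A = sum(precisions)  # combined precision matrix
    m = np.linalg.solve(A, sum(P @ m for P, m in zip(precisions, means)))
    return m, A

# Two hypothetical centers: center 1 is precise about the first coefficient,
# center 2 about the second.
m1, P1 = np.array([1.0, 0.0]), np.diag([4.0, 1.0])
m2, P2 = np.array([0.0, 2.0]), np.diag([1.0, 4.0])
m, A = combine([m1, m2], [P1, P2])  # each coordinate leans toward the precise center
```

Only the local estimates and curvatures cross center boundaries, never the patient-level data, which is the privacy argument the abstract makes.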

Journal of Applied Statistics 53(2): 203-223. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12872092/pdf/
Citations: 0
A joint latent-class Bayesian model with application to ALL maintenance studies.
IF 1.1 | CAS Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2025-06-03 | eCollection Date: 2026-01-01 | DOI: 10.1080/02664763.2025.2511935
Damitri Kundu, Sevantee Basu, Manash Pratim Gogoi, Kiranmoy Das

Acute Lymphocytic Leukemia (ALL) is globally the main cause of death from blood cancer among children. While the survival rate of ALL has increased significantly in first-world countries (e.g. the United States) over the last 50 years, the same is not the case for developing countries. In this article, we develop a joint latent-class Bayesian model for analysing a dataset from a clinical trial conducted by the Tata Translational Cancer Research Center (TTCRC), Kolkata. The trial considers a group of children who were identified as ALL patients and were treated with two standard drugs (6MP and MTx) over a period of time. Three longitudinal biomarkers (lymphocyte count, neutrophil count and platelet count) were collected whenever the patients visited the clinic (weekly/bi-weekly). We consider a latent-class model for the lymphocyte count, the main biomarker associated with ALL, while the other two biomarkers, the neutrophil count and the platelet count, are modeled using linear mixed models. The time-to-event is modeled by a semi-parametric proportional hazards model and is linked to the longitudinal submodels by sharing the Gaussian random effects. The proposed model detects two latent classes for the lymphocyte count, and we estimate the class-specific (average) non-relapse probability at different time points of the study period. We notice a significant difference in the estimated non-relapse probabilities between the two latent classes. Through a simulation study, we illustrate the accuracy and practical usefulness of the proposed joint latent-class model over traditional models.

Journal of Applied Statistics 53(2): 257-273. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12872086/pdf/
Citations: 0
Insurance risk analysis using tempered stable subordinator.
IF 1.1 | CAS Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2025-05-31 | eCollection Date: 2026-01-01 | DOI: 10.1080/02664763.2025.2512967
Tuğba Aktaş Aslan, Başak Bulut Karageyik

Effective risk management in actuarial science requires precise modeling of claim severity, particularly for heavy-tailed distributions that capture extreme losses. This study investigates the applicability of the Tempered Stable Subordinator (TSS) distribution, a subclass of heavy-tailed distributions, as a robust tool for modeling claim severity in insurance portfolios. To evaluate its practical relevance, the TSS distribution's performance is compared to the widely utilized Gamma and Inverse Gaussian (IG) distributions, and their relative strengths in premium pricing are assessed using the Esscher transformation method. Premiums are calculated for each distribution, and their comparative advantages in the context of heavy-tailed risks are analyzed. Additionally, key risk measures such as Value at Risk (VaR) and Conditional Tail Expectation (CTE) are computed to evaluate the ability of each distribution to capture tail risk effectively. The findings reveal that the TSS distribution provides more flexibility and precision in modeling extreme insurance claims, positioning it as a valuable tool in actuarial risk management and premium pricing strategies.
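The two risk measures named in the abstract can be computed empirically; a minimal sketch on an illustrative heavy-tailed sample (not the paper's data):

```python
# Empirical risk measures: VaR is a high quantile of the loss distribution,
# and CTE is the mean loss beyond that quantile, so CTE >= VaR by construction.
import numpy as np

rng = np.random.default_rng(3)
losses = rng.pareto(2.5, 100_000)  # heavy-tailed claim severities (illustrative)

alpha = 0.99
var = np.quantile(losses, alpha)   # Value at Risk at level alpha
cte = losses[losses > var].mean()  # Conditional Tail Expectation
```

CTE is the more informative of the two for heavy tails, since it reflects how bad losses are once the VaR threshold is breached.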

Journal of Applied Statistics 53(2): 356-371. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12872088/pdf/
Citations: 0
Estimating wildfire ignition probabilities with geographic weighted logistic regression.
IF 1.1 | CAS Zone 4, Mathematics | Q2 Statistics & Probability | Pub Date: 2025-05-31 | eCollection Date: 2026-01-01 | DOI: 10.1080/02664763.2025.2511937
Marco Marto, Sarah Santos, António Vieira, António Bento-Gonçalves, Filipe Alvelos

Ignition probabilities play an important role in wildfire-related decision-making and can be included in quantitative approaches to risk management, fuel management and the prepositioning of firefighting resources. We study an area around the municipality of Baião in northern Portugal, which frequently experiences fires during the Portuguese fire season. This study can help firefighting authorities identify areas prone to fire and assist them in combating fire occurrences. We estimate fire ignition probabilities using a GWLR model with an exponential kernel, with both logit and probit link functions. The independent variables are population density, distance to roads, altitude, land use (proportion of forest), and the spectral index NDMI (Normalized Difference Moisture Index) from LANDSAT 8. The dependent variable is binary and takes the value 1 if there was at least one wildfire ignition in a hexagon around each grid point during the decade 2011-2020. Using stratified sampling proportional to the dependent variable values, a training set (70%) and a test set were generated. The results were evaluated using accuracy, area under the ROC curve, precision, recall, specificity, balanced accuracy and F1. They indicate useful models for application, in light of the existing reference models for Portugal.
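The geographically weighted fitting step can be sketched with synthetic data: a local logistic regression at one focal grid point, weighting observations by an exponential kernel of distance so that nearby ignitions influence the local fit most. All data, covariate names and the bandwidth below are illustrative assumptions, not the study's values:

```python
# One local fit of a geographically weighted logistic regression (GWLR):
# observations are weighted by exp(-d / bandwidth), d = distance to the
# focal point, then a standard weighted logistic regression is fitted.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 500
coords = rng.uniform(0, 10, (n, 2))  # observation locations
X = rng.normal(size=(n, 3))          # stand-ins for pop. density, road dist., NDMI
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - 0.5))))  # synthetic ignitions

focal = np.array([5.0, 5.0])  # grid point at which the local model is estimated
bandwidth = 2.0
d = np.linalg.norm(coords - focal, axis=1)
w = np.exp(-d / bandwidth)    # exponential kernel weights

local_fit = LogisticRegression().fit(X, y, sample_weight=w)
p_ignite = local_fit.predict_proba(X[:1])[0, 1]  # local ignition probability
```

Repeating this fit at every grid point yields coefficient surfaces that vary over space, which is what distinguishes GWLR from a single global logistic regression.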

Journal of Applied Statistics 53(2): 274-303. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12872103/pdf/
Citations: 0
Optimizing personalized screening intervals for clinical biomarkers using extended joint models.
IF 1.1, CAS Tier 4 (Mathematics), Q2 STATISTICS & PROBABILITY. Pub Date: 2025-05-30; eCollection Date: 2026-01-01. DOI: 10.1080/02664763.2025.2505636
Nobuhle Nokubonga Mchunu, Henry Mwambi, Tarylee Reddy, Nonhlanhla Yende-Zuma, Dimitris Rizopoulos

This research advances joint modeling and personalized scheduling for HIV and TB by incorporating censored longitudinal outcomes into multivariate joint models, providing a more flexible and accurate approach for complex data scenarios. Inspired by the SAPiT study, we deviate from standard model selection procedures by using super learning techniques to identify the optimal model for predicting future events in event-free subjects. Specifically, the integrated Brier score and the expected predictive cross-entropy (EPCE) identified the multivariate joint model parameterized by the area under the longitudinal profiles of CD4 count and viral load as the optimal model and a strong predictor of death. Integrating this model with a risk-based screening strategy, we recommend extending intervals to 10.3 months for stable patients, with additional measurements every 12 months. For patients with deteriorating health, we suggest a 3.5-month interval, followed by 6.2 months, and then annual screenings. These findings refine patient care protocols and advance personalized medicine in HIV/TB co-infected individuals. Furthermore, our approach is adaptable, allowing adjustments based on patients' evolving health status. While focused on HIV/TB co-infection, the method has broader applicability, offering a promising avenue for biomarker studies across various disease populations and potential for future clinical trials and biomarker-guided therapies.
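To give a rough sense of the risk-based scheduling idea, the next screening time can be chosen as the earliest time at which a patient's predicted event-free probability, conditional on being event-free now, falls below a tolerance κ. The sketch below is a hypothetical simplification, not the authors' joint-model machinery: the survival function is a user-supplied placeholder (in the paper it would come from the fitted multivariate joint model), and the constant-hazard toy patients are invented for illustration.

```python
import math

def next_screening_time(surv, t_now, kappa=0.9, horizon=24.0, step=0.1):
    """Smallest interval dt with conditional survival S(t_now+dt | t_now) < kappa.

    `surv` is any function returning the marginal survival S(t); here it is a
    placeholder, whereas a joint model would supply subject-specific dynamic
    predictions. Time units are arbitrary (e.g. months); the search is capped
    at `horizon`.
    """
    s_now = surv(t_now)
    t = t_now
    while t < t_now + horizon:
        t += step
        if surv(t) / s_now < kappa:   # conditional survival dropped below kappa
            return t - t_now
    return horizon

# Toy patients with constant hazard h: S(t) = exp(-h * t).
stable = lambda t: math.exp(-0.01 * t)      # low risk
declining = lambda t: math.exp(-0.03 * t)   # higher risk

dt_stable = next_screening_time(stable, t_now=6.0)
dt_declining = next_screening_time(declining, t_now=6.0)
# The stable patient gets a longer interval than the declining one,
# mirroring the paper's longer vs. shorter recommended screening gaps.
```

The grid search over `step` is the simplest possible root-finding choice; with a smooth model-based survival curve one would typically invert S directly or use a bracketing solver instead.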

Mchunu, N. N., Mwambi, H., Reddy, T., Yende-Zuma, N., & Rizopoulos, D. (2025). Optimizing personalized screening intervals for clinical biomarkers using extended joint models. Journal of Applied Statistics, 53(2), 171-202. doi:10.1080/02664763.2025.2505636. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12872093/pdf/
Citations: 0